Season 5 · Ch. 2

My Code Agent Said It Was a Moose. I Said No. It Was a Moose.

The H200 was working. The 3B was training. After three days of fighting the cloud, the model was finally putting tokens through the GPU at 23,200 per second. I had a checkpoint at step 1,000. I had a checkpoint at step 1,200. I went to bed feeling, briefly, like a person. Six hours later the run was dead. The checkpoint at step 1,200 was corrupted. The next run got to step 25 and froze. The one after that got to step 17 and silently disappeared. ...

April 12, 2026 · 10 min · Jun Park
GPUburnout
GPUburnout
Will Code for Tokens
S1 GPT-2 134M
S2 Llama 1B
S3 1B SFT
S4 Llama 2B
S5 Llama 3B