Season 5 · Ch. 4

It Took Me Two Weeks to Read My Own Code

I drafted this chapter on May 14, the morning after the 3B benchmarks landed. The opening was a banger about “the inflection point at 3B parameters.” The conclusion explained how alignment tax shrinks with scale and flips positive between 2B and 3B. There was a chart. The chart had a smooth curve. I was very pleased with the chart. For two weeks I was going to publish that chapter. I told three people. I rehearsed a tweet thread. I picked which checkpoint to put in the OG image. ...

May 27, 2026 · 12 min · Jun Park
GPUburnout
GPUburnout
Will Code for Tokens
S1 GPT-2 134M
S2 Llama 1B
S3 1B SFT
S4 Llama 2B
S5 Llama 3B