Sft | GPUburnout | Jun Park

Season 5 · Ch. 4

It Took Me Two Weeks to Read My Own Code

I drafted this chapter on May 14, the morning after the 3B benchmarks landed. The opening was a banger about “the inflection point at 3B parameters.” The conclusion explained how alignment tax shrinks with scale and flips positive between 2B and 3B. There was a chart. The chart had a smooth curve. I was very pleased with the chart. For two weeks I was going to publish that chapter. I told three people. I rehearsed a tweet thread. I picked which checkpoint to put in the OG image. ...

Season 4 · Ch. 3

7 Out of 8 - How DPO Finally Worked

Season 3: four DPO configurations on the 1B. Best: 4/8 clean. Worst: 7/8 garbage. More training literally made the model dumber. I had receipts. Season 4: same technique, similar hyperparameters, on the 2B. Result: 7/8 clean. First try. No suffering necessary. Same method. Different foundation. Completely different outcome. That’s the entire moral of Season 4 in one A/B test. I could end the post here. I won’t, because the details are too good to skip. ...

Season 3 · Ch. 3

Nine Experiments, Nine Funerals

I had a diagnosis. Garbage tokens, pretraining contamination, baked into the base weights, unreachable by fine-tuning. Open and shut. Case closed. Except science doesn’t accept “trust me bro” as evidence. The only way to prove the diagnosis was to try fixing it the wrong way and watch it not work. Repeatedly. With increasing desperation. Nine experiments. Zero fixes. One scoreboard. Here we go. SFT: Five Attempts, Five Failures I built a cleaning pipeline, removed 27% of SlimOrca (139K examples), verified zero garbage tokens in the cleaned set, and ran five experiments: ...

Season 3 · Ch. 1

Teaching the 1B to Talk

At the end of Season 2, I had a “working” 1B parameter language model. The scare quotes are doing some heavy lifting. Yes, it could complete sentences. Yes, it knew Paris was a city. Yes, it could write paragraphs about single-cell RNA sequencing with journal citations that looked real and were absolutely not. Ask it the capital of France and it would confidently answer “the currency in the money is dollar and the currency is dollar and the currency is the euro and euro.” Technically not wrong about the euro. Wildly wrong about everything else. As base models go, it was functional. As useful tools go, it was a paperweight that costs electricity. ...