Season 4 · Ch. 3

7 Out of 8 - How DPO Finally Worked

Season 3: four DPO configurations on the 1B. Best: 4/8 clean. Worst: 7/8 garbage. More training literally made the model dumber. I had receipts. Season 4: same technique, similar hyperparameters, on the 2B. Result: 7/8 clean. First try. No suffering necessary. Same method. Different foundation. Completely different outcome. That’s the entire moral of Season 4 in one A/B test. I could end the post here. I won’t, because the details are too good to skip. ...

March 29, 2026 · 5 min · Jun Park
Season 3 · Ch. 3

Nine Experiments, Nine Funerals

I had a diagnosis. Garbage tokens, pretraining contamination, baked into the base weights, unreachable by fine-tuning. Open and shut. Case closed. Except science doesn’t accept “trust me bro” as evidence. The only way to prove the diagnosis was to try fixing it the wrong way and watch it not work. Repeatedly. With increasing desperation. Nine experiments. Zero fixes. One scoreboard. Here we go.

SFT: Five Attempts, Five Failures

I built a cleaning pipeline, removed 27% of SlimOrca (139K examples), verified zero garbage tokens in the cleaned set, and ran five experiments: ...

March 21, 2026 · 4 min · Jun Park
Season 3 · Ch. 1

Teaching the 1B to Talk

At the end of Season 2, I had a “working” 1B-parameter language model. The scare quotes are doing some heavy lifting. Yes, it could complete sentences. Yes, it knew Paris was a city. Yes, it could write paragraphs about single-cell RNA sequencing with journal citations that looked real and were absolutely not. Ask it the capital of France and it would confidently answer “the currency in the money is dollar and the currency is dollar and the currency is the euro and euro.” Technically not wrong about the euro. Wildly wrong about everything else. As base models go, it was functional. As useful tools go, it was a paperweight that cost electricity. ...

March 18, 2026 · 4 min · Jun Park
GPUburnout
Will Code for Tokens