2026 (21 posts)

April (3 posts)

Nothing Happened for 75,000 Steps and It Was Glorious

April 19, 2026 · 6 min · Jun Park

My Code Agent Said It Was a Moose. I Said No. It Was a Moose.

April 12, 2026 · 10 min · Jun Park

I Have an A100. I Have 528 Shards of Data. I Cannot Combine Them.

April 7, 2026 · 8 min · Jun Park

March (11 posts)

Verbatim: The Proof Is in the Output

March 30, 2026 · 11 min · Jun Park

7 Out of 8 - How DPO Finally Worked

March 29, 2026 · 5 min · Jun Park

1.92B Parameters, 38.4B Tokens, Zero Garbage

March 28, 2026 · 4 min · Jun Park

RIP GPUburnout-1B. Cause of Death: Its Own Training Data.

March 22, 2026 · 3 min · Jun Park

Nine Experiments, Nine Funerals

March 21, 2026 · 4 min · Jun Park

My Model’s Vocabulary Came from Stack Overflow at 3am

March 18, 2026 · 4 min · Jun Park

Teaching the 1B to Talk

March 18, 2026 · 4 min · Jun Park

I Spent Another $68 Because a Spreadsheet Wouldn’t Stop Staring at Me

March 15, 2026 · 9 min · Jun Park

10 Things I Learned Training a 1B Parameter Model That Nobody Talks About

March 7, 2026 · 14 min · Jun Park

What GPUburnout-1B Actually Learned

March 6, 2026 · 10 min · Jun Park

The $175 Experiment: Training GPUburnout-1B on a Single GPU

March 4, 2026 · 10 min · Jun Park

February (4 posts)

From 134M to 1B: Building GPUburnout-1B From Scratch

February 27, 2026 · 7 min · Jun Park

Training Optimizations Deep Dive: How I Made the A100 Actually Work

February 12, 2026 · 21 min · Jun Park

The Results Are In (And My Wallet Is Empty)

February 6, 2026 · 6 min · Jun Park

11 Training Challenges and How I Solved Them

February 2, 2026 · 6 min · Jun Park

January (3 posts)

Scaling Up: From Tiny Model to GPT-2 Small

January 27, 2026 · 4 min · Jun Park

Data Preparation: Building a 12GB Training Corpus

January 22, 2026 · 4 min · Jun Park

Why I Decided to Build a Language Model from Scratch

January 15, 2026 · 3 min · Jun Park
GPUburnout
Will Code for Tokens