Why I Decided to Build a Language Model from Scratch

Because apparently using someone else’s model was too easy. Here’s how I tortured myself by training a GPT model from scratch.

January 15, 2026 · 3 min · GPUburnout

Data Preparation: Building a 12GB Training Corpus

Where I learned that 90% of ML is just cleaning data and crying about file sizes.

January 22, 2026 · 4 min · GPUburnout

Scaling Up: From Tiny Model to GPT-2 Small

How I went from ‘cute toy model’ to ‘134 million parameters that need an A100 to breathe.’

January 27, 2026 · 4 min · GPUburnout

10 Training Challenges and How I Solved Them

A comprehensive guide to every way I shot myself in the foot training GPT-2 Small. Learn from my pain.

February 2, 2026 · 5 min · GPUburnout

The Results Are In (And My Wallet Is Empty)

Final loss curves, the damage to my compute budget, and 22 lessons I paid dearly to learn.

February 6, 2026 · 6 min · GPUburnout
GPUburnout
Will Code for Tokens
134M Params
2.8B Tokens
7x Speedup