Why I Decided to Build a Language Model from Scratch

Because apparently using someone else’s model was too easy. Here’s how I tortured myself by training a GPT from scratch.

January 15, 2026 · 3 min · GPUburnout

Scaling Up: From Tiny Model to GPT-2 Small

How I went from ‘cute toy model’ to ‘134 million parameters that need an A100 to breathe.’

January 27, 2026 · 4 min · GPUburnout
GPUburnout
Will Code for Tokens
134M Params
2.8B Tokens
7x Speedup