Season 1 · Ch. 3

Scaling Up: From Tiny Model to GPT-2 Small

How I went from ‘cute toy model’ to ‘134 million parameters that need an A100 to breathe.’

January 27, 2026 · 4 min · Jun Park
GPUburnout