Scaling Up: From Tiny Model to GPT-2 Small

How I went from ‘cute toy model’ to ‘134 million parameters that need an A100 to breathe.’

January 27, 2026 · 4 min · GPUburnout
134M params · 2.8B tokens · 7× speedup