Season 1 · Ch. 3

Scaling Up: From Tiny Model to GPT-2 Small

How I went from ‘cute toy model’ to ‘134 million parameters that need an A100 to breathe.’

GPUburnout

Will Code for Tokens
