Scaling Up: From Tiny Model to GPT-2 Small
How I went from ‘cute toy model’ to ‘134 million parameters that need an A100 to breathe.’
Because apparently using someone else’s model was too easy. Here’s how I tortured myself by training GPT from scratch.