Evaluation

Season 2 · Ch. 3

What GPUburnout-1B Actually Learned

Time to face the music

Training a language model is the fun part. You watch the loss drop, you generate text samples that are slightly less incoherent than yesterday’s, you tell yourself “look, it almost knows what France is.” It’s addictive. It’s rewarding. It also tells you absolutely nothing about how good your model actually is. Benchmarking is where the universe hands you a report card you didn’t ask for. ...

Season 1 · Ch. 5

The Results Are In (And My Wallet Is Empty)

Final loss curves, the damage to my compute budget, and 22 lessons I paid dearly to learn.

GPUburnout

Will Code for Tokens

S1 GPT-2 134M

S2 Llama 1B