Season 1 · Ch. 6

Training Optimizations Deep Dive: How I Made the A100 Actually Work

The complete technical reference for achieving 16x speedup. Every optimization explained with code and diagrams.

February 12, 2026 · 21 min · Jun Park
Season 1 · Ch. 5

The Results Are In (And My Wallet Is Empty)

Final loss curves, the damage to my compute budget, and 22 lessons I paid dearly to learn.

February 6, 2026 · 6 min · Jun Park
Season 1 · Ch. 4

11 Training Challenges and How I Solved Them

A comprehensive guide to every way I shot myself in the foot training GPT-2 Small. Learn from my pain.

February 2, 2026 · 6 min · Jun Park
Season 1 · Ch. 3

Scaling Up: From Tiny Model to GPT-2 Small

How I went from ‘cute toy model’ to ‘134 million parameters that need an A100 to breathe.’

January 27, 2026 · 4 min · Jun Park
Season 1 · Ch. 2

Data Preparation: Building a 12GB Training Corpus

How I built a 12GB ChatGPT-style conversational dataset and implemented BPE tokenization for efficient training.

January 22, 2026 · 4 min · Jun Park
Season 1 · Ch. 1

Why I Decided to Build a Language Model from Scratch

Because apparently using someone else’s model was too easy. Here’s how I tortured myself by training GPT from scratch.

January 15, 2026 · 3 min · Jun Park
GPUburnout
Will Code for Tokens