Training Optimizations Deep Dive: How I Made the A100 Actually Work
The complete technical reference for achieving a 16x speedup. Every optimization explained with code and diagrams.
Also from this series:

Final loss curves, the damage to my compute budget, and 22 lessons I paid dearly to learn.
A comprehensive guide to every way I shot myself in the foot training GPT-2 Small. Learn from my pain.
How I went from ‘cute toy model’ to ‘134 million parameters that need an A100 to breathe.’
How I built a 12GB ChatGPT-style conversational dataset and implemented BPE tokenization for efficient training.
Because apparently using someone else’s model was too easy. Here’s how I tortured myself by training GPT from scratch.