Why I Decided to Build a Language Model from Scratch
Because apparently using someone else’s model was too easy. Here’s how I tortured myself by training GPT from scratch.
Where I learned that 90% of ML is just cleaning data and crying about file sizes.
How I went from ‘cute toy model’ to ‘134 million parameters that need an A100 to breathe.’
A comprehensive guide to every way I shot myself in the foot training GPT-2 Small. Learn from my pain.
Final loss curves, the damage to my compute budget, and 22 lessons I paid dearly to learn.