Why I Decided to Build a Language Model from Scratch
Because apparently using someone else’s model was too easy. Here’s how I tortured myself by training GPT from scratch.
How I went from ‘cute toy model’ to ‘134 million parameters that need an A100 to breathe.’