Scaling Up: From Tiny Model to GPT-2 Small

How I went from 'cute toy model' to '134 million parameters that need an A100 to breathe.'