Building LLMs From Scratch
Season 1: GPT-2 134M · Season 2: Llama 1B · Season 3: SFT & Garbage · Season 4: Llama 2B · Season 5: Llama 3B
$861 total · One GPU · Zero shortcuts

By Jun Park: dangerously curious life scientist, currently unsupervised with an A100.

A life scientist who got curious about transformers (the neural network kind, not the protein kind) and decided to build one from scratch.

GPUburnout
Will Code for Tokens