step 0
loss 4.200
lr 3.0e-4
tok/s 28,535
Building LLMs From Scratch
Season 1: GPT-2 134M · Season 2: Llama 1B
$175 total · One GPU · Zero shortcuts

By Jun Park. Dangerously curious life scientist. Currently unsupervised with an A100.

A life scientist who got curious about transformers — the neural network kind, not the protein kind — and decided to build one from scratch.

GPUburnout
Will Code for Tokens