LLM Training From Scratch | GPT-2 Tutorial
Tags
AMP (1)
architecture (1)
BPE (1)
Colab (1)
data (1)
debugging (1)
evaluation (1)
GPT-2 (1)
infrastructure (1)
intro (1)
lessons (1)
loss-curves (1)
motivation (1)
optimization (1)
phase-1 (1)
preprocessing (1)
results (1)
scaling (2)
tokenization (1)
torch-compile (1)
training (1)
transformer (2)
GPUburnout
Will Code for Tokens
134M Params
2.8B Tokens
7x Speedup