Tags
- 1B 1
- A100 1
- AMP 2
- ARC 1
- architecture 2
- benchmarks 1
- BPE 1
- Chinchilla 1
- cloud-gpu 1
- Colab 1
- cost-analysis 2
- cost-optimization 1
- data 1
- debugging 1
- deep-dive 1
- evaluation 2
- flash-attention 2
- GPT-2 1
- GPU 1
- GPUburnout-1B 5
- HellaSwag 1
- inference 1
- infrastructure 2
- intro 1
- lessons 2
- loss-curves 2
- MMLU 1
- motivation 1
- optimization 2
- performance 1
- phase-1 1
- preprocessing 1
- results 1
- scaling 3
- scaling-laws 1
- season-1 6
- season-2 5
- tokenization 1
- torch-compile 2
- training 3
- transformer 2
- vectorization 1