Tags
- 1B 1
- A100 1
- alignment 1
- AMP 2
- ARC 1
- architecture 2
- benchmarks 1
- BPE 1
- checkpoints 1
- Chinchilla 1
- cloud-gpu 1
- Colab 1
- comparison 1
- cost-analysis 3
- cost-optimization 1
- data 1
- data-quality 4
- debugging 2
- deep-dive 1
- differential-learning-rate 1
- dpo 2
- evaluation 2
- fine-tuning 3
- flash-attention 2
- garbage-tokens 3
- GPT-2 1
- GPU 1
- GPUburnout-1B 10
- gpuburnout-2b 4
- gpuburnout-3b 3
- HellaSwag 1
- inference 2
- infrastructure 2
- intro 1
- lessons 2
- loss-curve 1
- loss-curves 2
- MMLU 1
- moosefs 1
- motivation 1
- optimization 2
- performance 1
- phase-1 1
- preprocessing 1
- pretraining 3
- progressive-growth 1
- results 1
- runpod 1
- scaling 3
- scaling-laws 1
- season-1 6
- season-2 5
- season-3 3
- season-4 4
- season-5 3
- sft 3
- thunder-compute 1
- tokenization 1
- torch-compile 2
- training 6
- transformer 2
- vectorization 1
- vram 1