Season 3 · Ch. 3

Nine Experiments, Nine Funerals

I had a diagnosis. Garbage tokens, pretraining contamination, baked into the base weights, unreachable by fine-tuning. Open and shut. Case closed. Except science doesn’t accept “trust me bro” as evidence. The only way to prove the diagnosis was to try fixing it the wrong way and watch it not work. Repeatedly. With increasing desperation. Nine experiments. Zero fixes. One scoreboard. Here we go.

SFT: Five Attempts, Five Failures

I built a cleaning pipeline, removed 27% of SlimOrca (139K examples), verified zero garbage tokens in the cleaned set, and ran five experiments: ...
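The cleaning pass described above could look something like the sketch below: drop any example whose text contains one of the known garbage tokens. This is a minimal illustration, not the author's actual pipeline; the `GARBAGE_TOKENS` set and the `conversations` record layout are assumptions.

```python
# Hypothetical sketch of a garbage-token filter for an SFT dataset.
# GARBAGE_TOKENS and the record layout are assumptions for illustration.
GARBAGE_TOKENS = {"PersonX", "AndroidRuntime", "fefefe", "oardvark", "Paasilinna"}

def is_clean(text: str) -> bool:
    """True if the text contains none of the flagged tokens."""
    return not any(tok in text for tok in GARBAGE_TOKENS)

def clean_dataset(examples):
    """Keep only examples whose every turn passes the filter.

    Returns the kept examples and the number removed, so the removal
    rate (e.g. the 27% mentioned above) can be reported.
    """
    kept = [
        ex for ex in examples
        if all(is_clean(turn) for turn in ex["conversations"])
    ]
    return kept, len(examples) - len(kept)
```

A substring check like this is deliberately aggressive: it throws away the whole example rather than trying to excise the contaminated span, which keeps the verification step ("zero garbage tokens in the cleaned set") trivially true.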

March 21, 2026 · 4 min · Jun Park
Season 3 · Ch. 2

My Model's Vocabulary Came from Stack Overflow at 3am

My chat model had a haunted vocabulary. PersonX. AndroidRuntime. fefefe. oardvark. Paasilinna. The same seven nonsense tokens, in different prompts, at different temperatures, across totally separate runs. Not random. Specific. Reproducible. A slot machine that only ever pays out in cursed symbols. I needed to find where they came from. Standard CSI episode: dust the model for prints, follow the trail back, identify the perpetrator. My first suspect: the fine-tuning data. SlimOrca is GPT-4 generated, and machine text sometimes carries annotation crud from academic NLP datasets. Plausible. Easy to test. Confidently wrong. ...
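Distinguishing "reproducible" from "random" here is just counting: a token that shows up in many independent samples is a fixture of the model, not noise. A minimal sketch of that check, assuming you already have a list of sampled generations (the sampling call itself is whatever your model exposes):

```python
from collections import Counter

def recurring_tokens(outputs, min_runs=3):
    """Return tokens that appear in at least min_runs separate outputs.

    Counts each token at most once per output (via set), so a token
    repeated within one sample doesn't inflate its cross-run count.
    """
    seen = Counter()
    for text in outputs:
        for tok in set(text.split()):
            seen[tok] += 1
    return {tok for tok, n in seen.items() if n >= min_runs}
```

Run it over generations from different prompts and temperatures; common English words will dominate the result, so in practice you would also subtract a frequency baseline and eyeball what's left for things like `fefefe`.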

March 18, 2026 · 4 min · Jun Park
Season 3 · Ch. 1

Teaching the 1B to Talk

At the end of Season 2, I had a “working” 1B-parameter language model. The scare quotes are doing some heavy lifting. Yes, it could complete sentences. Yes, it knew Paris was a city. Yes, it could write paragraphs about single-cell RNA sequencing with journal citations that looked real and were absolutely not. Ask it the capital of France and it would confidently answer “the currency in the money is dollar and the currency is dollar and the currency is the euro and euro.” Technically not wrong about the euro. Wildly wrong about everything else. As base models go, it was functional. As useful tools go, it was a paperweight that cost electricity. ...

March 18, 2026 · 4 min · Jun Park
GPUburnout
Will Code for Tokens
S1 GPT-2 134M
S2 Llama 1B
S3 1B SFT
S4 Llama 2B
S5 Llama 3B