The Bart Test - Part 2: Testing the Overthinking Hypothesis
After seeing OLMo 3 overthink Gen-Alpha slang (scores of 4-5/10), I wondered: can I tune the model to reduce the overthinking? If it is trying too hard, maybe I could adjust parameters or prompts to make its output more natural.
Spoiler: Both directions made it worse.
