05
Bridgewater fine-tuned a cheap open model to beat frontier LLMs on financial-news judgment
verified · Developer · Finance
Thursday, July 2, 2026
Confidence
High · primary research with full figures
Evidence
Bridgewater AIA Labs + Thinking Machines Lab joint research post with published methodology and benchmark numbers
Bridgewater's AIA Labs and Thinking Machines Lab published a study showing that a fine-tuned open-weight model outperforms the best frontier LLMs at judging which financial news is relevant, at roughly a fourteenth of the running cost.
- When asked to replicate expert investors' judgment about document relevance, frontier models given naive prompts averaged 47.2% accuracy ("a coin flip"), and even expert-crafted prompts reached only 77.2%, short of the ~80% investors say they need to trust a system in daily work, per Thinking Machines Lab.
- Starting from Alibaba's open-weight Qwen3-235B (44.8% out of the box), the team used an expert-labeled dataset and on-policy distillation to reach 84.7% average accuracy, versus 78.2% for the best frontier model, which amounts to 29.8% fewer mistakes, per Thinking Machines Lab.
- The fine-tuned model runs at a 13.8x lower inference cost per task than the frontier models tested, and the team notes that newer, pricier frontier releases added only a point or two of accuracy, per Bridgewater AIA Labs.
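
For readers who want to check the arithmetic, the "29.8% fewer mistakes" figure is consistent with a relative drop in error rate (from 100% − 78.2% to 100% − 84.7%), and a 13.8x cost reduction works out to about 7% of the frontier cost per task. The sketch below reproduces both numbers from the figures quoted above; the function and variable names are illustrative and do not come from the study.

```python
def relative_error_reduction(baseline_acc: float, new_acc: float) -> float:
    """Fraction of the baseline's mistakes that the new model eliminates."""
    baseline_err = 1.0 - baseline_acc
    new_err = 1.0 - new_acc
    return (baseline_err - new_err) / baseline_err


# Figures quoted in the brief (names here are illustrative assumptions).
frontier_best_acc = 0.782   # best frontier model
fine_tuned_acc = 0.847      # fine-tuned Qwen3-235B
cost_reduction = 13.8       # reported inference-cost reduction factor

error_cut = relative_error_reduction(frontier_best_acc, fine_tuned_acc)
cost_fraction = 1.0 / cost_reduction

print(f"Relative error reduction: {error_cut:.1%}")            # 29.8%
print(f"Cost per task vs. frontier: {cost_fraction:.1%}")      # 7.2%
```

The 6.5-point accuracy gain looks modest in absolute terms, but measured against the 21.8% of cases the best frontier model got wrong, it removes nearly a third of the errors.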
Sources