
Bridgewater fine-tuned a cheap open model to beat frontier LLMs on financial-news judgment

Verified · Developer · Finance

Thursday, July 2, 2026

Confidence

High · primary research with full figures

Evidence

Bridgewater AIA Labs + Thinking Machines Lab joint research post with published methodology and benchmark numbers

Bridgewater's AIA Labs and Thinking Machines Lab published a study showing that a fine-tuned open-weight model outperforms the best frontier LLMs at judging which financial news is relevant, at roughly one-fourteenth of the inference cost.
  • On replicating expert investors' judgment of document relevance, naive prompts to frontier models averaged 47.2% accuracy ("a coin flip"), and even expert-crafted prompts reached only 77.2%, short of the ~80% investors say they need before trusting a system in daily work, per Thinking Machines Lab.
  • Starting from Alibaba's open-weight Qwen3-235B (44.8% out of the box), the team used an expert-labeled dataset and on-policy distillation to reach 84.7% average accuracy, versus 78.2% for the best frontier model: 29.8% fewer mistakes, per Thinking Machines Lab.
  • The fine-tuned model cuts inference cost per task by 13.8x versus the frontier models tested, and the team notes that newer, pricier frontier releases improved accuracy by only a point or two, per Bridgewater AIA Labs.
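The two headline ratios follow directly from the reported figures. Below is a minimal Python check. It assumes that "mistakes" means the error rate (100% minus accuracy) on the same benchmark, and that the 13.8x figure is a plain cost-per-task ratio. Both are readings of the brief, not definitions taken from the study.

```python
# Figures as reported in the brief (percent accuracy; cost ratio).
FINE_TUNED_ACC = 84.7
BEST_FRONTIER_ACC = 78.2
COST_REDUCTION = 13.8  # assumed to be frontier cost / fine-tuned cost per task


def relative_error_reduction(new_acc: float, old_acc: float) -> float:
    """Fraction of the old model's mistakes eliminated by the new model.

    Assumes error rate = 100 - accuracy, measured on the same task set.
    """
    old_err = 100.0 - old_acc
    new_err = 100.0 - new_acc
    return (old_err - new_err) / old_err


reduction = relative_error_reduction(FINE_TUNED_ACC, BEST_FRONTIER_ACC)
cost_fraction = 1.0 / COST_REDUCTION  # fine-tuned cost as a share of frontier cost

print(f"fewer mistakes: {reduction:.1%}")
print(f"cost per task: {cost_fraction:.3f} of frontier (about 1/{round(COST_REDUCTION)})")
```

The error rate falls from 21.8% to 15.3%. That removes 6.5 of every 21.8 mistakes, which is where the 29.8% figure comes from. It also explains why a 6.5-point accuracy gain counts as a large improvement.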
