Analysis · June 29, 2026 · 2 min read

Law firms build their own AI instead of renting from OpenAI

Thomson Reuters, Kirkland & Ellis, and Harvey are training custom models on proprietary legal data to escape dependence on US-based AI vendors. Why control over AI infrastructure is becoming a competitive advantage.

Our Take

This is not geopolitics dressed up as technology; it's rational cost control and competitive differentiation wearing a sovereignty label.

Why it matters

Large law firms and legal tech vendors are realizing that relying entirely on third-party models exposes them to price increases, availability risk, and commoditization of their expertise. As AI becomes table stakes, owning the training pipeline becomes a margin play.

Do this week

General Counsel: By January 31, audit which AI workflows are locked into a single vendor so you can identify the top three candidates for internal retraining on proprietary datasets.

Law firms are training their own models

Thomson Reuters has begun training open-source LLMs on its own legal data rather than routing all queries through OpenAI or Anthropic. Kirkland & Ellis is building GPU clusters for model training in partnership with Palantir and has deliberately chosen to avoid commercial platforms. Harvey is working with law firms to train custom models on their proprietary workflows and client data.

These moves reflect a broader pattern Artificial Lawyer calls "AI sovereignty," though the term covers more than geopolitical independence from US AI vendors. It includes any effort by organizations to control their own AI infrastructure and avoid lock-in to third-party providers.

Control over infrastructure protects margins and independence

The stakes are concrete. First, token costs are rising. A firm that fine-tunes or post-trains an open model on its own curated legal datasets can reduce per-query costs compared with querying GPT-4 or Claude at commercial rates, provided query volume is high enough to cover training and hosting. Second, regulatory risk is real. Anthropic's Fable model was temporarily banned; a firm that depends on a single provider has no fallback when that happens. Third, data leverage matters. Thomson Reuters holds decades of legal precedent and filings; using that archive as a training base creates a defensible moat instead of feeding it into a commodity model that competitors can also access.

For Kirkland & Ellis, narrative control is an explicit goal. The firm wants to avoid the perception that its advice is simply ChatGPT output wrapped in billable hours. Building internal infrastructure is marketing as much as engineering.

Audit token spend and data ownership now

If your organization spends more than $50K annually on API calls to OpenAI, Anthropic, or Google, the math favors exploring open source model training on your own data. Mistral, Llama, and other open models can be fine-tuned and run on commodity hardware or cloud infrastructure (AWS, Azure).
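To make "the math" concrete, here is a minimal back-of-the-envelope sketch of the break-even calculation. Every figure in it is a hypothetical placeholder, not data from this article: the training, hosting, and staffing costs and the share of traffic moved off commercial APIs are assumptions you would replace with your own numbers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CostAssumptions:
    """All fields are illustrative assumptions, in USD unless noted."""
    annual_api_spend: float        # current yearly spend on commercial APIs
    one_time_training_cost: float  # data preparation plus fine-tuning runs
    annual_hosting_cost: float     # GPUs or cloud inference per year
    annual_staff_cost: float       # share of ML ops time per year
    share_of_traffic_moved: float  # fraction of API spend replaced (0 to 1)


def annual_savings(c: CostAssumptions) -> float:
    """API spend avoided each year minus the yearly cost of running in-house."""
    avoided = c.annual_api_spend * c.share_of_traffic_moved
    return avoided - (c.annual_hosting_cost + c.annual_staff_cost)


def breakeven_years(c: CostAssumptions) -> Optional[float]:
    """Years needed to recover the one-time training cost.

    Returns None when running in-house costs at least as much per year
    as the API spend it replaces, so the investment never pays back.
    """
    savings = annual_savings(c)
    if savings <= 0:
        return None
    return c.one_time_training_cost / savings


# Hypothetical firm sitting right at the $50K threshold.
example = CostAssumptions(
    annual_api_spend=50_000,
    one_time_training_cost=20_000,
    annual_hosting_cost=15_000,
    annual_staff_cost=10_000,
    share_of_traffic_moved=0.8,
)
print("annual savings:", annual_savings(example))
print("break-even (years):", breakeven_years(example))
```

If the function returns None, or a payback period longer than you expect the model to stay useful, the commercial API is still the cheaper option for that workload.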

Start by cataloging: which workflows consume the most tokens, which workflows operate on proprietary or sensitive data, and which outputs require domain-specific accuracy (medical diagnostics, legal research, financial risk modeling). These are candidates for in-house retraining. The others stay on commercial APIs.
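The catalog described above can be kept as a simple scored list. The sketch below is one possible way to do it, not a prescribed method: the `Workflow` fields, the 10M-tokens-per-month threshold, the rule that a workflow needs at least two of the three criteria, and the sample entries are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Workflow:
    name: str
    monthly_tokens: int          # tokens sent to commercial APIs per month
    uses_proprietary_data: bool  # touches proprietary or sensitive data
    needs_domain_accuracy: bool  # output requires domain-specific accuracy


def score(w: Workflow, token_threshold: int) -> int:
    """Count how many of the three catalog criteria a workflow meets."""
    return (
        int(w.monthly_tokens >= token_threshold)
        + int(w.uses_proprietary_data)
        + int(w.needs_domain_accuracy)
    )


def triage(
    workflows: List[Workflow],
    token_threshold: int = 10_000_000,  # assumed cutoff for "heavy" usage
    min_score: int = 2,                 # assumed bar for in-house candidacy
    top_n: int = 3,                     # shortlist size
) -> Tuple[List[Workflow], List[Workflow]]:
    """Split workflows into in-house retraining candidates and the rest."""
    eligible = [w for w in workflows if score(w, token_threshold) >= min_score]
    # Highest score first; ties broken by token volume.
    eligible.sort(
        key=lambda w: (score(w, token_threshold), w.monthly_tokens),
        reverse=True,
    )
    in_house = eligible[:top_n]
    chosen = {w.name for w in in_house}
    stay_on_api = [w for w in workflows if w.name not in chosen]
    return in_house, stay_on_api


# Hypothetical catalog entries.
catalog = [
    Workflow("contract_review", 40_000_000, True, True),
    Workflow("email_drafting", 25_000_000, False, False),
    Workflow("case_law_research", 8_000_000, True, True),
    Workflow("marketing_copy", 2_000_000, False, False),
]

in_house, stay_on_api = triage(catalog)
print("in-house candidates:", [w.name for w in in_house])
print("stay on commercial APIs:", [w.name for w in stay_on_api])
```

A two-of-three rule keeps a high-volume but generic workflow such as email drafting on the commercial API, which matches the point that only data-sensitive or accuracy-critical work justifies retraining.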

The constraint is not technology; it is operational maturity. Training and deploying models requires in-house ML ops, data governance, and version control. If you lack that capability, start with a vendor partner (Harvey, or specialist boutiques) to pilot one workflow before building a team.
