Law firms build their own AI instead of renting from OpenAI

Law firms are training their own models

Thomson Reuters has begun training open source LLMs on its own legal data rather than routing all queries through OpenAI or Anthropic. Kirkland & Ellis is building GPU clusters for model training in partnership with Palantir and has made a deliberate choice to avoid commercial platforms. Harvey is working with law firms to train custom models on their proprietary workflows and client data.

These moves reflect a broader pattern Artificial Lawyer calls "AI sovereignty," though the term covers more than geopolitical independence from US AI vendors. It includes any effort by organizations to control their own AI infrastructure and avoid lock-in to third-party providers.

Control over infrastructure protects margins and independence

The stakes are concrete. First, token costs are rising. A firm that fine-tunes or post-trains an open model on its own curated legal datasets can lower its per-query cost compared with paying commercial rates for GPT-4 or Claude. Second, regulatory risk is real. Anthropic's Fable model was temporarily banned, and firms that depend on a single provider have no fallback when that happens. Third, data leverage matters. Thomson Reuters holds decades of legal precedent and filings; using that archive as a training base creates a defensible moat, rather than feeding it into a commodity model that competitors can also access.

For Kirkland & Ellis, controlling the narrative is an explicit goal. The firm wants to avoid the perception that its advice is simply ChatGPT output wrapped in billable hours. Building internal infrastructure is marketing as much as engineering.

Audit token spend and data ownership now

If your organization spends more than $50K annually on API calls to OpenAI, Anthropic, or Google, the math favors exploring open-source model training on your own data. Mistral, Llama, and other open models can be fine-tuned and run on commodity hardware or on cloud infrastructure such as AWS or Azure.
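As a rough illustration of that math, the sketch below estimates how many months it takes for self-hosting to pay back its setup cost. The function name and every figure in it are hypothetical placeholders, not numbers from the firms above; substitute your own API bill, one-off setup cost, and monthly hosting cost.

```python
def breakeven_months(annual_api_spend, upfront_cost, monthly_hosting_cost):
    """Months until self-hosting savings cover the upfront cost.

    All inputs are assumptions supplied by the reader.
    Returns None when hosting costs at least as much as the API bill,
    in which case self-hosting never pays back.
    """
    monthly_api_spend = annual_api_spend / 12
    monthly_saving = monthly_api_spend - monthly_hosting_cost
    if monthly_saving <= 0:
        return None
    return upfront_cost / monthly_saving


# Hypothetical figures: $60K/year on APIs, $24K to set up, $3K/month to host.
months = breakeven_months(60_000, 24_000, 3_000)
print(months)
```

The calculation ignores staff time and model quality differences, so treat the result as a first filter rather than a decision.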

Start by cataloging three things: which workflows consume the most tokens, which operate on proprietary or sensitive data, and which produce outputs that require domain-specific accuracy (medical diagnostics, legal research, financial risk modeling). Workflows that meet these criteria are candidates for in-house retraining. The others stay on commercial APIs.
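One way to run that catalog is as a simple triage script. The sketch below is illustrative only: the `Workflow` fields, the `triage` function, the token threshold, and the sample workflows are all assumptions, and it treats a workflow as an in-house candidate if it meets at least one of the three criteria.

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    name: str
    monthly_tokens: int
    sensitive_data: bool
    needs_domain_accuracy: bool


def triage(workflows, token_threshold=10_000_000):
    """Split workflows into in-house candidates and commercial-API workflows.

    token_threshold is a hypothetical cutoff for "high token consumption".
    Candidates are ranked by how many criteria they meet, then by token volume.
    """
    candidates, commercial = [], []
    for wf in workflows:
        score = sum([
            wf.monthly_tokens >= token_threshold,
            wf.sensitive_data,
            wf.needs_domain_accuracy,
        ])
        if score > 0:
            candidates.append((score, wf))
        else:
            commercial.append(wf)
    candidates.sort(key=lambda pair: (pair[0], pair[1].monthly_tokens), reverse=True)
    return [wf.name for _, wf in candidates], [wf.name for wf in commercial]


# Hypothetical workflows for illustration.
workflows = [
    Workflow("contract review", 40_000_000, True, True),
    Workflow("email drafting", 2_000_000, False, False),
    Workflow("case-law research", 5_000_000, False, True),
]
in_house, stay_on_api = triage(workflows)
print(in_house, stay_on_api)
```

A stricter policy could require two or more criteria before a workflow qualifies; the scoring makes that a one-line change.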

The constraint is not technology; it is operational maturity. Training and deploying models requires in-house MLOps, data governance, and version control. If you lack that capability, start with a vendor partner (Harvey or a specialist boutique) and pilot one workflow before building a team.
