Our Take
Law firms are betting that encoding their client relationships and internal processes into custom models beats generic frontier AI, but the exceptionalism claim is overcooked: most legal work product isn't proprietary, and what is special is almost always the relationship, not the method.
Why it matters
After years of assuming general models would eliminate the need for specialized training, the legal AI market is reversing course and returning to custom fine-tuning. Data security, better performance on niche workflows, and agentic systems that orchestrate multiple tools are driving the shift, and it forces firms to decide whether their competitive edge is real or perceived.
Do this week
General Counsel: audit which recurring client work streams your firm claims are unique, then ask a peer at a rival firm if they could replicate the output with different lawyers, so you know what's actually defensible to invest in custom model training.
Harvey Confirms Open Source Model Training with Law Firms
Harvey CEO Winston Weinberg told Artificial Lawyer the company is running proof-of-concept studies with law firms to fine-tune open source large language models on firm workflows and client-specific practices. The goal is not just to encode how work happens internally, but to encode the entire experience from law firm through to the client, so that automation can be tailored to recurring needs.
This mirrors moves elsewhere in legal tech. Kirkland & Ellis, after announcing a $500 million AI investment with Palantir, has begun hiring AI infrastructure experts with GPU cluster experience, suggesting plans for in-house open source training. Thomson Reuters has also been training open source LLMs on its legal research corpus to augment its commercial AI tools.
Harvey co-founder Gabe Pereyra outlined the ambition on X: a legal foundation model series designed to serve frontier-quality intelligence at lower cost with stronger security, and to let law firms "own their own intelligence." The models will target complex client matters spanning months and involving dozens of associates, orchestrating legal tech tools, sub-agents, and escalations to frontier models or humans. Harvey has open-sourced benchmarks representing associate and in-house lawyer work and reports "promising results" when post-training open source models on legal tasks.
The Secret Sauce Bet Is Weaker Than It Sounds
The marketing hook, that law firms can "bottle" their proprietary methods and lock in competitive advantage, rests on a shaky premise. Most legal work product has no durable moat. A contract drafted by a Manhattan elite firm may look different from a High Street rival's version, but the differences are legible to competitors. Documents circulate. Methods leak. Tax structures that are genuinely unique to one partner at one firm are the exception, not the rule.
What firms may legitimately own is narrower: the relationship itself. How one firm's lawyers interact with a specific client, the client's playbooks, the orchestration of past work: those may differ from what another firm would do. But even that is limited by human factors, not secret methodology. As one expert put it to Artificial Lawyer, the difference between how Ford and Tesla build cars is real but not planetary. Law firms operate within an even narrower margin.
The real driver of this shift is not exceptionalism but pragmatism. Data stays on-premises or in controlled environments. Firms believe they can extract better performance from models fine-tuned on their own patterns than from generic frontier models alone. And agentic workflows (systems that combine reference data, client playbooks, process orchestration, and a specialized LLM) do create tighter automation for recurring work. That's defensible. The mythology of irreplicable expertise is not.
What Firms Should Do Now
Stop assuming your workflows are unique unless you can defend that claim to a peer at a competitor. Audit your recurring client work for three things: (1) data sensitivity that justifies on-premises training, (2) process complexity that generic models handle poorly, and (3) client relationships where the experience difference (not the output) is defensible. Invest in custom model training only for work that falls into those buckets. For the rest, use frontier models with better prompting and retrieval. The cost and operational burden of maintaining custom fine-tuned models are real, and they are not worth incurring to defend a claim you cannot actually prove.
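For firms that want to run the audit in a spreadsheet or script, the three-part test can be sketched as a simple triage rule. This is an illustration only, not a method from Harvey or any firm named here: the WorkStream fields, the recommend function, and the sample work streams are all hypothetical, and the rule assumes that meeting any one of the three criteria is enough to put a work stream in the custom-training bucket.

```python
from dataclasses import dataclass


@dataclass
class WorkStream:
    """One recurring client work stream under audit (hypothetical structure)."""
    name: str
    sensitive_data: bool           # (1) data sensitivity justifies on-premises training
    generic_models_struggle: bool  # (2) process complexity generic models handle poorly
    defensible_experience: bool    # (3) the client experience, not the output, is defensible


def recommend(ws: WorkStream) -> str:
    """Return "custom" if any audit criterion is met, otherwise "frontier".

    Assumption: a single criterion is sufficient. A stricter firm could
    require two or more before taking on the cost of custom training.
    """
    if ws.sensitive_data or ws.generic_models_struggle or ws.defensible_experience:
        return "custom"
    return "frontier"


if __name__ == "__main__":
    # Made-up examples, purely for illustration.
    audit = [
        WorkStream("Standard NDA review", False, False, False),
        WorkStream("Recurring fund formation for one client", True, False, True),
    ]
    for ws in audit:
        print(f"{ws.name}: {recommend(ws)}")
```

The value of writing the rule down is less the code than the discipline: each flag should be one the firm could defend to a peer at a competitor, and anything that fails all three defaults to frontier models with better prompting and retrieval.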