Our Take
A clean demo of agentic work-product assembly, but no independent benchmark shows that it saves time or catches errors better than human tax counsel reading the same sources.
Why it matters
Legal AI vendors are moving from chat-based Q&A to multi-step agentic workflows that can orchestrate document uploads, cite sources, and generate client-facing deliverables in a single session. That's a shift in the interaction model, not yet a proof of speed or accuracy gains.
Do this week
Practitioners: test Harvey's Plan mode (which shows intended steps before execution) on a real transaction to measure whether pre-execution visibility reduces hallucinated steps or incorrect tax-relief findings versus the chat baseline.
Harvey Demonstrated Agentic Tax Analysis on a Complex Merger
David Murdter, product manager at Harvey and former patent litigator at Cooley, walked through the company's agent module applied to a transactional tax planning task. The workflow starts with an uploaded transaction step plan outlining how a merger is to be implemented. The agent then works through each step in sequence, identifying tax consequences, available reliefs, and whether regulatory clearance is required.
The agent grounds its reasoning in an England and Wales legal knowledge source rather than relying solely on the model's internal knowledge. As it works, it displays reasoning traces and a progress tracker listing intended steps and ticking them off as completed. The resulting output is a 25-page memorandum that flags the three highest-risk steps and includes approximately 130 line-level citations, each resolving to an underlying source document.
The demo also covered Harvey's "Improve Prompt" function, which the company says significantly rewrites the user's instruction to reference selected files and sources before the query executes. The session was then converted into three work products: a formatted Word memorandum, an executive-summary PowerPoint, and an interactive HTML artifact visualizing how entities and money flows change at each step. Murdter also touched on Plan mode (which shows the agent's intended actions before execution), handling of privileged data, in-product document editing with version control, and session sharing for partner review.
The Real Signal Is the Work-Product Pipeline, Not Agentic Reasoning
This is a polished end-to-end workflow that bundles agent reasoning, citation management, and multi-format output generation into a single session. That's a different user experience from chat-based legal AI, which typically outputs prose in a text box.
What's missing is any claim about speed or accuracy relative to human tax counsel performing the same work. Harvey has not published benchmarks showing that the agent identifies more reliefs, catches more risks, or reduces the time from step plan to final memorandum compared to existing workflows. The citation count (approximately 130 in this run) is impressive as a procedural feature, but it does not tell you whether those citations were necessary, whether they were correct, or whether a human would have cited fewer sources and reached the same advice. The demo also does not compare the agent's risk flagging to what human counsel would flag from the same sources.
The highest-value claim here is that the agent reduces manual document assembly and formatting work. If a tax lawyer previously spent two hours cutting and pasting into Word and PowerPoint, and this workflow now does that automatically, that's a real productivity gain. But it is a work-product generation story, not an agentic reasoning story.
Test Plan Mode on Transactions You Already Know Well
If you deploy Harvey on real merger tax work, the critical gate is Plan mode. Before the agent executes, it shows you the steps it intends to take. Compare that plan to what you would do yourself on the same transaction. Count how many steps it missed, how many it added unnecessarily, and how many it got wrong. That comparison will tell you whether the agent is a productivity tool (handles citations and formatting) or a reasoning tool (catches risks). Right now, the demo does not provide that data.
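One way to keep that comparison honest is to tally it the same way on every transaction. The sketch below is a minimal illustration, not anything Harvey provides: it assumes you have typed up your own step list and copied the agent's Plan mode steps as short text labels, and that you mark by hand which shared steps the agent got wrong. The names `compare_plans`, `PlanComparison`, and the sample step labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanComparison:
    missed: list[str]       # in the reviewer's plan, absent from the agent's
    unnecessary: list[str]  # in the agent's plan, absent from the reviewer's
    wrong: list[str]        # in both plans, but flagged by the reviewer as incorrect


def normalize(step: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are not counted."""
    return " ".join(step.lower().split())


def compare_plans(reviewer_steps, agent_steps, flagged_wrong=()):
    """Tally missed, unnecessary, and wrong steps using exact label matching.

    Assumption: steps match only if their normalized labels are identical.
    Judging whether a step is "wrong" is left to the human reviewer, who
    passes those labels in via flagged_wrong.
    """
    reviewer = {normalize(s) for s in reviewer_steps}
    agent = {normalize(s) for s in agent_steps}
    flagged = {normalize(s) for s in flagged_wrong}
    return PlanComparison(
        missed=sorted(reviewer - agent),
        unnecessary=sorted(agent - reviewer),
        wrong=sorted((reviewer & agent) & flagged),
    )


# Hypothetical labels for illustration only; not taken from the demo.
reviewer_plan = [
    "Check stamp duty on share transfer",
    "Assess group relief",
    "Confirm clearance requirement",
]
agent_plan = [
    "check stamp duty on share transfer",
    "Assess group relief",
    "Review VAT grouping",
]
result = compare_plans(reviewer_plan, agent_plan, flagged_wrong=["Assess group relief"])
print(len(result.missed), len(result.unnecessary), len(result.wrong))
```

Exact label matching is deliberately crude. In practice a reviewer would likely map the agent's wording onto their own step names by hand before running the tally, since the judgment about what counts as the same step is the part that needs a lawyer.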