Our Take
Microsoft is shipping an agentic system with real enterprise controls, but the claimed 30-40% cost advantage over Claude Cowork rests on 125 internal test runs, not independent verification, and holds only when both products run Anthropic's Opus 4.8.
Why it matters
Agentic systems demand runtime discipline: admins need spending caps, audit trails, and task-level cost visibility before rollout. Microsoft's controls are designed to prevent runaway bills; the question is whether they actually work at scale across a distributed workforce.
Do this week
Finance/Ops: Download the Microsoft cost estimation spreadsheet and model your three personas (light, medium, heavy tasks) against your expected user base and prompt volume before enabling Cowork in your tenant on June 16.
Copilot Cowork exits preview with enterprise billing and plugins
Microsoft announced general availability of Copilot Cowork today, moving it from a three-month Frontier preview into production worldwide. More than half of the Fortune 500 is already running it, along with named customers including Accenture, Avanade, Capital Group, Koch, and Zurich Insurance (company-reported).
The product executes multi-step, long-running tasks across Microsoft 365 applications and third-party tools. Three published use cases: an engineering team automated batch-job spreadsheet edits and dependency charting; another team collapsed weeks of manual file comparison into hours; a sales leader ranked at-risk pipeline opportunities in a morning instead of a week.
Pricing is usage-based, measured in Copilot Credits. Admins pay a flat Microsoft 365 Copilot User Subscription License (USL) per seat, then face variable charges per task based on model selection, context retrieval, tool calls, and runtime. Microsoft segments tasks into three buckets: light (limited sources, one or fewer outputs), medium (multiple sources, two or more outputs), and heavy (broad aggregation, many outputs). Billing launches today; customers in the preview get a grace period through July 1, 2026.
Microsoft's internal testing compared Copilot Cowork against Claude Cowork with a Microsoft 365 connector, both running Anthropic's Opus 4.8 model, across 125 test runs spanning light, medium, and heavy task patterns. Copilot Cowork averaged 30-40% lower cost per prompt (internal testing, June 2026). Both systems are now available; Microsoft's Cowork 1, a fine-tuned secure model, ships in the coming weeks and is marketed as substantially cheaper for everyday tasks.
Cost controls are table stakes, but the savings claim needs independent audit
So far, the cost advantage over Claude has been shown only in Microsoft's own lab. Internal benchmarks against a single competitor configuration, on a narrow set of 125 runs, do not constitute proof of production efficiency. The disclaimer buried in the footnote matters: "actual costs and potential savings may vary depending on usage, configuration, time, and other factors."
What does matter: Microsoft is shipping spending caps, budget allocation by user and group, usage alerts, and per-task cost visibility (the last as a coming feature). Agentic systems that run unsupervised can accumulate large bills fast. The control surface—tenant-level off-by-default, admin gating, spending limits, credit requests—is the real product here. Whether it works depends on adoption: do teams actually request credits, or do they burn through budgets quietly?
Microsoft is also diversifying model choice. Anthropic's Opus and Sonnet are available now; GPT-5.5 is in Frontier preview; Cowork 1 will roll out soon. A multi-model runtime means customers aren't locked into one vendor's pricing or capabilities. That flexibility matters more than the vendor-claimed 30-40% advantage.
Three immediate moves: budget, gate, and monitor
First, avoid surprise bills. Download Microsoft's pricing model spreadsheet and estimate your likely monthly task volume for each of the three task types. Multiply those volumes by the number of users in each persona (light, medium, heavy usage patterns) to get a projected credit consumption. Decide whether you want pay-as-you-go at $0.01 per Copilot Credit or a commitment discount via P3 payments. Set spending limits per user and per group before you enable the feature.
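The budgeting arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration, not a reproduction of Microsoft's spreadsheet: the credits-per-task figures, persona headcounts, and task volumes are hypothetical placeholders to be swapped for values from the pricing model. Only the $0.01 pay-as-you-go rate comes from the announcement.

```python
from dataclasses import dataclass

# Pay-as-you-go rate cited in the announcement (USD per Copilot Credit).
USD_PER_CREDIT = 0.01

# HYPOTHETICAL credits consumed per task type. These are placeholders,
# not published figures; replace them with numbers from the spreadsheet.
CREDITS_PER_TASK = {"light": 50, "medium": 200, "heavy": 800}


@dataclass
class Persona:
    name: str
    users: int
    # Expected tasks per user per month, keyed by task type.
    tasks_per_user: dict


def monthly_cost_usd(persona: Persona) -> float:
    """Projected monthly spend for one persona at the pay-as-you-go rate."""
    credits_per_user = sum(
        CREDITS_PER_TASK[task_type] * count
        for task_type, count in persona.tasks_per_user.items()
    )
    return persona.users * credits_per_user * USD_PER_CREDIT


def estimate(personas):
    """Return (cost per persona, total cost) in USD per month."""
    by_persona = {p.name: monthly_cost_usd(p) for p in personas}
    return by_persona, sum(by_persona.values())


if __name__ == "__main__":
    # Illustrative headcounts and volumes only.
    personas = [
        Persona("light", users=100, tasks_per_user={"light": 20}),
        Persona("heavy", users=10, tasks_per_user={"medium": 10, "heavy": 5}),
    ]
    by_persona, total = estimate(personas)
    for name, usd in by_persona.items():
        print(f"{name}: ${usd:,.2f}/month")
    print(f"total: ${total:,.2f}/month")
```

Running the estimate with a pessimistic and an optimistic set of volumes gives a range to size spending limits against, rather than a single point figure.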
Second, gate access. Cowork is off by default; keep it that way until you have instrumentation in place. Enable it first for a pilot group of power users who understand agentic systems and cost discipline. Use group policies to set user-level spending caps and require credit requests for outliers.
Third, monitor usage and cost per task once the visibility feature ships. Microsoft's own customers asked how to budget for Cowork; that anxiety is justified. Real-world task costs will diverge from Microsoft's lab estimates. Treat the first month as instrumentation: collect data, audit which teams are running what, and adjust caps and model selection based on ROI, not just spend.
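A first-month audit along these lines could look like the sketch below. It assumes usage data can be exported as simple records; the `team` and `credits` fields, the cap table, and the 80% alert threshold are all hypothetical and do not reflect any documented Microsoft export schema or admin setting.

```python
from collections import defaultdict

# Pay-as-you-go rate cited in the announcement (USD per Copilot Credit).
USD_PER_CREDIT = 0.01


def summarize(records, caps_usd, alert_ratio=0.8):
    """Aggregate spend per team and flag teams close to their cap.

    records:  iterable of dicts with HYPOTHETICAL keys 'team' and 'credits'
    caps_usd: mapping of team -> monthly cap in USD (assumed, set by admins)
    """
    credits = defaultdict(int)
    tasks = defaultdict(int)
    for record in records:
        credits[record["team"]] += record["credits"]
        tasks[record["team"]] += 1

    report = {}
    for team, used in credits.items():
        usd = used * USD_PER_CREDIT
        cap = caps_usd.get(team)
        report[team] = {
            "spend_usd": round(usd, 2),
            "tasks": tasks[team],
            "avg_usd_per_task": round(usd / tasks[team], 2),
            # Flag only teams that have a cap and have used most of it.
            "near_cap": cap is not None and usd >= alert_ratio * cap,
        }
    return report


if __name__ == "__main__":
    # Illustrative records only.
    sample = [
        {"team": "sales", "credits": 500},
        {"team": "sales", "credits": 400},
        {"team": "eng", "credits": 100},
    ]
    for team, row in summarize(sample, {"sales": 10.0, "eng": 50.0}).items():
        print(team, row)
```

Average cost per task by team is the number to watch: it shows whether heavy tasks are concentrated where the return justifies them, which total spend alone does not.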