Alibaba ships 560K AI chips built for agents, roadmap through 2028

Alibaba announces third-generation agent-optimized chip with confirmed production scale

Alibaba's semiconductor subsidiary T-Head unveiled the Zhenwu M890, a processor designed specifically for AI agents rather than standard LLM inference. The chip delivers three times the performance of its predecessor, the Zhenwu 810E (per company reporting). More significant than the performance gain is the architectural intent: the M890 prioritizes memory bandwidth and inter-model communication over raw inference speed, reflecting workloads where systems retain extended context, coordinate multiple models in real time, and execute multi-step tasks with minimal human intervention.

T-Head has shipped more than 560,000 Zhenwu units to date, deployed across 400 external customers spanning automakers, financial services, and 20 other industries (company-reported). That is a production footprint rather than a lab prototype, and it gives Alibaba real-world deployment data ahead of the M890 rollout.

The company paired the hardware announcement with a multi-year silicon roadmap: the V900, arriving in Q3 2027 and expected to deliver a roughly threefold performance gain, followed by the J900 in Q3 2028. This mirrors Nvidia's practice of publishing a fixed, multi-year product cadence. Alibaba also released Qwen 3.7-Max, its flagship LLM optimized for advanced coding and long-running agent tasks, engineered to operate continuously for up to 35 hours without performance degradation (per company specification). The chip and model launched simultaneously and are packaged for deployment on Alibaba Cloud's Bailian platform in the Panjiu AL128 server (128 M890 accelerators per rack).

Agent workloads demand different silicon than inference-optimized chips

Standard GPU accelerators are optimized for throughput on stateless inference: process a batch of requests, return results, repeat. Agent systems operate under different constraints. An autonomous system may need to retain hours of conversation history, switch between specialized models (one for retrieval, one for reasoning, one for execution planning), and pause and resume across multiple steps. These demands stress memory bandwidth (storing and retrieving long context) and inter-model communication (coordinating latency-sensitive hand-offs) rather than peak floating-point throughput.
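To make the memory pressure concrete, here is a back-of-envelope sketch of how the key-value cache of a decoder-only transformer grows with retained context. The model dimensions below are hypothetical placeholders chosen for illustration; they are not published specifications of the M890, Qwen 3.7-Max, or any other product, and the bandwidth estimate deliberately ignores weight reads, batching, and cache compression.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, context_tokens,
                   bytes_per_value=2):
    """Approximate KV-cache size for one sequence.

    The leading factor of 2 accounts for one key and one value tensor
    per layer. bytes_per_value=2 assumes 16-bit storage.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * context_tokens * bytes_per_value)


def decode_read_bytes_per_second(cache_bytes, tokens_per_second):
    """Rough cache traffic during decoding.

    Assumes every generated token reads the whole cache once, which is
    a simplification (no weight traffic, no batching effects).
    """
    return cache_bytes * tokens_per_second


# Hypothetical model shape, for illustration only.
LAYERS, KV_HEADS, HEAD_DIM = 64, 8, 128

short_ctx = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, context_tokens=4_000)
long_ctx = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, context_tokens=256_000)

for label, size in (("4K-token chat", short_ctx),
                    ("256K-token agent session", long_ctx)):
    traffic = decode_read_bytes_per_second(size, tokens_per_second=20)
    print(f"{label}: cache {size / 1e9:.2f} GB, "
          f"decode reads {traffic / 1e12:.2f} TB/s at 20 tokens/s")
```

Under these assumptions the cache grows linearly with context length, so a session that retains 64 times more tokens moves 64 times more bytes per generated token, while the arithmetic per token changes far less. That is the sense in which long-context agent workloads lean on memory bandwidth rather than peak compute.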

Designing a chip around agents signals a structural bet: Alibaba believes enterprise AI compute over the next three to five years will be defined by long-running, multi-step autonomous tasks, not batch inference optimization. If that bet is correct, practitioners who deploy agents on inference-optimized hardware, including standard Nvidia GPUs, may run into memory bottlenecks or communication latency that agent-specific silicon is designed to avoid.

The timing also matters. Huawei announced a similar multi-year roadmap for its Ascend line last year. Both companies have concluded that dependence on US silicon, even with loosened export restrictions, is a structural risk. Their response is to treat semiconductor development as a capability-building exercise, not a procurement problem. Last year Alibaba committed 380 billion yuan (roughly US$53 billion) to cloud and AI infrastructure over three years, its largest-ever investment commitment to the sector. The M890 and its successors are downstream of that spending.

Assess your agent architecture against agent-specific hardware before standardizing on GPU inference clusters

If your organization is deploying long-running agent systems in production, measure actual memory bandwidth utilization and inter-model communication frequency during pilot deployments. Standard benchmarks (tokens per second, latency on single-model inference) do not capture the constraints that agent-specific chips address. Understanding your own workload profile before committing to multi-year accelerator purchases will clarify whether purpose-built silicon (Alibaba's, Huawei's, or future US-designed alternatives) offers a material advantage over inference-optimized hardware for your use case.
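One way to start that measurement is to summarize a pilot run's step trace. The sketch below is a minimal illustration, not a vendor tool: the `Step` record, its field names, and the three metrics are all hypothetical, and it assumes your orchestration layer can log which model handled each step, when the step started and ended, and roughly how many bytes of context it read.

```python
from dataclasses import dataclass


@dataclass
class Step:
    model: str          # name of the model that handled this step
    start_s: float      # step start time, seconds since run start
    end_s: float        # step end time, seconds since run start
    context_bytes: int  # bytes of context read during the step (estimate)


def profile_trace(steps):
    """Summarize hand-off frequency and context traffic for one agent run."""
    if len(steps) < 2:
        raise ValueError("need at least two steps to measure hand-offs")
    ordered = sorted(steps, key=lambda s: s.start_s)
    pairs = list(zip(ordered, ordered[1:]))

    # A hand-off is any consecutive pair of steps served by different models.
    handoffs = [(a, b) for a, b in pairs if a.model != b.model]
    # Idle time between the two sides of each hand-off.
    gap_s = sum(max(0.0, b.start_s - a.end_s) for a, b in handoffs)

    wall_s = ordered[-1].end_s - ordered[0].start_s
    busy_s = sum(s.end_s - s.start_s for s in ordered)
    bytes_read = sum(s.context_bytes for s in ordered)

    return {
        "handoffs_per_min": len(handoffs) / wall_s * 60,
        "handoff_gap_share": gap_s / wall_s,
        "avg_context_read_gb_per_s": bytes_read / busy_s / 1e9,
    }


# Made-up trace for illustration; real numbers come from your own pilot logs.
trace = [
    Step("retrieval", 0.0, 1.0, 2_000_000_000),
    Step("reasoning", 1.5, 4.5, 12_000_000_000),
    Step("planner", 5.0, 6.0, 1_000_000_000),
    Step("reasoning", 6.0, 8.0, 9_000_000_000),
]
result = profile_trace(trace)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

A high hand-off rate or a large share of wall-clock time lost in hand-off gaps points at inter-model communication as the constraint; a high sustained context read rate points at memory bandwidth. Neither shows up in a single-model tokens-per-second benchmark.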

