News · June 26, 2026 · 3 min read

OpenAI's $1.4T chip bet: Jalapeño targets Nvidia's 75% margin

OpenAI is building its own inference processor with Broadcom to bring down infrastructure costs projected at $14B a year. Tape-out took nine months; the first data center rollout begins in late 2026.

Our Take

OpenAI is not inventing a new chip category; it is buying margin back by doing what Google, Amazon, and Meta did years ago, except faster and with LLMs writing part of the design.

Why it matters

Infrastructure cost is the main constraint on OpenAI's profitability and product speed. A working in-house inference ASIC shortens the feedback loop between model iteration and deployment efficiency, which matters now because competitors already own 10% of global AI compute outside Nvidia.

Do this week

Infrastructure teams: audit your inference TCO against Broadcom's published Tomahawk specs by the end of this quarter, so you can plan for custom-silicon vendors entering your RFP cycle by late 2026.

OpenAI built an inference-only ASIC in 9 months

OpenAI is manufacturing its first custom processor, codenamed Jalapeño, designed specifically for large language model inference. Built in partnership with Broadcom for silicon engineering and TSMC for production, the chip integrates Broadcom's Tomahawk networking silicon to support large-scale distributed serving across data centers. Celestica will assemble boards and rack systems.

The company completed tape-out (the final design lock before manufacturing) in nine months. Laboratory samples are already running production workloads, including an unreleased GPT-5.3-Codex-Spark model, at target frequency and power consumption. Deployment into OpenAI's data centers is scheduled to begin by late 2026. Broadcom CEO Hock Tan confirmed that the rollout will scale with infrastructure partners including Microsoft.

Richard Ho, head of OpenAI's hardware program, stated the architecture minimizes data movement to push realized utilization closer to theoretical peak performance. Unlike general-purpose accelerators retrofitted for AI, Jalapeño balances compute, memory, and networking resources to solve bottlenecks specific to interactive LLM serving.

The math forces vertical integration

After infrastructure and operational expenses, OpenAI keeps 33 cents of every revenue dollar as operating margin. Nvidia, by contrast, commands an estimated 75% profit margin on high-end processors (per the source). That gap is the incentive to bring silicon in-house.

Last year, keeping ChatGPT operational cost OpenAI $8.4 billion. With 900 million weekly users, annual operational costs are projected to reach $14 billion. Over the next eight years, OpenAI has committed approximately $1.4 trillion to computing power against $25 billion in current annual revenue. That ratio is unsustainable without either dramatic revenue growth or unit-cost reduction.
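To make that ratio concrete, here is a rough back-of-the-envelope calculation using only the figures quoted above. Spreading the $1.4 trillion evenly across eight years is a simplifying assumption of this sketch, not something the source states, and the variable names are illustrative.

```python
# Figures quoted in the article, in USD.
COMMITTED_COMPUTE = 1.4e12    # total compute commitment
COMMITMENT_YEARS = 8          # period the commitment covers
ANNUAL_REVENUE = 25e9         # current annual revenue
OPS_COST_LAST_YEAR = 8.4e9    # cost of running ChatGPT last year
OPS_COST_PROJECTED = 14e9     # projected operational cost

# Assumption: the commitment is spread evenly over the period.
annual_commitment = COMMITTED_COMPUTE / COMMITMENT_YEARS

# How many times current revenue the average yearly commitment represents.
commitment_to_revenue = annual_commitment / ANNUAL_REVENUE

# Relative growth in operational cost from last year to the projection.
ops_cost_growth = OPS_COST_PROJECTED / OPS_COST_LAST_YEAR - 1

print(f"Average yearly commitment: ${annual_commitment / 1e9:.0f}B")
print(f"Commitment vs. current revenue: {commitment_to_revenue:.1f}x")
print(f"Operational cost growth: {ops_cost_growth:.0%}")
```

Under the even-spread assumption, the average yearly commitment is several times current revenue, which is why the article treats either revenue growth or unit-cost reduction as unavoidable.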

Building custom silicon addresses the second of those options. Lower inference costs feed a flywheel: cheaper serving enables better product response and UX, which drives user volume and revenue, which funds the next iteration of infrastructure. This is the same playbook Google has followed with TPUs since 2015; Google now controls roughly a quarter of global AI computing capacity outside Nvidia's supply chain. Amazon has shipped over one million custom chips. Meta and Microsoft continue scaling proprietary hardware.

OpenAI entered this race late. It compressed the design cycle by using its own language models to automate and optimize portions of the hardware design process. The vertical integration also means OpenAI can optimize the entire stack—chip architecture, software kernels, memory systems, network scheduling, and application logic—around its own model roadmaps, similar to how Apple couples proprietary hardware and iOS.

Plan for custom silicon in your vendor mix

Jalapeño is not a consumer product. It is enterprise infrastructure for OpenAI's own data centers. However, its existence signals that cutting inference cost is now a competitive requirement, not a nice-to-have.

If you operate large-scale LLM serving infrastructure, expect custom silicon announcements from other major cloud and model providers within 18 months. Broadcom's involvement and the nine-month tape-out timeline suggest the barrier to entry is now speed and talent, not capital or process node access. Vendors will begin offering Tomahawk-integrated or similar networking stacks in their own ASICs by 2027.

Cost your inference workloads against both Nvidia's published pricing and the expected unit economics of custom silicon with 2–3 year deployment windows. Lock multi-year GPU contracts only if the commitment includes escape clauses for custom-silicon migration paths.
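One way to frame that comparison is as a cost per million tokens that amortizes hardware and energy over the deployment window. The sketch below is a minimal model under stated assumptions: every number in it (prices, power draw, throughput, utilization, electricity rate) is a hypothetical placeholder, not a published Nvidia or Jalapeño figure, and it ignores networking, cooling, staffing, and software costs.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 24 * 365


@dataclass
class Accelerator:
    name: str
    capex_usd: float          # purchase price per device (hypothetical)
    lifetime_years: float     # deployment / amortization window
    power_kw: float           # average power draw per device
    tokens_per_second: float  # sustained serving throughput while busy
    utilization: float        # fraction of time spent serving traffic


def cost_per_million_tokens(acc: Accelerator, usd_per_kwh: float = 0.08) -> float:
    """Amortized hardware plus energy cost per one million served tokens."""
    hours = HOURS_PER_YEAR * acc.lifetime_years
    energy_cost = acc.power_kw * hours * usd_per_kwh
    total_cost = acc.capex_usd + energy_cost
    tokens_served = acc.tokens_per_second * 3600 * hours * acc.utilization
    return total_cost / tokens_served * 1e6


# Placeholder inputs for illustration only; substitute your own quotes.
gpu = Accelerator("merchant GPU", 30_000, 3, 1.0, 5_000, 0.5)
custom = Accelerator("custom ASIC", 18_000, 3, 0.7, 5_000, 0.5)

for acc in (gpu, custom):
    print(f"{acc.name}: ${cost_per_million_tokens(acc):.4f} per 1M tokens")
```

Rerunning the model with a two-year and a three-year `lifetime_years` shows how sensitive the comparison is to the deployment window, which is the variable a multi-year contract fixes in place.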
