OpenAI and Broadcom build Jalapeño, a custom inference chip for LLMs

OpenAI and Broadcom partner on inference-focused silicon

OpenAI and Broadcom announced Jalapeño, a custom chip co-designed for large language model inference workloads. The chip is built to address performance, efficiency, and scale requirements across AI systems, according to OpenAI's announcement.

No performance benchmarks, power consumption figures, die size, production timeline, or pricing have been disclosed. The companies did not specify whether Jalapeño is intended for OpenAI's internal infrastructure, licensed to partners, or both.

The announcement also omits manufacturing partners, yield targets, and deployment start dates. Broadcom is a semiconductor and infrastructure company; OpenAI runs one of the largest inference workloads in generative AI.

Custom silicon is becoming mandatory for inference economics

Inference cost is a major driver of operating margins in LLM services. A custom chip tuned for specific model characteristics (attention mechanisms, token generation patterns, quantization schemes) can lower per-inference compute cost and latency compared with general-purpose processors.

OpenAI's move follows the pattern set by Google (TPU), Amazon (Trainium/Inferentia), and Meta (MTIA). Building proprietary silicon reduces dependence on Nvidia and protects margins as model serving becomes commoditized.

However, custom silicon has historically carried long lead times (18–36 months from tape-out to volume production). Until Jalapeño reaches production and customers run real workloads on it, it remains a strategic intention, not a capability.

For enterprise AI teams, this signals that inference hardware diversity is coming. Today, Nvidia holds an estimated >90% of the discrete GPU inference market. Custom chips are likely to erode that dominance over the next 2–3 years, which would force architects to qualify multiple silicon vendors.

Audit your inference cost model now

If your organization runs LLM inference at scale (>1B tokens/day), measure your cost per million tokens, your p95 latency, and your GPU utilization. Capture these numbers before new silicon options appear.
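As a rough illustration of that baseline, the three numbers can be computed from data most teams already log. The sketch below is a minimal example, not a standard methodology: the `InferenceWindow` fields, the blended GPU-hour price, and the nearest-rank definition of p95 are all assumptions made for illustration, and your own billing and telemetry data may be structured differently.

```python
from dataclasses import dataclass


@dataclass
class InferenceWindow:
    """One reporting window of inference data (hypothetical field names)."""
    tokens_generated: int        # total tokens served in the window
    gpu_hours: float             # GPU-hours provisioned (and paid for)
    gpu_busy_hours: float        # GPU-hours spent actually executing work
    gpu_hourly_cost_usd: float   # assumed blended price per GPU-hour
    latencies_ms: list[float]    # per-request end-to-end latencies


def cost_per_million_tokens(w: InferenceWindow) -> float:
    """Total GPU spend divided by tokens served, scaled to one million tokens."""
    total_cost = w.gpu_hours * w.gpu_hourly_cost_usd
    return total_cost / w.tokens_generated * 1_000_000


def p95_latency_ms(latencies_ms: list[float]) -> float:
    """95th-percentile latency using the nearest-rank method."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def gpu_utilization(w: InferenceWindow) -> float:
    """Fraction of provisioned GPU time that was spent doing work."""
    return w.gpu_busy_hours / w.gpu_hours
```

Recording these per window (daily or weekly) gives a like-for-like baseline: when a new accelerator becomes available, the same three functions can be run against a pilot deployment and compared directly.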

Do not assume Jalapeño will be available to you, or that any timeline will be made public. Broadcom typically builds custom accelerators for specific customers rather than selling them on the open market, and OpenAI may reserve Jalapeño for internal use. As a contingency, evaluate alternative inference engines (vLLM, ONNX Runtime, TensorRT-LLM) and multi-GPU frameworks now, so you can migrate quickly if a new silicon option becomes available.

If you are a Broadcom or OpenAI customer, ask your account team directly: will Jalapeño be available for external purchase and, if so, on what timeline and at what cost relative to current GPU inference?
