OpenAI Unveils Jalapeño to Cut LLM Inference Costs

Key Takeaways

Custom AI chip. OpenAI and Broadcom co-developed a processor purpose-built for LLM inference.
Accelerated development. The chip moved from design to tape-out in nine months.
Enterprise impact. Stronger performance per watt could lower inference costs, pending published benchmarks.

OpenAI and Broadcom on June 24 unveiled Jalapeño, a custom accelerator designed specifically for large language model (LLM) inference. The chip is the first in a multi-generation compute platform the two companies are building together, with Celestica contributing board, rack and system integration.

Engineering samples are running machine learning workloads at production target frequency and power, including GPT-5.3-Codex-Spark. According to OpenAI, early testing indicates performance per watt "substantially better than current state-of-the-art," though final benchmarks have not been published. Initial deployment is planned for Microsoft data centers by the end of 2026.

The Demand Problem Behind the Chip

Jalapeño is not OpenAI's answer to a supply problem. It is OpenAI's answer to a demand problem that hasn't fully arrived yet — but is coming fast.

At the inaugural Big Technology AI Summit in San Francisco last week, OpenAI President Greg Brockman put the compute picture in stark terms: the number of people actually using agents today is tiny. Most of OpenAI's reported billion users are still chatting. The agent era — where token consumption multiplies by orders of magnitude per task — hasn't truly begun.

Box CEO Aaron Levie illustrated the scale shift: early AI work might run 5,000 to 20,000 tokens per task. Deployed agents can burn through millions. "We're actually outrunning the efficiency improvements with our appetite," Levie said. That appetite is exactly what Jalapeño is being built to feed.
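To make that scale shift concrete, here is a rough back-of-the-envelope sketch. The 5,000 to 20,000 range comes from Levie's remarks above; the 1,000,000 figure is an assumption standing in for "millions" as a conservative floor, not a number anyone cited.

```python
# Token range quoted in the article for early AI work (per task).
EARLY_TASK_TOKENS = (5_000, 20_000)

# Assumption: 1,000,000 tokens stands in for "millions" as a conservative floor.
AGENT_TASK_TOKENS = 1_000_000


def scale_multiplier(agent_tokens: int, early_tokens: int) -> float:
    """Return how many times more tokens an agent task uses than an early task."""
    return agent_tokens / early_tokens


# Compare against the top and bottom of the early-task range.
low = scale_multiplier(AGENT_TASK_TOKENS, EARLY_TASK_TOKENS[1])   # vs. a 20,000-token task
high = scale_multiplier(AGENT_TASK_TOKENS, EARLY_TASK_TOKENS[0])  # vs. a 5,000-token task

print(f"Agent task uses {low:.0f}x to {high:.0f}x the tokens of an early task")
```

Even at that floor, a single agent task consumes 50 to 200 times the tokens of an early chat-style task, and the multiplier grows with every additional million tokens.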

Brockman described a "compute-power economy" where every provider sells out all available compute, permanently. If that forecast holds, inference infrastructure — not training — becomes the strategic bottleneck. Jalapeño is OpenAI's bet that controlling that bottleneck in-house is worth the investment.

Nvidia Stays in the Picture — For Now

Jalapeño is an inference chip. That distinction matters. According to reporting from The Rundown AI, Nvidia remains the anchor for model training at OpenAI — Jalapeño does not displace that relationship. What it does is carve out the inference layer, where finished models are actually served to users, as OpenAI-controlled territory.

OpenAI is pushing toward 10 gigawatts of custom-chip-powered compute by 2029. For perspective, it was noted at the Big Technology AI Summit that the current AI buildout, at $700 billion in capex this year alone, is already bigger than the cable and railroad buildouts. Anissa Gardizy of The Information cautioned that even that number is optimistic: roughly half of announced data centers never get built, and the real completion rate is likely worse.

Even a fraction of inference workloads shifting off Nvidia represents an enormous market. Lauren Goode of WIRED made the bull case for Nvidia at the summit, noting the company has pivoted correctly at every inflection point. But Gardizy flagged inference as the real opening for competition — and Jalapeño is now one of the competitors.

"Our collaboration with OpenAI represents a fundamental commitment to scaling the physical infrastructure required for the next decade of AI. This is just the beginning of a multi-generation roadmap. By co-developing our industry-leading silicon directly with OpenAI, we are enabling the deployment of gigawatt scale data centers with Microsoft and other partners beginning in 2026."

- Hock Tan, President and CEO, Broadcom

Jalapeño Feature Breakdown

OpenAI describes Jalapeño as a blank-slate design for LLM inference rather than a repurposed general-purpose accelerator.

Capability	Description
LLM-optimized architecture	Built to reduce data movement and balance compute, memory and networking
Multi-generation platform	First step in a long-term roadmap with Broadcom and Celestica
Nine-month development cycle	Design to tape-out accelerated by OpenAI's own AI models
Broadcom networking integration	Uses Tomahawk networking silicon for large-scale deployment
Gigawatt-scale deployment	Planned for Microsoft data centers beginning late 2026

What Enterprises Should Watch

For enterprise technology leaders, the Jalapeño announcement is less about chip specs and more about what it signals for AI cost trajectories. OpenAI's full-stack ownership of models, products and now silicon is designed to compress inference costs over time. If the early performance-per-watt claims hold up in published benchmarks, better efficiency at gigawatt scale could translate into cheaper API calls, faster agentic task completion and more predictable infrastructure pricing.
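The link between performance per watt and serving cost can be sketched with simple arithmetic. Every number below is a hypothetical placeholder, not a measurement of Jalapeño or any other chip, and electricity is only one component of what an API call costs.

```python
JOULES_PER_KWH = 3_600_000  # 1 kWh = 3.6 million joules


def energy_cost_per_million_tokens(power_watts: float,
                                   tokens_per_second: float,
                                   usd_per_kwh: float) -> float:
    """Electricity cost in USD to generate one million tokens."""
    joules_per_token = power_watts / tokens_per_second
    kwh_per_million = joules_per_token * 1_000_000 / JOULES_PER_KWH
    return kwh_per_million * usd_per_kwh


# Hypothetical placeholder figures, not benchmarks of any real accelerator.
baseline = energy_cost_per_million_tokens(
    power_watts=1_000, tokens_per_second=10_000, usd_per_kwh=0.10)

# Same power draw, twice the throughput: performance per watt doubles.
improved = energy_cost_per_million_tokens(
    power_watts=1_000, tokens_per_second=20_000, usd_per_kwh=0.10)

print(f"Energy cost falls by a factor of {baseline / improved:.1f}")
```

Under these assumptions, doubling tokens per watt halves the electricity cost per token. Whether that saving reaches API prices depends on hardware, networking and margin decisions that the announcement does not cover.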

PwC's Dallas Dolen, speaking at the Big Technology AI Summit, offered the clearest enterprise framework for what comes next: the winners won't be the biggest token spenders or the most frugal, but whoever "outcome-maxes" — building governance, routing and spend controls that match model capability to actual task requirements. As inference gets cheaper and agents get more capable, that discipline becomes the enterprise differentiator.
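As a minimal sketch of what "matching model capability to actual task requirements" could look like in practice, the router below picks the cheapest model tier that is capable enough and fits a per-task budget. The tier names, capability scores and prices are all hypothetical illustrations, not real products or pricing, and this is not a framework Dolen described in detail.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelTier:
    name: str                      # hypothetical tier label
    capability: int                # hypothetical score, higher means more capable
    usd_per_million_tokens: float  # hypothetical price


# Hypothetical tiers, not real models or prices.
TIERS = [
    ModelTier("small", capability=1, usd_per_million_tokens=0.50),
    ModelTier("medium", capability=2, usd_per_million_tokens=3.00),
    ModelTier("large", capability=3, usd_per_million_tokens=15.00),
]


def route(required_capability: int,
          expected_tokens: int,
          budget_usd: float) -> Optional[ModelTier]:
    """Pick the cheapest tier that meets the capability need and fits the budget."""
    candidates = [
        tier for tier in TIERS
        if tier.capability >= required_capability
        and tier.usd_per_million_tokens * expected_tokens / 1_000_000 <= budget_usd
    ]
    # Returns None when no tier satisfies both constraints (the spend control).
    return min(candidates, key=lambda tier: tier.usd_per_million_tokens, default=None)


simple_task = route(required_capability=1, expected_tokens=20_000, budget_usd=0.05)
agent_task = route(required_capability=2, expected_tokens=2_000_000, budget_usd=10.0)
over_budget = route(required_capability=3, expected_tokens=2_000_000, budget_usd=10.0)
```

Here the simple task lands on the small tier, the agent task on the medium tier, and the third request is rejected because the only capable tier would exceed its budget. A rejection like that is the point where a governance policy would decide whether to raise the budget or rescope the task.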

Jalapeño's first deployment is planned for Microsoft data centers by end of 2026. Enterprises running on Azure will likely feel the downstream effects before anyone else.

Recent OpenAI News

OpenAI has run one of the most aggressive enterprise AI expansion campaigns on record over the past 18 months, completing six acquisitions across collaboration, analytics, governance, security and agent deployment. Key deals include the $1.1 billion purchase of Statsig — whose founder joined as CTO of Applications — the sub-$400 million acquisition of Neptune for model governance, security specialist Promptfoo at $86 million, and Ona for persistent cloud environments in Codex.

On infrastructure, Nvidia committed up to $100 billion as primary chip supplier, AWS signed a $38 billion cloud contract, and the Microsoft relationship was restructured to preserve a $250 billion Azure commitment while granting both parties greater independence.

GPT-5 launched for all ChatGPT users in August 2025, followed by a ChatGPT App Store in December and GPT-5.4, featuring a one-million-token context window and native computer-use, in March 2026.

In April 2026, OpenAI closed a $122 billion private funding round — the largest in history — anchored by Amazon, Nvidia and SoftBank.

LLM Inference Hardware: Cost, Speed & Trade-Offs

Enterprises face growing pressure to cut AI inference costs and latency, driving interest in custom silicon, GPU cloud platforms and tiered deployment strategies.

Taalas built a chip that hard-wires LLM logic into silicon, claiming 10x faster inference at a fraction of GPU costs. UK-based startup Fractile recently raised $220 million to pursue a different approach that integrates memory within a standard server rack. Self-hosted deployments, meanwhile, often require expensive GPU instances plus engineering overhead for reliability and monitoring.
