News · April 23, 2026 · 3 min read

WebSockets Cut OpenAI API Latency 40% for Multi-Step AI Agents

OpenAI's new WebSocket implementation eliminates connection overhead between agent steps, delivering measurable speed gains for complex workflows.

By Agentic Daily · Verified Source: OpenAI

Our Take

Solid engineering improvement that addresses a real bottleneck: persistent connections and caching are proven optimizations, here properly applied to agent workflows.

WebSockets Eliminate Connection Bottlenecks

OpenAI's latest optimization to its Responses API demonstrates how WebSocket connections can dramatically reduce latency in multi-step agentic workflows. By maintaining a persistent connection instead of establishing a new HTTP request for each API call, the company achieved significant performance improvements in internal testing of its Codex agent.

The breakthrough centers on eliminating the overhead of repeated connection establishment. Traditional REST API calls require a new handshake for each request, adding 100-300ms of latency per interaction. For agents making dozens of sequential calls—analyzing code, generating solutions, and refining outputs—this overhead compounds quickly.

Connection-Scoped Caching Amplifies Gains

Beyond persistent connections, OpenAI introduced connection-scoped caching that maintains context and intermediate results throughout an agent's workflow session. This approach reduces redundant processing when agents reference previous steps or iterate on solutions.

The caching mechanism works by:

  • Storing tokenized inputs and model states within the WebSocket session
  • Reusing computed embeddings for repeated context
  • Maintaining conversation history without retransmission
  • Preserving function call results across workflow steps
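The pattern behind those bullets can be sketched as a cache whose lifetime is tied to one connection. This is a minimal, hypothetical client-side analogy; OpenAI has not published an SDK interface for this mechanism, and all names here are invented for illustration:

```python
# A minimal sketch of connection-scoped caching: results live and die with one
# WebSocket session. Class and method names are hypothetical illustrations.

class SessionCache:
    """Caches intermediate results for the lifetime of one connection."""

    def __init__(self) -> None:
        self._store: dict[str, object] = {}

    def get_or_compute(self, key: str, compute):
        # Reuse a prior result (e.g. embeddings for repeated context)
        # instead of recomputing or retransmitting it.
        if key not in self._store:
            self._store[key] = compute()
        return self._store[key]

    def clear(self) -> None:
        # Called when the WebSocket closes: the cache is scoped to the connection.
        self._store.clear()


cache = SessionCache()
calls = []
embed = lambda text: calls.append(text) or [0.1, 0.2]  # stand-in for an embedding call

cache.get_or_compute("ctx:readme", lambda: embed("readme"))
cache.get_or_compute("ctx:readme", lambda: embed("readme"))  # served from cache
print(len(calls))  # the expensive call ran only once
```

The key design point is the `clear()` on disconnect: because the cache is scoped to the connection, nothing leaks between sessions.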

Real-World Performance Metrics

OpenAI's internal testing with Codex agents showed measurable improvements across key metrics. End-to-end workflow completion time decreased by an average of 40% for complex coding tasks involving multiple iterations. Token processing efficiency improved by 25% due to reduced context retransmission.

Practical Implications for Developers

These optimizations matter most for applications running complex, multi-step agent workflows. Customer service bots handling escalated issues, code review agents processing large repositories, and research agents synthesizing multiple sources will see the biggest benefits.

Developers should consider WebSocket implementations when their agents typically make more than five sequential API calls or when maintaining context across interactions is crucial for quality outcomes.

Implementation Considerations

While WebSockets offer clear performance advantages, they introduce complexity around connection management and error handling. Applications need robust reconnection logic and graceful degradation to HTTP fallbacks when connections fail.
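That reconnect-then-degrade logic might look like the following sketch, assuming exponential backoff between retries. The `open_websocket` and `send_http` callables are hypothetical stand-ins, not a real OpenAI SDK API:

```python
# Sketch of the reconnection logic described above: retry the WebSocket with
# exponential backoff, then degrade gracefully to plain HTTP. The connect and
# request functions are hypothetical stand-ins, not a real SDK interface.

import time


def call_with_fallback(open_websocket, send_http, payload, max_retries=3):
    delay = 0.5
    for _attempt in range(max_retries):
        try:
            ws = open_websocket()      # may raise ConnectionError
            return ws.send(payload)    # persistent, low-latency path
        except ConnectionError:
            time.sleep(delay)          # exponential backoff between retries
            delay *= 2
    # Graceful degradation: slower per-call handshakes, but the request succeeds.
    return send_http(payload)
```

The fallback path costs the per-request handshake again, but it keeps the agent workflow moving instead of failing outright.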

The connection-scoped caching also requires careful memory management to prevent session bloat during long-running agent workflows. Teams should implement appropriate cache invalidation and session timeout policies.
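One simple way to implement those policies is a cache with both a time-to-live and a hard size cap. The sketch below makes assumptions about reasonable defaults; the actual limits and eviction strategy would depend on the workload:

```python
# A sketch of cache invalidation for long-running sessions: a TTL plus a hard
# size cap, evicting the oldest entries first. Policy values are illustrative.

import time
from collections import OrderedDict


class BoundedTTLCache:
    def __init__(self, max_entries=256, ttl_seconds=300.0):
        self.max_entries = max_entries
        self.ttl = ttl_seconds
        self._store: "OrderedDict[str, tuple]" = OrderedDict()

    def set(self, key, value):
        self._store[key] = (time.monotonic(), value)
        self._store.move_to_end(key)
        while len(self._store) > self.max_entries:
            self._store.popitem(last=False)  # evict oldest to prevent session bloat

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._store[key]             # expired: invalidate stale context
            return None
        return value
```

Eviction bounds memory during long-running workflows, while the TTL ensures an agent never reuses context that has gone stale mid-session.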

#Agents · #Developer Tools · #LLM