Our Take
A solid engineering improvement that addresses a real bottleneck: persistent connections and caching are proven optimizations, properly applied here to agent workflows.
WebSockets Eliminate Connection Bottlenecks
OpenAI's latest optimization to their Responses API demonstrates how WebSocket connections can dramatically reduce latency in multi-step agentic workflows. By maintaining a persistent connection instead of opening a new HTTP connection for each API call, the company achieved significant performance improvements in its internal Codex agent testing.
The improvement centers on eliminating the overhead of repeated connection establishment. Without connection reuse, each REST call requires a fresh TCP and TLS handshake, which can add 100-300ms of latency per interaction. For agents making dozens of sequential calls—analyzing code, generating solutions, and refining outputs—this overhead compounds quickly.
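The compounding effect is easy to see with back-of-the-envelope arithmetic. The sketch below is illustrative only: the 100-300ms per-call range comes from the text above, while the 20-call workflow and the `handshake_overhead_ms` helper are hypothetical.

```python
def handshake_overhead_ms(num_calls: int, per_call_ms: float, persistent: bool) -> float:
    """Total connection-setup latency across a workflow.

    With a persistent connection (e.g. a WebSocket), the handshake cost is
    paid once; with per-request connections it is paid on every call.
    """
    return per_call_ms if persistent else num_calls * per_call_ms

# A hypothetical 20-call agent workflow at 200 ms of setup per request:
rest_total = handshake_overhead_ms(20, 200, persistent=False)  # 4000 ms of pure overhead
ws_total = handshake_overhead_ms(20, 200, persistent=True)     # 200 ms, paid once
```

At twenty calls, connection setup alone costs four seconds over per-request HTTP but a fifth of a second over a persistent connection.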
Connection-Scoped Caching Amplifies Gains
Beyond persistent connections, OpenAI introduced connection-scoped caching that maintains context and intermediate results throughout an agent's workflow session. This approach reduces redundant processing when agents reference previous steps or iterate on solutions.
The caching mechanism works by:
- Storing tokenized inputs and model states within the WebSocket session
- Reusing computed embeddings for repeated context
- Maintaining conversation history without retransmission
- Preserving function call results across workflow steps
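OpenAI has not published the server-side implementation, but the general pattern behind the bullets above can be sketched as a cache whose lifetime is bound to a single connection. Everything here is hypothetical: `SessionCache` and `get_or_compute` are illustrative names, not part of the Responses API.

```python
from typing import Any, Callable, Dict, Tuple

class SessionCache:
    """A cache whose lifetime is tied to a single WebSocket session (sketch)."""

    def __init__(self) -> None:
        self._store: Dict[Tuple[str, str], Any] = {}

    def get_or_compute(self, kind: str, key: str, compute: Callable[[], Any]) -> Any:
        # Reuse a previously computed value (an embedding for repeated context,
        # a function call result, retained history) instead of recomputing or
        # retransmitting it on every workflow step.
        cache_key = (kind, key)
        if cache_key not in self._store:
            self._store[cache_key] = compute()
        return self._store[cache_key]

    def close(self) -> None:
        # Dropping the cache when the connection closes bounds its lifetime.
        self._store.clear()

# Usage: the expensive compute runs only once per session for repeated context.
cache = SessionCache()
calls = []
def fake_embed(text: str) -> list:
    calls.append(text)           # record each real computation
    return [len(text)]
cache.get_or_compute("embedding", "shared-context", lambda: fake_embed("shared-context"))
cache.get_or_compute("embedding", "shared-context", lambda: fake_embed("shared-context"))
```

The key design point is scoping: because the cache dies with the connection, there is no cross-session invalidation problem to solve.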
Real-World Performance Metrics
OpenAI's internal testing with Codex agents showed measurable improvements across key metrics. End-to-end workflow completion time decreased by an average of 40% for complex coding tasks involving multiple iterations. Token processing efficiency improved by 25% due to reduced context retransmission.
Practical Implications for Developers
These optimizations matter most for applications running complex, multi-step agent workflows. Customer service bots handling escalated issues, code review agents processing large repositories, and research agents synthesizing multiple sources will see the biggest benefits.
Developers should consider WebSocket implementations when their agents typically make more than five sequential API calls or when maintaining context across interactions is crucial for quality outcomes.
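That guidance reduces to a simple heuristic. The five-call threshold is the rule of thumb stated above; the function name and shape are our own.

```python
def prefer_websocket(sequential_calls: int, needs_shared_context: bool) -> bool:
    """Heuristic: is a persistent WebSocket connection likely worthwhile?

    True when the workflow makes more than five sequential API calls, or when
    maintaining context across interactions matters for output quality.
    """
    return sequential_calls > 5 or needs_shared_context
```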
Implementation Considerations
While WebSockets offer clear performance advantages, they introduce complexity around connection management and error handling. Applications need robust reconnection logic and a graceful fallback to plain HTTP when connections fail.
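One common shape for that logic is exponential backoff on reconnection with an HTTP fallback after repeated failures. This is a generic sketch, not the Responses API client: the caller supplies `connect` and `http_fallback` callables, and all parameter values are illustrative.

```python
import time
from typing import Any, Callable

def call_with_reconnect(
    connect: Callable[[], Any],
    http_fallback: Callable[[], Any],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Try to (re)establish the WebSocket; degrade to HTTP after repeated failures."""
    for attempt in range(max_attempts):
        try:
            return connect()
        except ConnectionError:
            # Exponential backoff between attempts: 0.5 s, 1 s, 2 s, ...
            sleep(base_delay * (2 ** attempt))
    # Graceful degradation: fall back to plain HTTP requests.
    return http_fallback()
```

Injecting `sleep` keeps the backoff testable; in production the default `time.sleep` applies.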
The connection-scoped caching also requires careful memory management to prevent session bloat during long-running agent workflows. Teams should implement appropriate cache invalidation and session timeout policies.
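A minimal version of such a policy combines a time-to-live with a size cap, so a long-running session can neither hold stale entries nor grow without bound. The class below is a sketch under our own assumptions; the TTL and entry-count values are illustrative, not from OpenAI.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional

class TTLCache:
    """Connection-scoped cache with expiry and bounded size (sketch)."""

    def __init__(self, max_entries: int = 256, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._entries: OrderedDict = OrderedDict()  # key -> (stored_at, value)
        self.max_entries = max_entries
        self.ttl = ttl_seconds
        self._clock = clock  # injectable for testing

    def put(self, key: str, value: Any) -> None:
        self._entries[key] = (self._clock(), value)
        self._entries.move_to_end(key)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict the oldest entry

    def get(self, key: str) -> Optional[Any]:
        item = self._entries.get(key)
        if item is None:
            return None
        stored_at, value = item
        if self._clock() - stored_at > self.ttl:
            del self._entries[key]  # expired: invalidate lazily on read
            return None
        return value
```

Pairing this with a session timeout that closes idle connections (and their caches) covers both failure modes the paragraph describes: per-entry staleness and whole-session bloat.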