Our Take
The reasoning capability in GPT-Realtime-2 is real, but the 15.2% benchmark improvement comes from OpenAI's own evaluations without independent verification.
Why it matters
Voice interfaces are moving beyond simple chatbots to agents that can reason through complex requests and use tools mid-conversation. Developers building voice products now have production-ready models that handle the messy reality of human speech patterns.
Do this week
Voice product teams: Test GPT-Realtime-2's tool calling against your current stack to measure whether the reasoning improvement shows up on your specific use cases.
OpenAI launches three specialized voice models
OpenAI released GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper through its Realtime API. GPT-Realtime-2 adds reasoning capabilities to voice interactions, scoring 15.2% higher on Big Bench Audio compared to GPT-Realtime-1.5 (per OpenAI benchmarks). The model can now handle parallel tool calls while maintaining conversation flow and supports adjustable reasoning levels from minimal to extra-high.
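As a rough sketch of how those knobs might be set, the snippet below opens a Realtime API WebSocket session and sends a session.update event with a tool the model can call mid-conversation. The model name "gpt-realtime-2" and the reasoning field come from this announcement and should be treated as assumptions until the API reference confirms the exact parameter names; the lookup_listing tool is hypothetical.

```python
# Sketch: configure a Realtime API session with an adjustable reasoning level
# and a tool the model may call mid-conversation. The model name and the
# "reasoning" session field are assumptions based on this announcement.
import asyncio
import json
import os

import websockets  # pip install websockets

URL = "wss://api.openai.com/v1/realtime?model=gpt-realtime-2"  # assumed model name
HEADERS = {"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"}

async def main():
    # additional_headers is the kwarg in newer websockets releases;
    # older releases call it extra_headers.
    async with websockets.connect(URL, additional_headers=HEADERS) as ws:
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "modalities": ["audio", "text"],
                "reasoning": {"effort": "high"},  # assumed field; levels span minimal..extra-high
                "tools": [{
                    "type": "function",
                    "name": "lookup_listing",  # hypothetical tool for illustration
                    "description": "Fetch details for a property listing",
                    "parameters": {
                        "type": "object",
                        "properties": {"listing_id": {"type": "string"}},
                        "required": ["listing_id"],
                    },
                }],
            },
        }))
        # ...stream microphone audio as input_audio_buffer.append events and
        # read server events (transcripts, tool calls, audio deltas) here.

asyncio.run(main())
```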
GPT-Realtime-Translate handles live speech translation across 70+ input languages into 13 output languages, maintaining conversation pace. GPT-Realtime-Whisper provides streaming speech-to-text transcription as speakers talk, rather than waiting for complete utterances.
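For the transcription model, a minimal way to try the streaming behavior is to request it in the session config and listen for transcription events. The event and session field names below follow the existing Realtime API; the "gpt-realtime-whisper" model name is an assumption from this announcement.

```python
# Sketch: enable streaming transcription in a Realtime session and print each
# utterance's transcript as it arrives. "gpt-realtime-whisper" is an assumed name.
import json

SESSION_UPDATE = {
    "type": "session.update",
    "session": {
        "input_audio_transcription": {"model": "gpt-realtime-whisper"},  # assumed model name
    },
}

def handle_event(raw: str) -> None:
    event = json.loads(raw)
    # Completed transcript for one user utterance (existing Realtime event name);
    # any partial/streaming events would arrive before this one.
    if event.get("type") == "conversation.item.input_audio_transcription.completed":
        print("user said:", event.get("transcript", ""))
```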
Zillow reported a 26-point improvement in call success rates during testing, from 69% to 95% after prompt optimization. The models include safety guardrails, with active classifiers monitoring sessions for policy violations.
Voice agents can now reason while talking
Previous voice models handled simple back-and-forth but broke down during complex, multi-step requests. GPT-Realtime-2 addresses this by maintaining context across tool calls and interruptions, and by signaling what it's doing with preambles like "let me check that."
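On the wire, that flow looks roughly like the sketch below: the server announces a finished tool call, the client runs the function and hands the result back as a conversation item, then asks the model to resume speaking. Event and item names follow the existing Realtime API; lookup_listing is a hypothetical function for illustration.

```python
# Sketch: handle a mid-conversation tool call and return the result so the
# model can keep talking with full context.
import json

def lookup_listing(listing_id: str) -> dict:
    # Stand-in for a real backend call.
    return {"listing_id": listing_id, "price": "$450,000", "status": "active"}

async def on_server_event(ws, raw: str) -> None:
    event = json.loads(raw)
    if event["type"] == "response.function_call_arguments.done":
        args = json.loads(event["arguments"])
        result = lookup_listing(**args)
        # Return the tool output as a conversation item tied to the call_id...
        await ws.send(json.dumps({
            "type": "conversation.item.create",
            "item": {
                "type": "function_call_output",
                "call_id": event["call_id"],
                "output": json.dumps(result),
            },
        }))
        # ...then ask the model to continue the spoken response with that context.
        await ws.send(json.dumps({"type": "response.create"}))
```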
The context window expansion from 32K to 128K tokens enables longer conversations without losing track of earlier requests. This matters for enterprise use cases where voice agents need to handle complex workflows rather than just answer questions.
Live translation removes the lag that made cross-language voice interactions feel stilted. BolnaAI reported 12.5% lower word error rates across Hindi, Tamil, and Telugu than the other models it tested (company-reported figures).
Production voice apps become viable
GPT-Realtime-2 costs $32 per million audio input tokens and $64 per million output tokens, with cached input at $0.40 per million tokens. GPT-Realtime-Translate runs $0.034 per minute, while GPT-Realtime-Whisper costs $0.017 per minute.
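To turn those rates into a per-call estimate, a back-of-the-envelope helper like the one below is enough; the token volumes are placeholders you would replace with figures from your own usage logs.

```python
# Back-of-the-envelope cost estimate for a GPT-Realtime-2 call using the
# published per-million-token rates. Token counts are placeholders; substitute
# real numbers from your usage dashboard.
INPUT_RATE = 32.00 / 1_000_000    # $ per audio input token
CACHED_RATE = 0.40 / 1_000_000    # $ per cached input token
OUTPUT_RATE = 64.00 / 1_000_000   # $ per audio output token

def call_cost(input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    return (input_tokens * INPUT_RATE
            + cached_tokens * CACHED_RATE
            + output_tokens * OUTPUT_RATE)

# Example: 20k fresh input, 50k cached, 10k output audio tokens in one call.
print(f"${call_cost(20_000, 50_000, 10_000):.2f}")  # -> $1.30 with these placeholder volumes
```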
The models support three emerging patterns: voice-to-action for task completion, systems-to-voice for proactive guidance, and voice-to-voice for cross-language conversations. Companies like Priceline are building end-to-end travel management through voice interactions.
Developers must disclose AI interaction to users unless obvious from context. The Agents SDK allows custom safety guardrails beyond OpenAI's built-in protections. All three models are available immediately through the Realtime API.
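A minimal sketch of a custom guardrail, assuming the openai-agents Python package's guardrail hooks, is shown below; the keyword check is a stand-in for whatever policy logic your product actually needs, and the agent itself is hypothetical.

```python
# Sketch: a custom input guardrail layered on top of OpenAI's built-in
# protections, using the openai-agents package. The keyword check is a
# stand-in for real policy logic.
from agents import Agent, GuardrailFunctionOutput, Runner, input_guardrail

BLOCKED_TOPICS = ("wire transfer", "social security number")  # illustrative policy

@input_guardrail
async def policy_check(ctx, agent, user_input) -> GuardrailFunctionOutput:
    text = user_input if isinstance(user_input, str) else str(user_input)
    flagged = any(topic in text.lower() for topic in BLOCKED_TOPICS)
    return GuardrailFunctionOutput(output_info={"flagged": flagged},
                                   tripwire_triggered=flagged)

voice_agent = Agent(
    name="travel_voice_agent",  # hypothetical agent for illustration
    instructions="You are a voice assistant. Disclose that you are an AI.",
    input_guardrails=[policy_check],
)

# Runner.run(voice_agent, "Book me a flight to Lisbon") raises a guardrail
# tripwire exception if policy_check flags the request.
```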