April 12, 2026

Microsoft Launches MAI Foundation Models for Text, Voice, and Video

Microsoft AI unveils three new foundation models — MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2 — signaling its push to build a full multimodal AI stack.

By Agentic Daily. Source: TechCrunch

The Announcement

Microsoft AI has released three new foundation models that can generate text, voice, and video, marking the company's most significant push to compete directly with rival AI labs on model development rather than relying solely on its OpenAI partnership.

Model Details

  • MAI-Transcribe-1: Speech-to-text in 25 languages with near-human accuracy
  • MAI-Voice-1: Natural-sounding audio generation with emotion control
  • MAI-Image-2: Video generation from text prompts with temporal consistency

Strategic Significance

The release signals that Microsoft intends to reduce its dependence on OpenAI for core AI capabilities while maintaining the partnership for ChatGPT and enterprise products. The models are available through Azure AI Services.
