Key Takeaways
- Mistral launches two multilingual speech-to-text models with low latency.
- Speaker diarization, context biasing and word-level timestamps are now supported.
- IT and contact center leaders gain faster, cheaper tools for automation and compliance.
Mistral AI's new Voxtral Transcribe 2 suite delivers streaming speech-to-text with sub-200ms latency and open weights, targeting enterprise voice agents and compliance workflows.
According to company officials, the February 4 release comprises Voxtral Mini Transcribe V2 for batch transcription and Voxtral Realtime for live applications. Mistral says the models deliver state-of-the-art transcription quality across 13 languages.
Voxtral Realtime ships under the Apache 2.0 open-weights license, enabling edge deployment for privacy-sensitive applications. The company also launched an audio playground in Mistral Studio for instant transcription testing with diarization and timestamps.
Table of Contents
- Voxtral Transcribe 2 Feature Breakdown
- Mistral's Run-Up to Voxtral
- The Push Toward Real-Time Voice AI
- Mistral AI at a Glance
Voxtral Transcribe 2 Feature Breakdown
Mistral claims the models offer significant performance and cost advantages over competitors. The new suite builds on large language models optimized for audio processing.
Below is a breakdown of the Voxtral Transcribe 2 models and supporting features, including latency targets, licensing and deployment considerations.
| Model/Feature | Enterprise Impact |
|---|---|
| Voxtral Mini Transcribe V2 | Batch model with ~4% word error rate at $0.003/minute |
| Voxtral Realtime | Live transcription with latency configurable to sub-200ms |
| Speaker diarization | Labels speakers with precise start/end times |
| Open weights | Voxtral Realtime available under Apache 2.0 license |
| GDPR/HIPAA compliance | Supports on-premises and private cloud deployments |
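
To put the table's batch figures in concrete terms, the sketch below computes word error rate (WER) with the standard word-level edit-distance definition and estimates batch spend at the listed $0.003 per minute. It is an illustration, not Mistral's evaluation method: the function names, the sample transcripts and the 60,000-minute volume are hypothetical, and real benchmarks typically normalize punctuation and numerals before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in reference."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def batch_cost_usd(audio_minutes: float, price_per_minute: float = 0.003) -> float:
    """Estimated batch transcription cost; the default rate is the article's figure."""
    if audio_minutes < 0:
        raise ValueError("audio_minutes must be non-negative")
    return audio_minutes * price_per_minute


if __name__ == "__main__":
    # Hypothetical transcripts: one substitution and one deletion over six words.
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"WER: {wer:.1%}")

    # Hypothetical volume: 1,000 hours of recorded calls.
    print(f"Cost: ${batch_cost_usd(60_000):.2f}")
```

At the listed rate, cost scales linearly with audio duration, so 1,000 hours of recordings (60,000 minutes) would come to about $180. A WER of roughly 4% corresponds to about four word-level errors per 100 reference words.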
Mistral's Run-Up to Voxtral
Between May 2025 and January 2026, Mistral AI solidified its position as Europe's leading open-weight AI provider through aggressive model expansion. Between May and July 2025, the company released three open-weight models (Magistral Small, Voxtral Small and Devstral Medium) and upgraded Mistral Medium 3 with a 128,000-token context window.
In December 2025, Mistral launched Mistral 3, a multimodal series featuring Mistral Large 3 and edge-optimized Ministral 3 models. In January 2026, the company released Vibe 2.0, a terminal-native coding agent.
Financially, Mistral secured a €1.7 billion Series C at an €11.7 billion valuation in September 2025.
The Push Toward Real-Time Voice AI
Voice AI deployments increasingly rely on streaming speech-to-text architectures to power real-time agents and meeting intelligence, but technical barriers remain significant.
Modern voice AI platforms require sub-100ms latency to support natural conversation flow. By comparison, ElevenLabs delivers low-latency voice capabilities through its ElevenAPI platform and reaches more than one billion users through customers including Meta, Epic Games and Salesforce.
Mistral AI at a Glance
Founded in 2023, Mistral AI targets large enterprises and public-sector organizations with configurable, privacy-focused AI solutions. The company offers open-source foundation models for production-grade deployments across cloud, edge and on-premises environments.