Mistral Shrinks Speech-to-Text Latency With Voxtral Transcribe 2

Key Takeaways

Mistral launches two multilingual speech-to-text models with low latency.
Speaker diarization, context biasing and word-level timestamps are now supported.
IT and contact center leaders gain faster, cheaper tools for automation and compliance.

Mistral AI's new Voxtral Transcribe 2 suite delivers streaming speech-to-text with sub-200ms latency and open weights, targeting enterprise voice agents and compliance workflows.

According to company officials, the February 4 release comprises Voxtral Mini Transcribe V2 for batch transcription and Voxtral Realtime for live applications. Mistral says the models deliver state-of-the-art transcription quality across 13 languages.

Voxtral Realtime ships under the Apache 2.0 open-weights license, enabling edge deployment for privacy-sensitive applications. The company also launched an audio playground in Mistral Studio for instant transcription testing with diarization and timestamps.
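For readers new to diarization, the Python sketch below shows one way diarized, timestamped output could be represented and summarized downstream. The Segment fields, speaker labels and sample text are illustrative assumptions and do not reflect Mistral's actual response schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One diarized span of speech (hypothetical schema, not Mistral's)."""
    speaker: str   # assumed speaker label, e.g. "speaker_0"
    start: float   # start time in seconds
    end: float     # end time in seconds
    text: str


def talk_time_by_speaker(segments):
    """Sum the speaking time in seconds for each speaker label."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg.speaker] += seg.end - seg.start
    return dict(totals)


def format_transcript(segments):
    """Render segments in chronological order, one line per segment."""
    ordered = sorted(segments, key=lambda seg: seg.start)
    return "\n".join(
        f"[{seg.start:.2f}-{seg.end:.2f}] {seg.speaker}: {seg.text}"
        for seg in ordered
    )


# Invented sample data for illustration only.
sample = [
    Segment("speaker_0", 0.0, 2.5, "Thanks for calling, how can I help?"),
    Segment("speaker_1", 2.5, 4.0, "I need to reset my password."),
    Segment("speaker_0", 4.0, 7.0, "Sure, I can help with that."),
]

if __name__ == "__main__":
    print(format_transcript(sample))
    print(talk_time_by_speaker(sample))
```

In a contact center setting, per-speaker talk time of this kind is a common input for quality and compliance reviews, though the exact fields available would depend on the API response.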

Table of Contents

Voxtral Transcribe 2 Feature Breakdown
Mistral's Run-Up to Voxtral
The Push Toward Real-Time Voice AI
Mistral AI at a Glance

Voxtral Transcribe 2 Feature Breakdown

Mistral claims the models offer performance and cost advantages over competitors, citing a word error rate of about 4% at $0.003 per minute for the batch model. The new suite builds on large language models optimized for audio processing.

Below is a breakdown of the Voxtral Transcribe 2 models and supporting features, including latency targets, licensing and deployment considerations.

Model/Feature	Enterprise Impact
Voxtral Mini Transcribe V2	Batch model with ~4% word error rate at $0.003/minute
Voxtral Realtime	Live transcription with latency configurable to sub-200ms
Speaker diarization	Labels speakers with precise start/end times
Open weights	Voxtral Realtime available under Apache 2.0 license
GDPR/HIPAA compliance	Supports on-premises and private cloud deployments
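
To put the batch model's headline figures in context, the sketch below shows how word error rate (WER) is conventionally computed and what the quoted $0.003-per-minute price implies at volume. It uses the standard word-level edit distance definition of WER, not Mistral's evaluation code, and the function names and call volumes are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance divided by the reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Batch price quoted in the table above, in US dollars per audio minute.
PRICE_PER_MINUTE_USD = 0.003


def batch_cost_usd(audio_minutes: float,
                   price_per_minute: float = PRICE_PER_MINUTE_USD) -> float:
    """Estimate transcription cost for a given number of audio minutes."""
    return audio_minutes * price_per_minute


if __name__ == "__main__":
    # Hypothetical workload: 10,000 calls averaging 6 minutes each.
    print(f"Estimated cost: ${batch_cost_usd(10_000 * 6):,.2f}")
    print(f"WER: {word_error_rate('a b c d', 'a x c'):.0%}")
```

By this definition, a WER of about 4% corresponds to roughly one error in every 25 reference words. At the quoted price, the hypothetical 60,000 minutes of audio above would cost about $180, before any other platform or infrastructure charges.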

Mistral's Run-Up to Voxtral

Between May 2025 and January 2026, Mistral AI solidified its position as Europe's leading open-weight AI provider through aggressive model expansion. The company released three open-weight models between May and July 2025 (Magistral Small, Voxtral Small and Devstral Medium) while upgrading Mistral Medium 3 with a 128,000-token context window.

In December 2025, Mistral launched Mistral 3, a multimodal series featuring Mistral Large 3 and edge-optimized Ministral 3 models. In January 2026, the company released Vibe 2.0, a terminal-native coding agent.

Financially, Mistral secured a €1.7 billion Series C at an €11.7 billion valuation in September 2025.

The Push Toward Real-Time Voice AI

Voice AI deployments increasingly rely on streaming speech-to-text architectures to power real-time agents and meeting intelligence, but technical barriers remain significant.

Modern voice AI platforms require sub-100ms latency to support natural conversation flow. Competitor ElevenLabs, by comparison, delivers low-latency voice capabilities through its ElevenAPI platform, which reaches over one billion users through companies including Meta, Epic Games and Salesforce.

Mistral AI at a Glance

Founded in 2023, Mistral AI targets large enterprises and public-sector organizations with configurable, privacy-focused AI solutions. The company offers open-source foundation models for production-grade deployments across cloud, edge and on-premises environments.
