Mistral releases a new open-source model for speech generation


French AI company Mistral released a new open-source text-to-speech model on Thursday that can be used by voice AI assistants or in enterprise use cases like customer support. The model, which lets enterprises build voice agents for sales and customer engagement, puts Mistral in direct competition with the likes of ElevenLabs, Deepgram, and OpenAI.

The new model, called Voxtral TTS, supports nine languages: English, French, German, Spanish, Dutch, Portuguese, Italian, Hindi, and Arabic.

“Our customers have been asking for a speech model. So we built a small-sized speech model that can fit on a smartwatch, a smartphone, a laptop, or other edge devices. The cost of it is a fraction of anything else on the market, but it offers state-of-the-art performance,” Pierre Stock, VP of science operations at Mistral AI, told TechCrunch in a phone interview.

Mistral said the new model can adapt to a custom voice from a sample of less than five seconds, and can also capture characteristics like subtle accents, inflections, intonations, and irregularities in the flow of speech. The model, based on Ministral 3B, can switch between languages easily without losing the characteristics of the voice, which is useful for use cases like dubbing or real-time translation. Stock said the company wanted the model to sound human and not robotic.

The model has been built for real-time performance, according to the company. It has a time-to-first-audio (TTFA), a measure of when the model starts “speaking” after receiving input, of 90ms for a 10-second sample of 500 characters. The model also has a real-time factor (RTF) of 6x, which means it can render a 10-second clip in about 1.6 seconds.
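The arithmetic behind those figures can be checked with a short sketch. It assumes the quoted 6x figure means audio is generated six times faster than it plays back (some vendors define RTF the other way around, as processing time divided by audio length). The function name `render_seconds` and the variable names are illustrative only and are not part of any Mistral API.

```python
def render_seconds(audio_seconds: float, speedup: float) -> float:
    """Time needed to synthesize a clip of `audio_seconds`.

    Assumption: `speedup` is how many times faster than real time the
    model generates audio (6.0 for the 6x figure quoted in the article).
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup


# Figures quoted in the article.
clip_seconds = 10.0    # length of the generated clip
speedup = 6.0          # reported real-time factor, read as a speedup
ttfa_seconds = 0.090   # reported time-to-first-audio (90 ms)

full_render = render_seconds(clip_seconds, speedup)
print(f"Full render time: {full_render:.2f} s")
print(f"Playback can begin after about {ttfa_seconds * 1000:.0f} ms")
```

Under that reading, a 10-second clip takes roughly 1.67 seconds to generate in full, in line with the article's figure of about 1.6 seconds. Because generation outpaces playback, a streaming client could in principle start playing at the TTFA mark without stalling.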

Earlier this year, Mistral launched a pair of transcription models, one for large batch processing and the other for real-time use cases with low latency. With the new speech model, the company is likely aiming to provide a full suite of voice products to enterprises.

“We plan to have an end-to-end platform that can handle multimodal streams of input, including audio, text, and image, and output as well. The main benefit of that is you get way more information with an end-to-end agentic system that supports audio as an input or output,” Stock said.


Mistral’s pitch is that its open-source approach and customization options will help enterprises adopt its voice models over those of competitors, since they can tune the models the way they want.
