MARS8 Family of TTS Models
MARS8 is a family of production-grade text-to-speech models built so every use case, language, and voice profile gets the same rock-solid reliability when millions are listening.
Live-Ready Voice vs Everything Else









Four speech models, each tuned for a specific mission.

Contact centers
Live conversational AI

Audiobooks
Digital media

Precise prosody control
Creative editing workflows

Embedded devices
Edge deployments
Planet‑scale language coverage

Voice AI that moves you from demo‑ware to production realities.
Voice systems behave very differently at scale. Once latency budgets tighten, usage spikes, and compliance kicks in, architectural decisions start to dominate outcomes. MARS8 is built for these real‑world constraints, not for API convenience.




Character error rate: the percentage of characters that are incorrect in the generated speech, measured by transcribing the output with Whisper ASR.
Speaker similarity metric measured as the mean cosine similarity between generated audio and reference audio, using the wavlm-base-sv embedding model.
Speaker similarity metric measured as the mean cosine similarity between generated audio and reference audio, using the CAM++ embedding model.
Approximate mean opinion score on a 1–10 scale, predicted by Meta’s Audiobox‑Aesthetics model; higher CE reflects greater content enjoyment.
Approximate mean opinion score on a 1–10 scale, predicted by Meta’s Audiobox‑Aesthetics model; higher PQ indicates better production quality.
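To make the first three metric definitions concrete, here is a minimal Python sketch of how a character error rate and a mean cosine speaker similarity could be computed. It is an illustration under stated assumptions, not MARS8's evaluation code: the function names are hypothetical, the transcript is assumed to come from an ASR system such as Whisper (not invoked here), and the embeddings are assumed to come from a speaker model such as wavlm-base-sv or CAM++ (toy vectors stand in for them). Any text normalization applied before scoring is not specified by the source and is omitted.

```python
import math
from typing import Sequence, Tuple


def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum character insertions, deletions, substitutions."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, transcript: str) -> float:
    """CER as a percentage of the reference length (hypothetical helper)."""
    if not reference:
        raise ValueError("reference text must be non-empty")
    return 100.0 * edit_distance(reference, transcript) / len(reference)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / norm


def mean_speaker_similarity(
    pairs: Sequence[Tuple[Sequence[float], Sequence[float]]]
) -> float:
    """Mean cosine similarity over (generated, reference) embedding pairs."""
    if not pairs:
        raise ValueError("at least one embedding pair is required")
    return sum(cosine_similarity(g, r) for g, r in pairs) / len(pairs)


if __name__ == "__main__":
    # Assumed inputs: the second string stands in for an ASR transcript.
    print(character_error_rate("hello world", "hello word"))
    # Toy 2-D vectors standing in for real speaker embeddings.
    print(mean_speaker_similarity([([1.0, 0.0], [1.0, 0.0]),
                                   ([1.0, 0.0], [0.0, 1.0])]))
```

Lower CER is better, while higher speaker similarity indicates a closer match to the reference voice. In practice the embedding vectors have hundreds of dimensions, and the choice of embedding model changes the absolute values, which is why the two similarity metrics are reported separately.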
Want the full technical breakdown?
For a detailed look at MARS8’s architecture, deployment patterns, and performance characteristics, read the full technical article on our blog.

on your terms
Whether you’re building a product or enabling others to build, there’s a direct path to get started.