Build AI voice agents that communicate like we do
Cutting-edge AI speech for 5¢ per minute. Create and deploy highly effective and natural Voice Agents in no time.
Free to get started
TRY IT OUT
ver 0.4.1
Learn more about Ultravox by talking to it.
or call 1 844-741-5700
The future of AI speech is here
Ultravox is an open-weight Speech Language Model (SLM) trained to understand speech naturally, just like humans.
Beyond Speech Recognition
Ultravox is an advanced LLM that processes speech directly, without conversion to text. This enables much more natural and fluid conversations.
Seamlessly integrate Ultravox into your web, native app, or phone-based products with minimal effort. It comes with SDKs for all major languages and built-in Twilio support.
Multi-lingual by default
Ultravox is fluent in all major languages and can easily be adapted to support new languages or accents, ensuring smooth communication across diverse audiences.
BYOM (Bring Your Own Model)
Ultravox gives you the flexibility to work with any open-source model, even your own fine-tuned models.
Fast, accurate, smart. Pick three
Unlike other voice-based systems, Ultravox understands speech directly instead of first transforming it into text. This makes Ultravox faster, more reliable, and more natural.
Ultravox
Understanding speech directly means there are fewer moving parts. The result is much faster and much more consistent response times than legacy component systems can offer.
Legacy Component Systems
The current industry standard is a cascaded pipeline of services strung together to give the illusion of a seamless experience. This means it's slower, more brittle, and unable to capture the nuances of human speech.
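As a rough illustration of the difference described above, here is a toy Python sketch. Every name in it (`Utterance`, `transcribe`, `text_llm_reply`, `cascaded_reply`, `direct_reply`) is hypothetical and the models are stand-in stubs, not Ultravox code or any real API. It only shows why a text-only handoff between stages loses non-textual cues such as tone.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    """Toy stand-in for a spoken turn: the words plus a non-textual cue."""
    words: str
    tone: str  # hypothetical paralinguistic label, e.g. "frustrated"


def transcribe(utterance: Utterance) -> str:
    # Stub speech-recognition stage: only the words survive the handoff.
    return utterance.words


def text_llm_reply(text: str) -> str:
    # Stub text-only model: it never sees anything but the transcript.
    return f"reply to text={text!r} (tone unknown)"


def cascaded_reply(utterance: Utterance) -> str:
    # Cascaded pipeline: speech -> text -> text model.
    return text_llm_reply(transcribe(utterance))


def direct_reply(utterance: Utterance) -> str:
    # Stub speech-native model: it receives the whole utterance.
    return f"reply to text={utterance.words!r} (tone={utterance.tone})"


if __name__ == "__main__":
    turn = Utterance(words="my order is late again", tone="frustrated")
    print(cascaded_reply(turn))
    print(direct_reply(turn))
```

In the cascaded path the tone label is gone by the time the text model runs, so no later stage can recover it. The direct path still has it available when the reply is formed.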
BENCHMARKS
CoVoST2 Translation
Our primary method of evaluation is zero-shot speech translation, measured by BLEU, as a proxy for general instruction-following capability (the higher the number, the better).
Ultravox 0.4.1 Llama 3.1 70B: 38.97
GPT-4o Realtime: 40.35
Ultravox 0.4.1 Llama 3.1 8B: 33.97
Ultravox 0.4.1 Mistral NeMo (12B): 33.59
Qwen2-Audio-7B-Instruct: 28.43
ICTNLP Llama-Omni Llama 3.1 8B: 6.61
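For readers unfamiliar with the metric, BLEU scores a translation by its n-gram overlap with a reference translation, with a penalty for output that is too short. The sketch below is a simplified, unsmoothed, single-sentence version on a 0 to 100 scale. It is not the scoring code behind the figures on this page. Published results are normally computed at corpus level with a standard tool such as sacreBLEU, so numbers from this sketch are not comparable to those above. The function names are our own.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis: str, reference: str, max_n: int = 4) -> float:
    """Simplified BLEU for one sentence pair, whitespace-tokenized, no smoothing."""
    hyp = hypothesis.split()
    ref = reference.split()
    if not hyp:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        ref_counts = ngrams(ref, n)
        # Clipped matches: an n-gram counts at most as often as it
        # appears in the reference.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
        total = sum(hyp_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU 0
        log_precisions.append(math.log(overlap / total))

    # Brevity penalty: punish hypotheses shorter than the reference.
    brevity = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    # Geometric mean of the n-gram precisions, scaled to 0..100.
    return 100 * brevity * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    reference = "the cat sat on a mat"
    print(sentence_bleu("the cat sat on a mat", reference))    # exact match
    print(sentence_bleu("the cat sat on the mat", reference))  # one word differs
    print(sentence_bleu("completely unrelated words here", reference))
```

An exact match scores 100, a translation with no matching words scores 0, and a near miss lands in between.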
Customize it, then run it anywhere
(even on-prem)
Whether it's adding support for additional languages, fine-tuning on your own datasets, or creating unique and custom voices — Ultravox can be fully customized to your needs.
Ultravox can also be deployed directly in your own cloud.
All the basics, plus some
for just 5¢ per minute
Function Calling
Fine-tunable
Interruptions
Custom Voices
Voice Cloning
RAG Support
Works with existing text-based prompts
Multi-lingual
High quality speech
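To turn the 5¢ per minute price into a budget figure, a back-of-the-envelope calculation like the one below may help. The 5¢ rate comes from this page. The function names are our own, and the per-second pro-rata billing is an assumption made for illustration: the page does not say how partial minutes are billed, so check the actual billing terms.

```python
PRICE_CENTS_PER_MINUTE = 5  # rate quoted on this page


def estimate_cost_cents(call_seconds: int) -> float:
    """Cost of one call in cents.

    Assumption: usage is billed pro rata per second. The page does not
    state the billing granularity.
    """
    return call_seconds * PRICE_CENTS_PER_MINUTE / 60


def monthly_cost_dollars(calls_per_day: int, avg_call_seconds: int, days: int = 30) -> float:
    """Rough monthly spend in dollars for a steady daily call volume."""
    total_cents = calls_per_day * days * estimate_cost_cents(avg_call_seconds)
    return total_cents / 100


if __name__ == "__main__":
    # 200 calls a day, three minutes each, over a 30-day month.
    print(monthly_cost_dollars(calls_per_day=200, avg_call_seconds=180))
```

With these example inputs the month comes to 18,000 minutes of conversation, or $900 at the quoted rate.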
People are noticing
They can't stop saying nice things about us *blushes*
Joe Heitzeberg
@jheitzeb
Wow! Ultravox is an *open source* speech to speech model — understands non-textual speech elements — paralinguistic information. @juberti just showed how it can pick up on tone, pauses, and more! @AITinkerers Seattle @FixieAI
bharat
@that_anokha_boy
ultravox is prolly most underrated project yall should checkout. i checked sarvam's shuka's code that is also inspired by ultravox.
Simon Willison
@simonw
I just spent some time with the voice demo of Ultravox at https://ai.town/ultravox and it really impressed me - openly licensed multi-modal audio model (like GPT-4o) based on Llama 3, and you can talk to it in your browser
Get in touch
We'd love to learn more about your use case and how we can help
Prefer direct email? We're here:
hello@fixie.ai