Read all about the latest Ultravox release here

Build AI voice agents that communicate like we do

Cutting-edge AI speech for 5¢ per minute. Create and deploy highly effective and natural Voice Agents in no time.

TRY IT OUT

ver 0.4.1

Learn more about Ultravox by talking to it.

or call 1 844-741-5700

The future of AI speech is here

Ultravox is an open-weight Speech Language Model (SLM) trained to understand speech naturally, just like humans.

Beyond Speech Recognition

Ultravox is an advanced LLM that processes speech directly, without conversion to text. This enables much more natural and fluid conversations.

Web or VoIP Ready

Seamlessly integrate Ultravox into your web, native app, or phone-based products with minimal effort. It comes with SDKs for all major languages and built-in Twilio support.

Multi-lingual by default

Ultravox is fluent in all major languages and can easily be adapted to support new languages or accents, ensuring smooth communication across diverse audiences.

BYOM (Bring Your Own Model)

Ultravox gives you the flexibility to work with any open-source model, even your own fine-tuned models.

Fast, accurate, smart. Pick three

Unlike other voice-based systems, Ultravox understands speech directly, without first transforming it into text. This makes Ultravox faster, more reliable, and more natural.

Ultravox

Understanding speech directly means there are fewer moving parts, which gives much faster and more consistent response times than legacy component systems.

Legacy Component Systems

The current industry standard is a cascaded pipeline of services strung together to give the illusion of a seamless experience. This means it's slower, more brittle, and unable to capture the nuances of human speech.

BENCHMARKS

CoVoST2 Translation

Our primary method of evaluation is zero-shot speech translation, measured by BLEU, as a proxy for general instruction-following capability (higher is better).

Ultravox 0.4.1 Llama 3.1 70B: 38.97

GPT-4o Realtime: 40.35

Ultravox 0.4.1 Llama 3.1 8B: 33.97

Ultravox 0.4.1 Mistral NeMo (12B): 33.59

Qwen2-Audio-7B-Instruct: 28.43

ICTNLP Llama-Omni Llama 3.1 8B: 6.61
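
For readers who want to compare the figures above programmatically, here is a minimal sketch that ranks the listed CoVoST2 BLEU scores and computes each model's gap to a chosen baseline. The scores are copied from this page; the function names `rank_models` and `gap_to_baseline` are illustrative and not part of any Ultravox tooling.

```python
# BLEU scores as listed in the benchmark section of this page.
COVOST2_BLEU = {
    "Ultravox 0.4.1 Llama 3.1 70B": 38.97,
    "GPT-4o Realtime": 40.35,
    "Ultravox 0.4.1 Llama 3.1 8B": 33.97,
    "Ultravox 0.4.1 Mistral NeMo (12B)": 33.59,
    "Qwen2-Audio-7B-Instruct": 28.43,
    "ICTNLP Llama-Omni Llama 3.1 8B": 6.61,
}


def rank_models(scores):
    """Return (model, score) pairs sorted from highest to lowest BLEU."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def gap_to_baseline(scores, baseline):
    """Return each model's BLEU difference relative to the baseline model."""
    base = scores[baseline]
    return {name: round(score - base, 2) for name, score in scores.items()}


if __name__ == "__main__":
    gaps = gap_to_baseline(COVOST2_BLEU, "GPT-4o Realtime")
    for name, score in rank_models(COVOST2_BLEU):
        print(f"{name}: {score:.2f} ({gaps[name]:+.2f} vs GPT-4o Realtime)")
```

A difference in BLEU points is only a rough guide; it does not by itself say how noticeable the gap would be in a live conversation.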

Customize it, then run it anywhere

(even on-prem)

Whether it's adding support for additional languages, fine-tuning on your own datasets, or creating unique and custom voices, Ultravox can be fully customized to your needs.

Ultravox can also be deployed directly in your own cloud.

All the basics, plus some

for just 5¢/minute

Function Calling

Fine-tunable

Interruptions

Custom Voices

Voice Cloning

RAG Support

Works with existing text-based prompts

Multi-lingual

High quality speech
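
To get a feel for what 5¢ per minute means at volume, here is a rough estimator. It assumes billing is strictly linear in call minutes, with no minimums, per-call rounding, or volume discounts; those billing details are not stated on this page, so treat the result as a ballpark figure. The function name `estimate_cost` is illustrative.

```python
from decimal import Decimal

# 5 cents per minute, as quoted on this page.
RATE_PER_MINUTE = Decimal("0.05")


def estimate_cost(minutes):
    """Estimate the charge in dollars for a number of call minutes.

    Assumes simple linear pricing; actual billing rules may differ.
    """
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    total = Decimal(str(minutes)) * RATE_PER_MINUTE
    return total.quantize(Decimal("0.01"))


if __name__ == "__main__":
    for minutes in (60, 1_000, 50_000):
        print(f"{minutes} minutes: ${estimate_cost(minutes)}")
```

Under these assumptions, an hour of conversation comes to $3.00 and 1,000 minutes to $50.00.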

People are noticing

They can't stop saying nice things about us *blushes*

Joe Heitzeberg

@jheitzeb

Wow! Ultravox is an *open source* speech to speech model — understands non-textual speech elements — paralinguistic information. @juberti just showed how it can pick up on tone, pauses, and more! @AITinkerers Seattle @FixieAI

bharat

@that_anokha_boy

ultravox is prolly most underrated project yall should checkout. i checked sarvam's shuka's code that is also inspired by ultravox.

Simon Willison

@simonw

I just spent some time with the voice demo of Ultravox at https://ai.town/ultravox and it really impressed me - openly licensed multi-modal audio model (like GPT-4o) based on Llama 3, and you can talk to it in your browser

Get in touch

We'd love to learn more about your use case and how we can help

Prefer direct email? We're here:

hello@fixie.ai