Model Architecture
We've extended Meta's Llama 3 model with a multimodal projector that converts audio directly into the high-dimensional space used by Llama 3. This direct coupling allows Ultravox to respond much more quickly than systems that chain a separate automatic speech recognition (ASR) component to an LLM. In the future, it will also allow Ultravox to natively understand the paralinguistic cues of timing and emotion that are omnipresent in human speech.
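To make the idea of a projector concrete, here is a minimal toy sketch in plain Python. It is not Ultravox's actual implementation: the frame stacking, the single random linear layer, the tiny dimensions, and the names `make_projector` and `splice_audio` are all assumptions chosen for illustration. It only shows the general shape of the technique, where audio features are mapped to vectors of the same width as the model's embeddings and placed directly into the input sequence, with no intermediate text transcript.

```python
import random


def make_projector(audio_dim, model_dim, stack=2, seed=0):
    """Build a toy projector (hypothetical, untrained).

    It concatenates `stack` consecutive audio frames and applies one
    linear layer, producing vectors of width `model_dim`.
    """
    rng = random.Random(seed)
    in_dim = audio_dim * stack
    # Random weights stand in for parameters that would normally be learned.
    weights = [[rng.gauss(0.0, in_dim ** -0.5) for _ in range(in_dim)]
               for _ in range(model_dim)]

    def project(frames):
        # Pad with zero frames so the frame count is a multiple of `stack`.
        pad = (-len(frames)) % stack
        frames = frames + [[0.0] * audio_dim] * pad
        out = []
        for i in range(0, len(frames), stack):
            stacked = [x for frame in frames[i:i + stack] for x in frame]
            out.append([sum(w * x for w, x in zip(row, stacked))
                        for row in weights])
        return out

    return project


def splice_audio(text_embeddings, audio_embeddings, position):
    """Insert the projected audio vectors into the text embedding sequence."""
    return (text_embeddings[:position]
            + audio_embeddings
            + text_embeddings[position:])


AUDIO_DIM, MODEL_DIM = 4, 8  # toy sizes, far smaller than a real model
project = make_projector(AUDIO_DIM, MODEL_DIM, stack=2)

# Five fake audio feature frames standing in for an audio encoder's output.
audio_frames = [[0.1 * (i + j) for j in range(AUDIO_DIM)] for i in range(5)]
audio_embeddings = project(audio_frames)

# Three fake text token embeddings standing in for the prompt.
prompt_embeddings = [[0.0] * MODEL_DIM for _ in range(3)]
sequence = splice_audio(prompt_embeddings, audio_embeddings, position=1)

print(len(audio_embeddings), len(sequence), len(sequence[0]))  # 3 6 8
```

In this sketch the language model would consume `sequence` exactly as it would consume ordinary token embeddings, which is why no separate transcription step is needed.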
One Size Doesn't Fit All
Llama
ULTRAVOX_Llama_8b
ULTRAVOX_Llama_70b
Mistral
ULTRAVOX_Mistral_8b
ULTRAVOX_Mistral_70b
More Soon!
We're working on even more sizes and extending model support. Check back soon for updates.
© 2024 Fixie
hello@fixie.ai