Beyond the orchestrator: The evolution of voice AI

In 2023, we set out to create a best-in-class voice AI platform, knowing that the most intelligent LLMs (and the infrastructure they ran on) were built for text-based conversation.

Our first attempt at a voice AI solution chained bolted-on components around the LLM: one converted spoken audio into text, the LLM ran inference on that text, and another converted the response back into speech.

Unfortunately, this orchestrator approach produced conversations that were slow, awkward, and stilted. The experience was less “human-like” and more “uncanny valley”.
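
To make the shape of that orchestrator concrete, here is a minimal Python sketch of a three-stage pipeline. It is an illustration under stated assumptions, not our original implementation: the stage functions (`fake_speech_to_text`, `fake_llm`, `fake_text_to_speech`) are hypothetical stand-ins that only pass strings and bytes around, and `run_pipeline` is a name invented for this example. What it shows is the structural point: each stage must finish before the next one starts, so the per-stage delays add up on every conversational turn.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stage:
    """One bolted-on component in the orchestrator (hypothetical type)."""
    name: str
    run: Callable[[object], object]


def run_pipeline(stages: List[Stage], payload: object) -> Tuple[object, List[Tuple[str, float]]]:
    """Run stages strictly in order, recording how long each one takes."""
    timings = []
    for stage in stages:
        start = time.perf_counter()
        payload = stage.run(payload)  # the next stage cannot start until this returns
        timings.append((stage.name, time.perf_counter() - start))
    return payload, timings


# Placeholder stages: assumptions for illustration only, not real models.
def fake_speech_to_text(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(prompt: str) -> str:
    return f"You said: {prompt}"


def fake_text_to_speech(text: str) -> bytes:
    return text.encode("utf-8")


ORCHESTRATOR = [
    Stage("speech_to_text", fake_speech_to_text),
    Stage("llm", fake_llm),
    Stage("text_to_speech", fake_text_to_speech),
]


def respond(audio: bytes) -> Tuple[object, List[Tuple[str, float]]]:
    """Handle one conversational turn through all three stages."""
    return run_pipeline(ORCHESTRATOR, audio)


if __name__ == "__main__":
    reply, timings = respond(b"hello")
    total = sum(seconds for _, seconds in timings)
    print(reply, f"(total {total:.6f}s across {len(timings)} stages)")
```

With real components in place of the stubs, the total time for a turn is the sum of the three stage times, which is one way to see why a chained design struggles to feel conversational.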

We asked ourselves what a better solution would look like, and started working on the project that would ultimately become the Ultravox platform.

Achieving real-time conversational latency at scale, without sacrificing model intelligence, required a fundamentally new, purpose-built approach to voice AI.

We know that LLMs are revolutionary, but their potential impact on the world is limited if they can only be used via text-based chat.

We are more than just a research lab. Believing in a better future doesn’t have to be at odds with pragmatism in the present day. Our team builds products and ships features that help teams adopt voice AI technology for real-world use cases.

Humans are voice-native; AI agents should be too.