👋 Hey, I’m Ivan. I write a newsletter about startups and investing. I share market maps, playbooks and tactical resources for founders surfing tech waves.
🌊 AI Voice Agents
2025 is going to go down as the year of Agents, and of Voice AI in particular.
We’ve been digging into voice and have made a few investments in the space (HappyRobot, Rauda, Konvo, and Altan, which just partnered with ElevenLabs). There’s a lot of emerging tech that makes this wave worth surfing.
So let's break down why now, and how big this could be:
1. Voice is the most natural interface, and it’s (finally) working
The most human interface has been the most painful one to use. Until now.
Humans love to talk. It’s how we build trust, handle urgency, feel heard.
But voice tech has always been broken: clunky tech, dropped calls, bad bots, endless hold music, and a few other horror stories we’re all familiar with.
AI changed that. For the first time, machines can listen, think, and talk back. All in real-time. Simple to the user. Wildly complex under the hood.
But it unlocks real stuff:
SMBs miss over 60% of inbound calls → voice agents can pick up 24/7
Customers don’t want to wait → AI doesn’t sleep
Agents can now schedule appointments, qualify leads, renew contracts, and more
And this goes beyond call centers.
For enterprises: voice agents directly replace human labor, often 10x cheaper, faster, and more consistent.
For consumers: voice feels native. It’s faster than typing, more intuitive than apps, and perfect for things like coaching, language learning, or even companionship.
For builders: voice is now a wedge. Infra is maturing. What matters next is the application layer: the workflows, verticals, and GTM that ride on top.
Bottom line: voice isn’t just working, it’s working better.
And it’s likely about to change how we work, buy, and communicate.
2. We got here after 5 tech waves:
“Please press 1 for rage”
Wave 1 — IVR Hell (1970s–2000s): Rigid phone menus (“Press 1 for billing”) defined early voice tech. Still a $5B+ market, despite being universally hated.
Wave 2 — STT Gets Usable (2010s–2021): Speech-to-text finally worked well enough for business. Gong turned sales calls into structured data. Google’s STT APIs brought real use cases online.
Wave 3 — The Whisper Moment (2022): OpenAI open-sourced Whisper, pushing transcription toward human-level accuracy. Suddenly, indie devs could build high-quality STT into apps for free.
Wave 4 — Voice 1.0 (2023–early 2024): Cascading stacks emerged: Voice → Text → LLM → Text → Voice. ChatGPT + ElevenLabs made agents sound decent, but latency sucked: brittle UX, long gaps, awkward timing.
Wave 5 — Speech-Native (2024–2025): Speech-to-Speech flips the stack. GPT-4o handles voice input/output natively: ~300ms latency, emotion, interruptions. Moshi runs locally, full-duplex. Hume adapts tone in real time.
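The Wave 4 latency problem falls straight out of the architecture: each stage must finish before the next one starts, so their delays add up. A toy Python sketch (all stage functions are stubs and the latency numbers are purely illustrative, not measurements of any real model):

```python
import time

# Illustrative per-stage latencies for a cascading "Voice 1.0" pipeline.
# Real numbers vary by provider; these are made up to show the effect.
STAGE_LATENCY_S = {"stt": 0.3, "llm": 0.8, "tts": 0.4}

def transcribe(audio: bytes) -> str:
    time.sleep(STAGE_LATENCY_S["stt"])   # speech -> text (stub)
    return "what time do you open tomorrow"

def reason(text: str) -> str:
    time.sleep(STAGE_LATENCY_S["llm"])   # text -> text (stub)
    return "We open at 9am tomorrow."

def synthesize(text: str) -> bytes:
    time.sleep(STAGE_LATENCY_S["tts"])   # text -> speech (stub)
    return text.encode()

def cascading_turn(audio: bytes) -> tuple[bytes, float]:
    """One conversational turn: the stages run strictly in sequence,
    so their latencies stack up before the caller hears anything."""
    start = time.perf_counter()
    reply = synthesize(reason(transcribe(audio)))
    return reply, time.perf_counter() - start

reply, latency = cascading_turn(b"...caller audio...")
print(f"turn latency: {latency:.1f}s")
```

Speech-native (Wave 5) models collapse the three stages into one, which is why sub-second, interruptible turns suddenly became possible.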
3. A new Voice Stack makes applications viable
Better infra unlocked faster iteration.
Voice agents used to be a full-stack nightmare.
You had to wrangle real-time audio, transcription, latency, barge-ins, TTS quality, and orchestration logic, just to ship a mediocre demo.
But in the last 18 months, a new modular stack emerged. Infra finally caught up.
Now, each layer has best-in-class players:
Models → GPT‑4o (speech-native reasoning), ElevenLabs (TTS), Deepgram (fast STT), Moshi (open-source S2S)
Infra → Vapi, Retell, Hume, LiveKit handle orchestration, emotion, memory, interruptions.
Apps → Examples like Rauda (CS), Konvo (CX), and a big wave we’ll discuss next.
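One way to read the "modular stack" point: the application layer codes against thin per-layer interfaces, so any best-in-class provider can be swapped in underneath without touching the workflow logic. A minimal sketch (the `Echo*` classes are stand-ins, not real SDK clients for any of the vendors above):

```python
from typing import Protocol

# Each layer of the stack is an interface; providers plug in behind it.
class STT(Protocol):
    def transcribe(self, audio: bytes) -> str: ...

class TTS(Protocol):
    def synthesize(self, text: str) -> bytes: ...

# Stub implementations for illustration only.
class EchoSTT:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode()

class EchoTTS:
    def synthesize(self, text: str) -> bytes:
        return text.encode()

class VoiceApp:
    """Application layer: owns the workflow, not the audio plumbing."""
    def __init__(self, stt: STT, tts: TTS):
        self.stt, self.tts = stt, tts

    def handle_turn(self, audio: bytes) -> bytes:
        text = self.stt.transcribe(audio)
        reply = f"You said: {text}"   # a real app would call an LLM here
        return self.tts.synthesize(reply)

app = VoiceApp(EchoSTT(), EchoTTS())
print(app.handle_turn(b"book me for 3pm"))  # b'You said: book me for 3pm'
```

Swapping providers means swapping the constructor arguments, which is exactly the flexibility that lets app-layer founders focus on workflows, verticals, and GTM.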
4. CS, Sales & Recruiting are leading the charge
These use cases are predictable, repetitive, and already voice-native.
You don’t need to convince anyone to “try voice”; it’s already how these teams operate:
Customer Support → Automate FAQs, renewals, triage, support tickets, etc.
Sales → Lead generation and enrichment, follow-ups, co-pilots, etc.
Recruiting → Building pipeline, running interviews
With stable infra and ripe jobs to be done, founders have started going vertical:
Healthcare → Follow-ups, scheduling, insurance calls (e.g. HelloPatient)
Financial Services → Loan servicing, collections (e.g. Salient, Kastle)
SMBs → Booking, lead capture, customer follow-up (e.g. Goodcall, Numa)
5. YC’s latest batches are packed with Voice AI
Healthcare, Sales, HR, Retail Ops, and Productivity.
The Spring 2025 batch was filled with application-layer voice agent companies:
Kavana – AI sales rep for distributors.
Trapeze – AI-native Zocdoc, likely includes voice booking.
Novoflow – AI receptionist for clinics.
Lyra – Voice-aware Zoom for sales.
Nomi – Copilot that listens to sales calls.
Willow – Voice interface replacing your keyboard.
Atlog – Voice agents for retail stores.
SynthioLabs – Voice AI medical rep.
VoiceOS – Automated voice interviews for hiring.
6. The Market Map (2025) is growing fast
This is where the stack comes together. We mapped out 100+ companies across the Voice AI landscape — models, infra, and apps.
Here’s how to decode it (and what founders need to know):