Work / VoiceDesk
Case study 02 · Voice AI
VoiceDesk
Live in production · Founder & full-stack AI engineer · voicedesk.app ↗
A multi-tenant SaaS that turns a company's documents into a live, always-on AI voice assistant. It answers customer calls - by phone or web - 24/7, responding in 1–3 seconds from the company's actual knowledge base.
"Never miss a customer call."
1–3s
response latency
<2s p95
KB load at call start
100+
concurrent calls / sec
24/7
uptime, zero hold time
The problem
Businesses lose customers to voicemail. Existing voice bots are slow and dumb. Chatbots only work on a website. The AI is the easy part of fixing this - the hard part is real-time voice.
Most "AI projects" in 2025 are a chat wrapper around a hosted LLM endpoint. VoiceDesk isn't that. It's a real-time, full-duplex voice pipeline running between a customer's browser (or phone) and a real-time voice server, with NAT traversal that actually works on mobile networks, an SDP proxy that injects per-tenant configuration into the handshake, and a pre-loaded-context architecture that beats RAG on latency.
Architecture
Phone via a telephony provider. Browser via WebRTC. The application backend stitches it together.
Engineering substance
The hard problem wasn't the AI. The hard problem was real-time voice.
Full-duplex WebRTC, browser & phone
Audio in both directions between the customer's browser/phone and a real-time voice server. The two transports look very different on the wire - phone via SIP through a telephony provider, browser via raw WebRTC - and converge on the same intelligence loop.
NAT traversal via a dedicated TURN service
TURN/STUN credentials served on-demand so calls actually work on mobile networks and behind corporate firewalls. The thing most "WebRTC demos" gloss over.
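A minimal sketch of what "credentials served on-demand" looks like on the client: the backend hands back short-lived TURN credentials and the browser folds them into its peer-connection config. The field names, hostnames, and credential shape here are illustrative assumptions, not VoiceDesk's actual API.

```typescript
// Short-lived TURN credentials as returned by a (hypothetical) backend endpoint.
interface TurnCredentials {
  username: string;    // time-limited, e.g. "1718000000:tenant42"
  credential: string;  // HMAC over the username and a shared secret
  urls: string[];      // e.g. ["turn:turn.example.com:3478?transport=udp"]
}

interface IceServer {
  urls: string | string[];
  username?: string;
  credential?: string;
}

interface RtcConfig {
  iceServers: IceServer[];
}

// Build the config passed to `new RTCPeerConnection(...)`: a cheap STUN
// entry for direct paths, plus the TURN relay fallback that makes calls
// survive carrier-grade NAT and corporate firewalls.
function buildRtcConfig(creds: TurnCredentials): RtcConfig {
  return {
    iceServers: [
      { urls: "stun:stun.example.com:3478" },
      { urls: creds.urls, username: creds.username, credential: creds.credential },
    ],
  };
}
```

The relay entry is the part the demos skip: without it, ICE has no server-reflexive fallback and calls on mobile networks simply never connect.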
Custom SDP offer/answer proxy
In the application backend: injects per-tenant configuration - system prompt, voice, persona, greeting, temperature - into the handshake. The browser never sees the AI configuration, so it can't be tampered with.
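The proxy idea can be sketched as a single server-side step: the browser posts only its SDP offer, and the backend attaches the tenant's AI settings before forwarding the session request upstream. The payload field names below are assumptions for illustration; the point is that the configuration never transits the client.

```typescript
// Per-tenant AI settings, looked up server-side by tenant ID.
interface TenantConfig {
  systemPrompt: string;
  voice: string;
  greeting: string;
  temperature: number;
}

// Merge the client's bare SDP offer with server-held tenant config into the
// upstream session request. The client sends only `sdpOffer`, so the prompt,
// persona, and temperature cannot be read or tampered with in the browser.
function buildUpstreamSession(sdpOffer: string, tenant: TenantConfig) {
  return {
    sdp: sdpOffer,
    instructions: tenant.systemPrompt,
    voice: tenant.voice,
    greeting: tenant.greeting,
    temperature: tenant.temperature,
  };
}
```

The backend then returns only the SDP answer to the browser; everything else stays server-side.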
No-trickle-ICE handling
The real-time voice backend doesn't support trickle ICE. The client gathers all candidates with a 3-second timeout and a "relay-ready" shortcut before sending the offer - without this, calls would either be slow to connect or fail outright on certain network topologies.
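The gather-then-send step can be sketched as follows: resolve as soon as gathering completes, as soon as a relay candidate appears (the relay-ready shortcut), or when the 3-second cap expires. The minimal `CandidateSource` interface stands in for `RTCPeerConnection`'s `onicecandidate` so the logic is testable outside a browser; it is a simplification, not the production code.

```typescript
// Abstraction over RTCPeerConnection candidate events:
// the callback receives each candidate string, then null when gathering ends.
interface CandidateSource {
  onCandidate(cb: (candidate: string | null) => void): void;
}

// Collect ICE candidates before sending the offer (no trickle ICE upstream).
// Resolves on: gathering complete, a relay candidate (good enough to connect),
// or the hard timeout - whichever comes first.
function waitForCandidates(pc: CandidateSource, timeoutMs = 3000): Promise<string[]> {
  return new Promise((resolve) => {
    const gathered: string[] = [];
    let done = false;
    const finish = () => {
      if (done) return;
      done = true;
      clearTimeout(timer);
      resolve(gathered);
    };
    const timer = setTimeout(finish, timeoutMs); // hard cap: send what we have
    pc.onCandidate((c) => {
      if (c === null) return finish();        // gathering complete
      gathered.push(c);
      if (c.includes("typ relay")) finish();  // relay-ready shortcut
    });
  });
}
```

Without the timeout, a network that never fires the end-of-candidates event stalls the offer forever; without the relay shortcut, well-connected clients wait the full window for nothing.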
Pre-loaded knowledge - not mid-call RAG
Instead of doing retrieval on every turn, the entire company knowledge base is loaded into the AI's context at call start (≈2s, p95). Result: 1–3s response latency for the whole call, vs. 5–8s for typical retrieval-per-turn approaches. Cache hit rate >80% on knowledge loads.
RTVI data channel with structured-tag protocol
LLM tokens stream over a data channel alongside the audio. The system prompt asks the LLM to emit hidden <filter_update> or <form_update> JSON tags before its spoken response. The frontend parses them out of the token stream and uses them to drive UI state - filtering a gallery, auto-filling a form - while the AI is talking.
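A sketch of the frontend side of that protocol, assuming a wire format of <filter_update>{…json…}</filter_update> blocks preceding the spoken text (the exact format is an assumption): run the extractor over the accumulated token buffer on each chunk, apply any complete tags, and display only the cleaned speech.

```typescript
interface TagUpdate {
  tag: string;       // "filter_update" | "form_update"
  payload: unknown;  // parsed JSON body
}

// Strip complete structured tags out of the accumulated LLM text and
// return the structured updates alongside the clean, speakable remainder.
function extractTagUpdates(text: string): { speech: string; updates: TagUpdate[] } {
  const updates: TagUpdate[] = [];
  const speech = text
    .replace(/<(filter_update|form_update)>([\s\S]*?)<\/\1>/g, (_m, tag: string, body: string) => {
      try {
        updates.push({ tag, payload: JSON.parse(body) });
      } catch {
        // Partial JSON mid-stream: leave it for a later pass over more tokens.
      }
      return "";
    })
    .trim();
  return { speech, updates };
}
```

Because the tags arrive in the same ordered stream as the speech tokens, the UI update (gallery filter, form field) lands at the moment the AI starts saying the corresponding sentence.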
Multi-tenant call analytics
Every call is recorded, transcribed, confidence-scored, and aggregated into daily dashboards. The voice library is switchable per tenant - self-hosted TTS for standard voices, premium TTS for the high end.
Embeddable widgets
Button, banner, floating styles - all WebRTC-based, drop into any site with one script tag.
Latency budget
Where the seconds go.
~500ms
STT
~1000ms
LLM
~500ms
TTS
The wow moments
The live demo wizard
Split-screen voice call where the AI interviews a prospect and the form populates in real time as they speak. The single best "this is real AI engineering" moment on the live site.
Voice-driven on-page filtering
Speak into the mic and the demo grid filters itself based on what you say. Powered by the <filter_update> tag protocol embedded in the LLM's response stream.
Industry demo cards
Pre-built voice demos for healthcare, legal, property, e-commerce, hospitality, financial services, hair & beauty, automotive, government, education, IT/SaaS - each a real working call you can place.
"WebRTC for the browser. Telephony for the phone. A speech-aware LLM for the brain. Premium synthetic voice for the output. The application backend stitches it all together."