Back to Blog
Tutorials13 min read

How to Build a Voice AI Agent from Scratch: Complete Tutorial

NURO UniversityMarch 3, 2026

How to Build a Voice AI Agent from Scratch

Building a voice AI agent from scratch in 2026 is far more accessible than it sounds. The pieces — LLM, text-to-speech, speech-to-text, and telephony — are all available as APIs. Your job is to wire them together intelligently. This tutorial walks you through the complete architecture.

The Architecture of a Voice AI Agent

A voice AI agent has five components working in real time:

  1. Telephony layer — Handles the phone call (Twilio, Retell's built-in, or VAPI)
  2. Speech-to-text (STT) — Converts spoken audio to text (Deepgram, Whisper)
  3. LLM (the brain) — Processes the text and generates a response (GPT-4o, Claude)
  4. Text-to-speech (TTS) — Converts the LLM response to audio (ElevenLabs, Cartesia)
  5. Orchestration — Manages the flow, timing, and data capture

The total latency from user speech ending to AI response beginning needs to be under 700ms for a natural conversation. Platform selection directly affects this.

Platform Options

Option A: Retell AI (Recommended for Beginners)

Retell handles STT, TTS, telephony, and orchestration. You bring your LLM key and a system prompt. Latency: 400-600ms. Best for: agencies building voice agents without deep technical expertise.

Option B: VAPI (Recommended for Advanced Builders)

VAPI is more configurable. You can swap every component (different STT, different TTS, custom LLM). Latency: 500-800ms depending on configuration. Best for: developers who need fine-grained control.

Option C: Full Custom Build

Use Twilio Media Streams + Deepgram + OpenAI + ElevenLabs directly. Maximum control, maximum complexity. Latency: 600-900ms (more overhead from orchestration). Best for: engineering teams with specific requirements.

For 90% of agency use cases, Retell AI is the right choice.

Step 1: Design the Conversation Flow

Before touching any platform, write out your conversation flow in plain English.

Sample: Auto Shop Appointment Booking Agent

Entry: "Thank you for calling [Shop Name]. This is Maya — how can I help you today?"

Intents to handle:

  • Schedule an appointment
  • Get pricing information
  • Check appointment status
  • Speak to a human

Appointment booking flow:

  1. Confirm they want to schedule
  2. Ask for vehicle year/make/model
  3. Ask what service they need
  4. Check available slots (from calendar API)
  5. Offer 2-3 options
  6. Confirm selection
  7. Ask for name and phone number
  8. Confirm appointment: "You're scheduled for [service] on [day] at [time]. You'll get a text confirmation shortly."

Escalation: "Let me get you connected with one of our service advisors — please hold just a moment."

Document this before building. The conversation design is 60% of the work.

Want to build this yourself? NURO University walks you through it step by step. Start free →

Step 2: Choose and Configure Your Voice

Voice selection dramatically affects user perception. A mismatch between voice and brand destroys trust.

ElevenLabs Voice Selection Guide

Voice CharacteristicBest For
Warm, slow cadenceHealthcare, legal, wellness
Professional, neutralBusiness services, real estate
Energetic, friendlyRetail, hospitality, restaurants
AuthoritativeFinancial services, law enforcement

For most business use cases: start with ElevenLabs "Rachel" (professional, warm) or create a custom clone from the business owner's voice (powerful for brand alignment).

Custom voice cloning with ElevenLabs:

  1. Record 30-60 minutes of clean audio
  2. Upload to ElevenLabs Voice Lab
  3. Train (15-30 minutes)
  4. Use the voice ID in your Retell configuration

Step 3: Build the LLM System Prompt

The system prompt is the brain of your agent. It needs:

  1. Identity and role — Who are they, what can they do
  2. Knowledge base — Business hours, pricing, services, FAQs
  3. Conversation rules — Tone, pacing, handling unclear input
  4. Fallback instructions — What to do when the agent cannot handle a request
  5. Data collection instructions — What information to gather and how to confirm it

Keep prompts under 2,000 tokens for best performance. Test with edge cases: angry callers, unclear requests, questions outside scope.

Step 4: Connect to Calendar/CRM

For appointment booking agents, real-time calendar access is essential. Otherwise the agent books slots that do not exist.

Integration options:

  • Google Calendar: Google Calendar API (free, relatively easy)
  • Acuity/Calendly: Webhook + API (straightforward)
  • Industry-specific software: Usually requires a middleware layer via Make.com or n8n

Build a Make.com workflow that:

  1. Receives a call from Retell via webhook
  2. At the "check availability" step, queries the calendar
  3. Returns available slots to the agent
  4. At booking confirmation, creates the calendar event
  5. Sends confirmation SMS to caller

Step 5: Test Rigorously

Before going live, run through at minimum:

  • 20 test calls covering normal scenarios
  • 5 calls where the user says something unexpected
  • 3 calls where the user wants to speak to a human
  • 2 calls where the user tries to schedule outside available hours
  • 1 call that gets disconnected mid-conversation

Listen to call recordings and read transcripts. Iterate on the prompt until the agent handles 85%+ of scenarios gracefully.

Deployment Checklist

  • Phone number acquired and assigned
  • Voice tested by 3+ people (not just you)
  • Calendar integration tested with real bookings
  • Webhook for post-call data capture working
  • SMS confirmation triggers on booking
  • Escalation path (human handoff) tested
  • Error monitoring set up (webhook alert on failed calls)
  • Client trained on how to review call logs

Ready to Build Your AI Automation Business?

Stop reading about AI automation — start building it. NURO University gives you the exact frameworks, templates, and step-by-step training to land your first client and scale to $10K/month.

Join NURO University Free →

No tech background required. Start seeing results in your first 30 days.

Ready to master AI automation?

Join NURO University and build real AI solutions in 12 structured modules. Start free today.

Start Learning

Get weekly AI automation tips

Join 2,400+ builders getting actionable AI strategies every Tuesday.

No spam. Unsubscribe anytime.