Voice Configuration Guide¶
This guide covers all configuration for voice agents:
- App Configuration: Centralized config management
- Local Overrides: .env.local and .env files
- Agent YAML: Voice, model, greeting settings
- Mode-Specific Config: Cascade vs VoiceLive
Looking for a specific config?
See the Complete Config Reference for all 60+ environment variables with descriptions and defaults.
Configuration Loading Order¶
The system loads configuration in this order (later sources override earlier):
| Source | Priority | Use Case |
|---|---|---|
| .env | Lowest | Project defaults |
| .env.local | Medium | Local development overrides |
| Environment variables | High | Container/cloud deployments |
| Azure App Configuration | Highest | Centralized enterprise config |
Local Override Rule
Keys defined in .env.local are never overwritten by Azure App Configuration. This lets you test locally with different values while the team shares a central config.
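The loading order and the Local Override Rule can be read as a layered merge. The following is a minimal sketch, not the project's loader: `merge_layers` and the sample values are hypothetical, and it assumes each source has already been read into a plain dict.

```python
def merge_layers(dotenv, dotenv_local, environ, app_config):
    """Merge config sources so that later sources override earlier ones.

    Keys present in .env.local are protected from Azure App Configuration
    (the Local Override Rule). All arguments are plain dicts.
    """
    merged = {}
    for layer in (dotenv, dotenv_local, environ):
        merged.update(layer)
    for key, value in app_config.items():
        if key not in dotenv_local:  # never overwrite a local override
            merged[key] = value
    return merged


config = merge_layers(
    dotenv={"DEFAULT_TTS_VOICE": "en-US-AriaNeural", "POOL_SIZE_TTS": "50"},
    dotenv_local={"DEFAULT_TTS_VOICE": "en-US-JennyNeural"},
    environ={},
    app_config={"DEFAULT_TTS_VOICE": "en-US-GuyNeural", "POOL_SIZE_TTS": "100"},
)
print(config)
```

Here the voice keeps its .env.local value while the pool size comes from App Configuration, because only the first key was overridden locally.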
Azure App Configuration¶
For enterprise deployments, use Azure App Configuration to centralize settings across environments.
Setup¶
# Enable App Configuration
export AZURE_APPCONFIG_ENDPOINT="https://your-config.azconfig.io"
export AZURE_APPCONFIG_LABEL="dev" # or "staging", "prod"
Key Mapping¶
App Configuration uses hierarchical keys that map to environment variables:
| App Config Key | Environment Variable |
|---|---|
| azure/openai/endpoint | AZURE_OPENAI_ENDPOINT |
| azure/openai/deployment-id | AZURE_OPENAI_CHAT_DEPLOYMENT_ID |
| azure/speech/endpoint | AZURE_SPEECH_ENDPOINT |
| azure/speech/region | AZURE_SPEECH_REGION |
| azure/acs/endpoint | ACS_ENDPOINT |
| azure/voicelive/endpoint | AZURE_VOICELIVE_ENDPOINT |
| azure/voicelive/model | AZURE_VOICELIVE_MODEL |
| app/pools/tts-size | POOL_SIZE_TTS |
| app/pools/stt-size | POOL_SIZE_STT |
| app/voice/default-tts-voice | DEFAULT_TTS_VOICE |
See appconfig_provider.py for the complete mapping.
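As an illustration of the key mapping, a lookup table can translate hierarchical keys into environment variable names. This sketch copies only a few rows from the table above. `KEY_MAP` and `to_env` are hypothetical names, and skipping unmapped keys is an assumption; the real mapping lives in appconfig_provider.py.

```python
# Subset of the mapping table above, for illustration only.
KEY_MAP = {
    "azure/openai/endpoint": "AZURE_OPENAI_ENDPOINT",
    "azure/openai/deployment-id": "AZURE_OPENAI_CHAT_DEPLOYMENT_ID",
    "azure/speech/region": "AZURE_SPEECH_REGION",
    "app/pools/tts-size": "POOL_SIZE_TTS",
}


def to_env(app_config_values: dict) -> dict:
    """Translate App Configuration keys to env var names, skipping unmapped keys."""
    return {
        KEY_MAP[key]: value
        for key, value in app_config_values.items()
        if key in KEY_MAP
    }


print(to_env({"azure/speech/region": "eastus", "unknown/key": "ignored"}))
```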
Feature Flags¶
App Configuration supports feature flags for toggles:
| Feature Flag | Environment Variable | Default |
|---|---|---|
| warm-pool | WARM_POOL_ENABLED | true |
| dtmf-validation | DTMF_VALIDATION_ENABLED | false |
| auth-validation | ENABLE_AUTH_VALIDATION | false |
| call-recording | ENABLE_ACS_CALL_RECORDING | false |
| tracing | ENABLE_TRACING | true |
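Feature flags arrive as strings once they are mapped to environment variables, so they need a boolean conversion with the defaults from the table. The sketch below shows one way to do that. `flag_enabled` and the accepted truthy spellings are assumptions, not the project's actual parser.

```python
import os

# Defaults copied from the feature flag table above.
FLAG_DEFAULTS = {
    "WARM_POOL_ENABLED": True,
    "DTMF_VALIDATION_ENABLED": False,
    "ENABLE_AUTH_VALIDATION": False,
    "ENABLE_ACS_CALL_RECORDING": False,
    "ENABLE_TRACING": True,
}


def flag_enabled(name: str, environ=None) -> bool:
    """Return the flag value from the environment, or its documented default."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return FLAG_DEFAULTS[name]
    # Assumed truthy spellings; anything else counts as disabled.
    return raw.strip().lower() in {"1", "true", "yes", "on"}
```

Passing a dict as `environ` keeps the function easy to test without touching the real process environment.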
Upload Your Own Config¶
- Create resource: Azure Portal → App Configuration → Create
- Add keys: Configuration explorer → Create → Key-value
- Use labels: Set label to environment name (dev, staging, prod)
- Connect app: Set AZURE_APPCONFIG_ENDPOINT in your deployment
Example: Adding a custom key
# Azure CLI
az appconfig kv set \
--name your-config \
--key "azure/openai/endpoint" \
--value "https://my-openai.openai.azure.com/" \
--label "dev"
Local Overrides¶
For local development, create .env.local to override any setting.
File Locations (searched in order)¶
- apps/artagent/backend/.env.local (app-specific)
- <project-root>/.env.local (project-wide)
- <project-root>/.env (default fallback)
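The search order can be expressed as an ordered list of candidate paths. This is a sketch under assumptions: `candidate_env_files` and `existing_env_files` are hypothetical helpers, and how the real loader combines the files it finds is not shown here.

```python
from pathlib import Path


def candidate_env_files(project_root: Path) -> list[Path]:
    """Return the env file locations in the documented search order."""
    return [
        project_root / "apps" / "artagent" / "backend" / ".env.local",
        project_root / ".env.local",
        project_root / ".env",
    ]


def existing_env_files(project_root: Path) -> list[Path]:
    """Keep only the candidates that exist on disk, preserving the order."""
    return [path for path in candidate_env_files(project_root) if path.is_file()]
```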
Example .env.local¶
# apps/artagent/backend/.env.local
# Override Azure OpenAI for local testing
AZURE_OPENAI_ENDPOINT=https://my-dev-openai.openai.azure.com/
AZURE_OPENAI_CHAT_DEPLOYMENT_ID=gpt-4o-mini
# Use local Redis instead of Azure
REDIS_HOST=localhost
REDIS_PORT=6379
# Disable features for faster iteration
WARM_POOL_ENABLED=false
ENABLE_TRACING=false
# Override voice settings
DEFAULT_TTS_VOICE=en-US-AriaNeural
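To show how a file like this becomes key-value pairs, here is a deliberately small parser sketch. `parse_env_text` is a hypothetical helper; a real loader (for example one built on a dotenv library) would also handle inline comments, `export` prefixes, and escaping, none of which is covered here.

```python
def parse_env_text(text: str) -> dict:
    """Parse KEY=VALUE lines, ignoring blank lines and full-line comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes; values stay strings.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


sample = """
# Use local Redis instead of Azure
REDIS_HOST=localhost
REDIS_PORT=6379
WARM_POOL_ENABLED=false
"""
print(parse_env_text(sample))
```

Every value is still a string at this point, which is why the settings layer has to apply type conversions afterwards.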
Override Behavior with App Config¶
When Azure App Configuration is enabled:
| Scenario | What Happens |
|---|---|
| Key in .env.local + App Config | .env.local wins |
| Key only in App Config | App Config value used |
| Key only in .env.local | .env.local value used |
| Key in neither | Uses code default |
This allows you to:
- Test with different models locally
- Use local Redis/Cosmos emulators
- Disable expensive features during development
- Keep secrets out of App Config during testing
Environment Variables Reference¶
Core Azure Services¶
# Azure OpenAI
AZURE_OPENAI_ENDPOINT=https://xxx.openai.azure.com/
AZURE_OPENAI_CHAT_DEPLOYMENT_ID=gpt-4o
AZURE_OPENAI_API_VERSION=2024-02-01
DEFAULT_TEMPERATURE=0.7
DEFAULT_MAX_TOKENS=500
# Azure Speech
AZURE_SPEECH_ENDPOINT=https://xxx.cognitiveservices.azure.com/
AZURE_SPEECH_REGION=eastus
AZURE_SPEECH_RESOURCE_ID=/subscriptions/.../speechServices/xxx
# Azure VoiceLive
AZURE_VOICELIVE_ENDPOINT=https://xxx.cognitiveservices.azure.com/
AZURE_VOICELIVE_MODEL=gpt-4o-realtime
# Azure Communication Services
ACS_ENDPOINT=https://xxx.communication.azure.com
ACS_CONNECTION_STRING=endpoint=https://...
ACS_SOURCE_PHONE_NUMBER=+1234567890
ACS_STREAMING_MODE=media # or "voice_live"
Voice Settings¶
# TTS defaults (fallback when agent config missing)
DEFAULT_TTS_VOICE=en-US-AriaNeural
DEFAULT_VOICE_STYLE=chat
DEFAULT_VOICE_RATE=+0%
# Audio processing
TTS_SAMPLE_RATE_ACS=16000
TTS_SAMPLE_RATE_UI=48000
SILENCE_DURATION_MS=1300
RECOGNIZED_LANGUAGE=en-US,es-ES,fr-FR
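Values such as RECOGNIZED_LANGUAGE and the sample rates are strings in the environment and need converting before use. The helpers below are a hedged sketch of that step. `parse_languages` and `parse_int` are hypothetical names and do not reflect the settings module's real API.

```python
def parse_languages(raw: str) -> list[str]:
    """Split a comma-separated locale list, dropping empty entries."""
    return [item.strip() for item in raw.split(",") if item.strip()]


def parse_int(raw, default: int) -> int:
    """Convert an env string to int, falling back to the default if unusable."""
    try:
        return int(raw)
    except (TypeError, ValueError):
        return default


print(parse_languages("en-US,es-ES,fr-FR"))
print(parse_int("16000", default=16000))
```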
Pool & Performance¶
# Connection pools
POOL_SIZE_TTS=50
POOL_SIZE_STT=50
MAX_WEBSOCKET_CONNECTIONS=200
# Warm pools (pre-initialized for low latency)
WARM_POOL_ENABLED=true
WARM_POOL_TTS_SIZE=3
WARM_POOL_STT_SIZE=2
# Sessions
SESSION_TTL_SECONDS=1800
MAX_CONCURRENT_SESSIONS=1000
Agent YAML Structure¶
Every voice agent is defined in a YAML file under registries/agentstore/agents/:
# registries/agentstore/agents/my_agent.yaml
# ═══════════════════════════════════════════════════════════════════════
# IDENTITY
# ═══════════════════════════════════════════════════════════════════════
name: MyAgent # Unique identifier (PascalCase)
display_name: "My Assistant" # Shown in UI
description: "Handles customer inquiries"
# ═══════════════════════════════════════════════════════════════════════
# GREETINGS (Jinja2 templates)
# ═══════════════════════════════════════════════════════════════════════
greeting: |
  Hi {{ caller_name | default('there') }}, welcome to {{ institution_name | default('our service') }}.
  How can I help you today?
return_greeting: |
  Welcome back! Is there anything else I can help with?
# ═══════════════════════════════════════════════════════════════════════
# HANDOFF CONFIGURATION
# ═══════════════════════════════════════════════════════════════════════
handoff:
  trigger: handoff_my_agent # Tool name that routes TO this agent
  is_entry_point: false # true = can be the starting agent
# ═══════════════════════════════════════════════════════════════════════
# MODEL CONFIGURATION
# ═══════════════════════════════════════════════════════════════════════
model:
  deployment_id: gpt-4o # Azure OpenAI deployment
  temperature: 0.7 # Creativity (0-1)
  max_tokens: 2048 # Max response length
# Mode-specific models (optional)
voicelive_model:
  deployment_id: gpt-realtime # For VoiceLive mode
  temperature: 0.7
cascade_model:
  deployment_id: gpt-4o # For Cascade mode
  temperature: 0.8
# ═══════════════════════════════════════════════════════════════════════
# VOICE CONFIGURATION (Azure TTS)
# ═══════════════════════════════════════════════════════════════════════
voice:
  name: en-US-AriaNeural # Azure TTS voice
  rate: "-5%" # Speech rate (slower for clarity)
  style: chat # Voice style
# ═══════════════════════════════════════════════════════════════════════
# TOOLS (referenced by name from toolstore)
# ═══════════════════════════════════════════════════════════════════════
tools:
  - verify_client_identity
  - check_account_balance
  - handoff_concierge # Handoff tools route to other agents
Voice Settings Explained¶
Voice configuration differs between the two modes:
Cascade Mode (Azure TTS Voices)¶
Cascade uses Azure Neural Voices — 400+ options with styles and fine-grained control:
voice:
  name: en-US-AriaNeural # Azure TTS voice
  rate: "-5%" # Speech rate adjustment
  style: chat # Emotional style
| Voice | Best For | Characteristics |
|---|---|---|
| en-US-AriaNeural | General | Friendly, clear |
| en-US-JennyNeural | Professional | Formal, crisp |
| en-US-GuyNeural | Male voice | Warm, approachable |
| en-US-SaraNeural | Casual | Upbeat, young |
Full list: Azure TTS Voice Gallery
Rate adjustment (recommended for phone clarity):
voice:
  # Pick one value; a YAML mapping cannot repeat the rate key
  rate: "-10%" # 10% slower (best for telephony)
  # rate: "0%" # Normal speed
  # rate: "+10%" # 10% faster
Emotional styles (supported by some voices):
voice:
  # Pick one value; a YAML mapping cannot repeat the style key
  style: chat # Conversational
  # style: customerservice # Professional
  # style: empathetic # Supportive
  # style: cheerful # Upbeat
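In Azure TTS, voice name, rate, and style are typically expressed through SSML, so a sketch of that mapping may help when reasoning about these settings. `build_ssml` is a hypothetical helper, and whether the Cascade pipeline builds its SSML this way is an assumption; the element names (`voice`, `prosody`, `mstts:express-as`) are standard Azure Speech SSML.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, name, rate="0%", style=None, lang="en-US"):
    """Wrap text in SSML that applies a voice, a speaking rate, and an optional style."""
    body = f"<prosody rate={quoteattr(rate)}>{escape(text)}</prosody>"
    if style:
        body = f"<mstts:express-as style={quoteattr(style)}>{body}</mstts:express-as>"
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(name)}>{body}</voice></speak>"
    )


ssml = build_ssml("Hello & welcome", "en-US-AriaNeural", rate="-5%", style="chat")
print(ssml)
```

Escaping the text and attributes matters because greetings are rendered from templates and may contain characters such as `&`.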
VoiceLive Mode (Azure VoiceLive)¶
VoiceLive uses the Azure VoiceLive SDK with Azure Neural Voices. See VoiceLive customization docs for available voices and configuration options.
Note: VoiceLive voice configuration may differ from Cascade mode, so check the Azure VoiceLive documentation for current voice options and settings.
Greeting Templates¶
Greetings use Jinja2 templates with these variables:
| Variable | Source | Example |
|---|---|---|
| caller_name | Auth tool result | "John" |
| client_id | Session profile | "12345" |
| institution_name | Scenario config | "Contoso Bank" |
| previous_agent | Handoff context | "Concierge" |
| handoff_context.reason | Handoff tool args | "fraud inquiry" |
Example: Personalized Greeting¶
greeting: |
  {% if caller_name %}
  Hello {{ caller_name }}, I'm your personal banking assistant.
  {% else %}
  Hello! I'm your personal banking assistant.
  {% endif %}
  How can I help you today?
Example: Handoff-Aware Greeting¶
greeting: |
  {% if previous_agent %}
  Thanks for being transferred. I understand you need help with {{ handoff_context.reason | default('your inquiry') }}.
  {% else %}
  Welcome! I'm here to help with fraud-related concerns.
  {% endif %}
  What happened?
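To see how such a template behaves with and without handoff data, it can be rendered directly with the Jinja2 library. This is a sketch that assumes Jinja2 is installed and uses its default settings; the `render_greeting` function here is a standalone illustration, not the agent method of the same name.

```python
from jinja2 import Template

# Same logic as the handoff-aware greeting, written on one line for clarity.
GREETING = (
    "{% if previous_agent %}"
    "Thanks for being transferred. I understand you need help with "
    "{{ handoff_context.reason | default('your inquiry') }}."
    "{% else %}"
    "Welcome! I'm here to help with fraud-related concerns."
    "{% endif %}"
)


def render_greeting(context: dict) -> str:
    """Render the greeting with the given template variables."""
    return Template(GREETING).render(**context).strip()


print(render_greeting({}))
print(render_greeting({
    "previous_agent": "Concierge",
    "handoff_context": {"reason": "fraud inquiry"},
}))
```

The `default` filter only covers a missing `reason`; `handoff_context` itself is expected to exist whenever `previous_agent` is set.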
Handoff Configuration¶
Agent-Level Handoff¶
Define how other agents route TO this agent:
# fraud_agent.yaml
handoff:
  trigger: handoff_fraud # Tool name
  is_entry_point: false # Can't be starting agent
Scenario-Level Handoff¶
Define handoff behavior in scenario YAML:
# registries/scenariostore/scenarios/banking.yaml
handoffs:
  - from_agent: Concierge
    to_agent: FraudAgent
    tool: handoff_fraud
    type: announced # Play greeting on switch
    share_context: true # Pass conversation history
  - from_agent: FraudAgent
    to_agent: Concierge
    tool: handoff_concierge
    type: discrete # Silent switch (no greeting)
    share_context: false # Fresh context
Handoff Types¶
| Type | Greeting | Use When |
|---|---|---|
| announced | Yes | User should know they're being transferred |
| discrete | No | Seamless specialist routing |
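The scenario-level rules can be pictured as a lookup keyed by the current agent and the tool that was called. The sketch below mirrors the banking scenario; the `Handoff` dataclass, `resolve_handoff`, and `should_greet` are hypothetical names, not the orchestrator's actual types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Handoff:
    from_agent: str
    to_agent: str
    tool: str
    type: str = "announced"
    share_context: bool = True


# Mirrors the two routes in the banking scenario example.
HANDOFFS = [
    Handoff("Concierge", "FraudAgent", "handoff_fraud", "announced", True),
    Handoff("FraudAgent", "Concierge", "handoff_concierge", "discrete", False),
]


def resolve_handoff(current_agent: str, tool: str) -> Optional[Handoff]:
    """Find the route triggered by a tool call from the current agent."""
    for handoff in HANDOFFS:
        if handoff.from_agent == current_agent and handoff.tool == tool:
            return handoff
    return None


def should_greet(handoff: Handoff) -> bool:
    """Announced handoffs play the target agent's greeting; discrete ones do not."""
    return handoff.type == "announced"
```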
Mode-Specific Configuration¶
The agent YAML supports separate model configs for each mode. The system picks the right one at runtime.
Cascade Mode¶
Uses separate Azure services: Speech SDK (STT) → Azure OpenAI (LLM) → Azure TTS
# Primary model config (used by cascade if cascade_model not specified)
model:
  deployment_id: gpt-4o
  temperature: 0.7
  max_tokens: 2048
# Optional: cascade-specific override
cascade_model:
  deployment_id: gpt-4o-mini # Use cheaper model for cascade
  temperature: 0.5 # More deterministic
# Azure TTS voice (only used by cascade)
voice:
  name: en-US-JennyNeural
  rate: "-5%"
  style: customerservice
VoiceLive Mode¶
Uses Azure VoiceLive SDK — audio-in/audio-out over a single WebSocket:
# Primary model config (fallback)
model:
  deployment_id: gpt-4o
  temperature: 0.7
# VoiceLive-specific config (required for realtime)
voicelive_model:
  deployment_id: gpt-4o-realtime # Must be a realtime deployment
  temperature: 0.7
  voice: alloy # Voice selection
# Note: voice: block (Azure TTS) may be IGNORED in VoiceLive mode
# See Azure VoiceLive docs for current voice configuration options
Dual-Mode Agent Example¶
An agent that works in both modes:
name: BankingConcierge
display_name: "Banking Assistant"
# Cascade uses these
cascade_model:
  deployment_id: gpt-4o
  temperature: 0.7
  max_tokens: 2048
voice:
  name: en-US-AriaNeural
  rate: "-5%"
  style: chat
# VoiceLive uses these
voicelive_model:
  deployment_id: gpt-4o-realtime
  temperature: 0.7
  voice: echo
# Shared
tools:
  - verify_client_identity
  - check_account_balance
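The runtime choice between model blocks can be sketched as a lookup with a fallback to the primary `model` block. `select_model_config` and the mode strings `"cascade"` and `"voicelive"` are assumptions made for illustration; the real selection logic and mode names may differ.

```python
def select_model_config(agent: dict, mode: str) -> dict:
    """Return the mode-specific model block, falling back to the primary one."""
    mode_key = {"cascade": "cascade_model", "voicelive": "voicelive_model"}[mode]
    return agent.get(mode_key) or agent.get("model") or {}


agent = {
    "name": "BankingConcierge",
    "model": {"deployment_id": "gpt-4o", "temperature": 0.7},
    "voicelive_model": {"deployment_id": "gpt-4o-realtime", "voice": "echo"},
}
print(select_model_config(agent, "voicelive")["deployment_id"])
print(select_model_config(agent, "cascade")["deployment_id"])
```

In this sample the agent defines no `cascade_model`, so Cascade mode falls back to the primary `model` block.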
Complete Example: Fraud Agent¶
# registries/agentstore/agents/fraud_agent.yaml
name: FraudAgent
display_name: "Fraud Specialist"
description: "Handles fraud detection and card security"
greeting: |
  {% if caller_name %}
  Hi {{ caller_name }}, I'm a fraud specialist at {{ institution_name | default('your bank') }}.
  {% else %}
  Hi, I'm a fraud specialist here to help.
  {% endif %}
  I understand you may have concerns about unauthorized activity. Can you tell me what happened?
return_greeting: |
  Welcome back. Let's continue looking into your fraud concern.
handoff:
  trigger: handoff_fraud
  is_entry_point: false
model:
  deployment_id: gpt-4o
  temperature: 0.3 # Low creativity for security topics
  max_tokens: 1024
voice:
  name: en-US-JennyNeural # Professional voice
  rate: "-5%"
  style: empathetic # Supportive tone
tools:
  - verify_client_identity
  - check_recent_transactions
  - block_card
  - report_fraud
  - handoff_concierge
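A few structural checks on the parsed YAML can catch common mistakes before an agent like this is loaded. The sketch below works on an already-parsed dict. `validate_agent` and the specific rules (PascalCase name, a handoff trigger, a percentage-style rate) are assumptions drawn from the examples in this guide, not the loader's real validation.

```python
import re

PASCAL_CASE = re.compile(r"^[A-Z][A-Za-z0-9]*$")
RATE = re.compile(r"^[+-]?\d+%$")


def validate_agent(agent: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = []
    name = agent.get("name", "")
    if not PASCAL_CASE.match(name):
        problems.append(f"name {name!r} is not PascalCase")
    if not agent.get("handoff", {}).get("trigger"):
        problems.append("handoff.trigger is missing")
    rate = agent.get("voice", {}).get("rate")
    if rate is not None and not RATE.match(str(rate)):
        problems.append(f"voice.rate {rate!r} is not a percentage string")
    return problems


fraud_agent = {
    "name": "FraudAgent",
    "handoff": {"trigger": "handoff_fraud", "is_entry_point": False},
    "voice": {"name": "en-US-JennyNeural", "rate": "-5%", "style": "empathetic"},
}
print(validate_agent(fraud_agent))
```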
Validating Your Configuration¶
After creating/editing an agent YAML:
# 1. Check YAML syntax
python -c "import yaml; yaml.safe_load(open('registries/agentstore/agents/my_agent.yaml'))"
# 2. Run agent discovery to validate
python -c "from apps.artagent.backend.registries.agentstore.loader import discover_agents; print(discover_agents().keys())"
# 3. Test greeting rendering
python -c "
from apps.artagent.backend.registries.agentstore.loader import discover_agents
agents = discover_agents()
print(agents['MyAgent'].render_greeting({'caller_name': 'Test User'}))
"
Programmatic Config Access¶
In code, always import from config.settings — never use os.getenv() directly:
# ✅ Correct - uses settings module
from config.settings import (
    AZURE_OPENAI_ENDPOINT,
    AZURE_SPEECH_REGION,
    ACS_STREAMING_MODE,
)
# ❌ Wrong - bypasses config loading
import os
endpoint = os.getenv("AZURE_OPENAI_ENDPOINT") # Don't do this
Why?¶
The settings module:
- Loads .env.local and .env files
- Connects to Azure App Configuration
- Applies key mappings and type conversions
- Respects override priorities
Checking Loaded Config¶
# Debug: see what config was loaded
from config.settings import get_loaded_config
config = get_loaded_config()
print(config)
See Also¶
- Voice Architecture Overview - How voice processing works
- Voice Debugging Guide - Troubleshooting issues
- Agent Reference - All available agents