Voice Configuration Guide¶

This guide covers all configuration for voice agents:

App Configuration — Centralized config management
Local Overrides — .env.local and .env files
Agent YAML — Voice, model, greeting settings
Mode-Specific Config — Cascade vs VoiceLive

Looking for a specific config?

See the Complete Config Reference for all 60+ environment variables with descriptions and defaults.

Configuration Loading Order¶

The system loads configuration in this order (later sources override earlier):

flowchart LR
    A[".env (project root)"] --> B[".env.local"]
    B --> C["Environment Variables"]
    C --> D["Azure App Configuration"]
    style A fill:#f5f5f5,stroke:#666
    style B fill:#e6ffe6,stroke:#107C10
    style C fill:#fff4ce,stroke:#FFB900
    style D fill:#e6f3ff,stroke:#0078D4

Source	Priority	Use Case
`.env`	Lowest	Project defaults
`.env.local`	Medium	Local development overrides
Environment variables	High	Container/cloud deployments
Azure App Configuration	Highest	Centralized enterprise config

Local Override Rule

As an exception to the order above, keys defined in .env.local are never overwritten by Azure App Configuration. This lets you test locally with different values while the team shares a central config.
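
One way to read the loading order together with the local override rule is as a layered merge in which App Configuration skips any key already set in .env.local. The sketch below illustrates that reading only; it is not the project's loader, and `resolve_config` and its four arguments are hypothetical names.

```python
from typing import Mapping


def resolve_config(
    dotenv: Mapping[str, str],
    dotenv_local: Mapping[str, str],
    environ: Mapping[str, str],
    app_config: Mapping[str, str],
) -> dict[str, str]:
    """Merge config layers; later layers override earlier ones (assumed semantics)."""
    merged: dict[str, str] = {}
    # Lowest to high priority: .env, then .env.local, then process environment.
    for layer in (dotenv, dotenv_local, environ):
        merged.update(layer)
    # App Configuration is applied last, but never for keys set in .env.local.
    for key, value in app_config.items():
        if key not in dotenv_local:
            merged[key] = value
    return merged


if __name__ == "__main__":
    result = resolve_config(
        dotenv={"REDIS_HOST": "redis.internal", "POOL_SIZE_TTS": "50"},
        dotenv_local={"REDIS_HOST": "localhost"},
        environ={"ACS_STREAMING_MODE": "media"},
        app_config={"REDIS_HOST": "redis.shared", "POOL_SIZE_TTS": "80"},
    )
    print(result)
```

Here `REDIS_HOST` keeps its .env.local value while `POOL_SIZE_TTS` takes the App Configuration value, because only the former is defined locally.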

Azure App Configuration¶

For enterprise deployments, use Azure App Configuration to centralize settings across environments.

Setup¶

# Enable App Configuration
export AZURE_APPCONFIG_ENDPOINT="https://your-config.azconfig.io"
export AZURE_APPCONFIG_LABEL="dev"  # or "staging", "prod"

Key Mapping¶

App Configuration uses hierarchical keys that map to environment variables:

App Config Key	Environment Variable
`azure/openai/endpoint`	`AZURE_OPENAI_ENDPOINT`
`azure/openai/deployment-id`	`AZURE_OPENAI_CHAT_DEPLOYMENT_ID`
`azure/speech/endpoint`	`AZURE_SPEECH_ENDPOINT`
`azure/speech/region`	`AZURE_SPEECH_REGION`
`azure/acs/endpoint`	`ACS_ENDPOINT`
`azure/voicelive/endpoint`	`AZURE_VOICELIVE_ENDPOINT`
`azure/voicelive/model`	`AZURE_VOICELIVE_MODEL`
`app/pools/tts-size`	`POOL_SIZE_TTS`
`app/pools/stt-size`	`POOL_SIZE_STT`
`app/voice/default-tts-voice`	`DEFAULT_TTS_VOICE`

See appconfig_provider.py for the complete mapping.
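
The mapping is not purely mechanical (for example, `azure/openai/deployment-id` becomes `AZURE_OPENAI_CHAT_DEPLOYMENT_ID`), so an explicit lookup table is the simplest way to picture it. The following is a minimal sketch using a few rows from the table above; `KEY_MAP` and `to_env_names` are hypothetical names, and dropping unmapped keys is an assumption rather than documented behavior.

```python
# Subset of the key mapping table; the real mapping lives in appconfig_provider.py.
KEY_MAP = {
    "azure/openai/endpoint": "AZURE_OPENAI_ENDPOINT",
    "azure/openai/deployment-id": "AZURE_OPENAI_CHAT_DEPLOYMENT_ID",
    "azure/speech/region": "AZURE_SPEECH_REGION",
    "app/pools/tts-size": "POOL_SIZE_TTS",
    "app/voice/default-tts-voice": "DEFAULT_TTS_VOICE",
}


def to_env_names(settings: dict[str, str]) -> dict[str, str]:
    """Translate hierarchical App Config keys to environment variable names.

    Keys without a mapping are ignored (an assumption for this sketch).
    """
    return {KEY_MAP[key]: value for key, value in settings.items() if key in KEY_MAP}


if __name__ == "__main__":
    fetched = {
        "azure/openai/deployment-id": "gpt-4o",
        "app/pools/tts-size": "50",
        "team/unrelated/key": "ignored",
    }
    print(to_env_names(fetched))
```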

Feature Flags¶

App Configuration supports feature flags for toggles:

Feature Flag	Environment Variable	Default
`warm-pool`	`WARM_POOL_ENABLED`	`true`
`dtmf-validation`	`DTMF_VALIDATION_ENABLED`	`false`
`auth-validation`	`ENABLE_AUTH_VALIDATION`	`false`
`call-recording`	`ENABLE_ACS_CALL_RECORDING`	`false`
`tracing`	`ENABLE_TRACING`	`true`
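
Because each flag ends up as an environment variable, reading one comes down to parsing a string and falling back to the default in the table when the variable is unset. The sketch below shows that idea; `flag_enabled` is a hypothetical helper, and the set of accepted truthy spellings is an assumption.

```python
from typing import Mapping

# Defaults copied from the feature flag table.
FLAG_DEFAULTS = {
    "WARM_POOL_ENABLED": True,
    "DTMF_VALIDATION_ENABLED": False,
    "ENABLE_AUTH_VALIDATION": False,
    "ENABLE_ACS_CALL_RECORDING": False,
    "ENABLE_TRACING": True,
}

# Spellings treated as "on"; an assumption for this sketch.
TRUTHY = {"1", "true", "yes", "on"}


def flag_enabled(name: str, environ: Mapping[str, str]) -> bool:
    """Return the flag value from environ, or its documented default if unset."""
    raw = environ.get(name)
    if raw is None:
        return FLAG_DEFAULTS[name]
    return raw.strip().lower() in TRUTHY


if __name__ == "__main__":
    env = {"WARM_POOL_ENABLED": "false"}
    print(flag_enabled("WARM_POOL_ENABLED", env))
    print(flag_enabled("ENABLE_TRACING", env))
```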

Upload Your Own Config¶

Create resource: Azure Portal → App Configuration → Create
Add keys: Configuration explorer → Create → Key-value
Use labels: Set label to environment name (dev, staging, prod)
Connect app: Set AZURE_APPCONFIG_ENDPOINT in your deployment

Example: Adding a custom key

# Azure CLI
az appconfig kv set \
  --name your-config \
  --key "azure/openai/endpoint" \
  --value "https://my-openai.openai.azure.com/" \
  --label "dev"

Local Overrides¶

For local development, create a .env.local file to override any setting.

File Locations (searched in order)¶

apps/artagent/backend/.env.local — App-specific
<project-root>/.env.local — Project-wide
<project-root>/.env — Default fallback
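
To check which of these files exist in a checkout, a short script can walk the same list in the same order. This is a sketch only: `existing_env_files` is a hypothetical helper, it assumes the app-specific path is relative to the project root, and it does not say how the real loader combines the files it finds.

```python
from pathlib import Path


def candidate_env_files(project_root: Path) -> list[Path]:
    """Return the documented locations in search order."""
    return [
        project_root / "apps" / "artagent" / "backend" / ".env.local",  # app-specific
        project_root / ".env.local",  # project-wide
        project_root / ".env",  # default fallback
    ]


def existing_env_files(project_root: Path) -> list[Path]:
    """Return only the candidates that are present on disk, preserving order."""
    return [path for path in candidate_env_files(project_root) if path.is_file()]


if __name__ == "__main__":
    for path in existing_env_files(Path.cwd()):
        print(path)
```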

Example `.env.local`¶

# apps/artagent/backend/.env.local

# Override Azure OpenAI for local testing
AZURE_OPENAI_ENDPOINT=https://my-dev-openai.openai.azure.com/
AZURE_OPENAI_CHAT_DEPLOYMENT_ID=gpt-4o-mini

# Use local Redis instead of Azure
REDIS_HOST=localhost
REDIS_PORT=6379

# Disable features for faster iteration
WARM_POOL_ENABLED=false
ENABLE_TRACING=false

# Override voice settings
DEFAULT_TTS_VOICE=en-US-AriaNeural

Override Behavior with App Config¶

When Azure App Configuration is enabled:

Scenario	What Happens
Key in `.env.local` + App Config	`.env.local` wins
Key only in App Config	App Config value used
Key only in `.env.local`	`.env.local` value used
Key in neither	Uses code default

This allows you to:

Test with different models locally
Use local Redis/Cosmos emulators
Disable expensive features during development
Keep secrets out of App Config during testing

Environment Variables Reference¶

Core Azure Services¶

# Azure OpenAI
AZURE_OPENAI_ENDPOINT=https://xxx.openai.azure.com/
AZURE_OPENAI_CHAT_DEPLOYMENT_ID=gpt-4o
AZURE_OPENAI_API_VERSION=2024-02-01
DEFAULT_TEMPERATURE=0.7
DEFAULT_MAX_TOKENS=500

# Azure Speech
AZURE_SPEECH_ENDPOINT=https://xxx.cognitiveservices.azure.com/
AZURE_SPEECH_REGION=eastus
AZURE_SPEECH_RESOURCE_ID=/subscriptions/.../speechServices/xxx

# Azure VoiceLive
AZURE_VOICELIVE_ENDPOINT=https://xxx.cognitiveservices.azure.com/
AZURE_VOICELIVE_MODEL=gpt-4o-realtime

# Azure Communication Services
ACS_ENDPOINT=https://xxx.communication.azure.com
ACS_CONNECTION_STRING=endpoint=https://...
ACS_SOURCE_PHONE_NUMBER=+1234567890
ACS_STREAMING_MODE=media  # or "voice_live"

Voice Settings¶

# TTS defaults (fallback when agent config missing)
DEFAULT_TTS_VOICE=en-US-AriaNeural
DEFAULT_VOICE_STYLE=chat
DEFAULT_VOICE_RATE=+0%

# Audio processing
TTS_SAMPLE_RATE_ACS=16000
TTS_SAMPLE_RATE_UI=48000
SILENCE_DURATION_MS=1300
RECOGNIZED_LANGUAGE=en-US,es-ES,fr-FR

Pool & Performance¶

# Connection pools
POOL_SIZE_TTS=50
POOL_SIZE_STT=50
MAX_WEBSOCKET_CONNECTIONS=200

# Warm pools (pre-initialized for low latency)
WARM_POOL_ENABLED=true
WARM_POOL_TTS_SIZE=3
WARM_POOL_STT_SIZE=2

# Sessions
SESSION_TTL_SECONDS=1800
MAX_CONCURRENT_SESSIONS=1000
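
The warm pool settings describe clients that are created ahead of time so that the first calls do not pay the initialization cost. The sketch below illustrates that idea in a generic way; `WarmPool` is a hypothetical class, not the project's pool, and creating a new client on demand once the warm ones are used up is an assumption.

```python
import queue
from typing import Callable, Generic, TypeVar

T = TypeVar("T")


class WarmPool(Generic[T]):
    """Hold a few pre-initialized clients and create more only when needed."""

    def __init__(self, factory: Callable[[], T], warm_size: int) -> None:
        self._factory = factory
        self._ready = queue.SimpleQueue()
        # Pay the construction cost up front, e.g. WARM_POOL_TTS_SIZE=3.
        for _ in range(warm_size):
            self._ready.put(factory())

    def acquire(self) -> T:
        """Return a warm client if one is ready, otherwise build a cold one."""
        try:
            return self._ready.get_nowait()
        except queue.Empty:
            return self._factory()


if __name__ == "__main__":
    pool = WarmPool(factory=lambda: object(), warm_size=3)
    client = pool.acquire()  # served from the warm set, no construction delay
    print(type(client).__name__)
```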

Agent YAML Structure¶

Every voice agent is defined in a YAML file under registries/agentstore/agents/:

# registries/agentstore/agents/my_agent.yaml

# ═══════════════════════════════════════════════════════════════════════
# IDENTITY
# ═══════════════════════════════════════════════════════════════════════
name: MyAgent                    # Unique identifier (PascalCase)
display_name: "My Assistant"     # Shown in UI
description: "Handles customer inquiries"

# ═══════════════════════════════════════════════════════════════════════
# GREETINGS (Jinja2 templates)
# ═══════════════════════════════════════════════════════════════════════
greeting: |
  Hi {{ caller_name | default('there') }}, welcome to {{ institution_name | default('our service') }}.
  How can I help you today?

return_greeting: |
  Welcome back! Is there anything else I can help with?

# ═══════════════════════════════════════════════════════════════════════
# HANDOFF CONFIGURATION
# ═══════════════════════════════════════════════════════════════════════
handoff:
  trigger: handoff_my_agent      # Tool name that routes TO this agent
  is_entry_point: false          # true = can be the starting agent

# ═══════════════════════════════════════════════════════════════════════
# MODEL CONFIGURATION
# ═══════════════════════════════════════════════════════════════════════
model:
  deployment_id: gpt-4o          # Azure OpenAI deployment
  temperature: 0.7               # Creativity (0-1)
  max_tokens: 2048               # Max response length

# Mode-specific models (optional)
voicelive_model:
  deployment_id: gpt-realtime    # For VoiceLive mode
  temperature: 0.7

cascade_model:
  deployment_id: gpt-4o          # For Cascade mode
  temperature: 0.8

# ═══════════════════════════════════════════════════════════════════════
# VOICE CONFIGURATION (Azure TTS)
# ═══════════════════════════════════════════════════════════════════════
voice:
  name: en-US-AriaNeural         # Azure TTS voice
  rate: "-5%"                    # Speech rate (slower for clarity)
  style: chat                    # Voice style

# ═══════════════════════════════════════════════════════════════════════
# TOOLS (referenced by name from toolstore)
# ═══════════════════════════════════════════════════════════════════════
tools:
  - verify_client_identity
  - check_account_balance
  - handoff_concierge            # Handoff tools route to other agents
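
Once the YAML has been parsed into a dictionary (for example with `yaml.safe_load`), a few structural checks can catch common mistakes before the agent is loaded. The sketch below is illustrative: `lint_agent` is a hypothetical helper, the rules are assumptions drawn from the comments in the example above, and the project's own loader may enforce different ones.

```python
def lint_agent(agent: dict) -> list[str]:
    """Return a list of problems; an empty list means nothing was flagged."""
    problems: list[str] = []

    name = agent.get("name")
    if not isinstance(name, str) or not name:
        problems.append("name: missing or not a string")

    # The example documents temperature as a value between 0 and 1.
    for section in ("model", "cascade_model", "voicelive_model"):
        temperature = agent.get(section, {}).get("temperature")
        if temperature is not None and not 0 <= temperature <= 1:
            problems.append(f"{section}.temperature: {temperature} is outside 0-1")

    # Tools are referenced by name, so every entry should be a string.
    if not all(isinstance(tool, str) for tool in agent.get("tools", [])):
        problems.append("tools: every entry should be a tool name string")

    return problems


if __name__ == "__main__":
    parsed = {
        "name": "MyAgent",
        "model": {"deployment_id": "gpt-4o", "temperature": 0.7},
        "tools": ["verify_client_identity", "check_account_balance"],
    }
    print(lint_agent(parsed))
```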

Voice Settings Explained¶

Voice configuration differs between the two modes:

Cascade Mode (Azure TTS Voices)¶

Cascade uses Azure Neural Voices — 400+ options with styles and fine-grained control:

voice:
  name: en-US-AriaNeural         # Azure TTS voice
  rate: "-5%"                    # Speech rate adjustment
  style: chat                    # Emotional style

Voice	Best For	Characteristics
`en-US-AriaNeural`	General	Friendly, clear
`en-US-JennyNeural`	Professional	Formal, crisp
`en-US-GuyNeural`	Male voice	Warm, approachable
`en-US-SaraNeural`	Casual	Upbeat, young

Full list: Azure TTS Voice Gallery

Rate adjustment (recommended for phone clarity):

# Set a single rate value; the alternatives are shown commented out
voice:
  rate: "-10%"     # 10% slower (best for telephony)
  # rate: "0%"     # Normal speed
  # rate: "+10%"   # 10% faster

Emotional styles (supported by some voices):

# Set a single style value; the alternatives are shown commented out
voice:
  style: chat                # Conversational
  # style: customerservice   # Professional
  # style: empathetic        # Supportive
  # style: cheerful          # Upbeat
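
In Azure TTS, the voice name, rate, and style are expressed in SSML through the `voice`, `prosody`, and `mstts:express-as` elements. The sketch below shows how the three YAML fields could map onto that markup; `build_ssml` is a hypothetical helper, and how this project actually builds its SSML is not shown in this guide.

```python
from typing import Optional
from xml.sax.saxutils import escape, quoteattr


def build_ssml(
    text: str,
    name: str,
    rate: Optional[str] = None,
    style: Optional[str] = None,
    lang: str = "en-US",
) -> str:
    """Wrap text in SSML using the voice name, rate, and style settings."""
    body = escape(text)  # escape &, <, > so the markup stays well-formed
    if rate:
        body = f"<prosody rate={quoteattr(rate)}>{body}</prosody>"
    if style:
        body = f"<mstts:express-as style={quoteattr(style)}>{body}</mstts:express-as>"
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(name)}>{body}</voice>"
        "</speak>"
    )


if __name__ == "__main__":
    print(build_ssml("How can I help you today?", "en-US-AriaNeural", rate="-5%", style="chat"))
```

A style that the selected voice does not support generally has no audible effect, which is why the styles above are listed as supported by some voices only.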

VoiceLive Mode (Azure VoiceLive)¶

VoiceLive uses the Azure VoiceLive SDK with Azure Neural Voices. See the VoiceLive customization docs for available voices and configuration options.

voicelive_model:
  deployment_id: gpt-4o-realtime
  voice: alloy                   # Voice selection

Note: VoiceLive voice configuration may differ from Cascade mode. Refer to the Azure VoiceLive documentation for current voice options and settings.

Greeting Templates¶

Greetings use Jinja2 templates with these variables:

Variable	Source	Example
`caller_name`	Auth tool result	"John"
`client_id`	Session profile	"12345"
`institution_name`	Scenario config	"Contoso Bank"
`previous_agent`	Handoff context	"Concierge"
`handoff_context.reason`	Handoff tool args	"fraud inquiry"

Example: Personalized Greeting¶

greeting: |
  {% if caller_name %}
  Hello {{ caller_name }}, I'm your personal banking assistant.
  {% else %}
  Hello! I'm your personal banking assistant.
  {% endif %}
  How can I help you today?

Example: Handoff-Aware Greeting¶

greeting: |
  {% if previous_agent %}
  Thanks for being transferred. I understand you need help with {{ handoff_context.reason | default('your inquiry') }}.
  {% else %}
  Welcome! I'm here to help with fraud-related concerns.
  {% endif %}
  What happened?
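
To preview how such a template behaves with and without caller data, it can be rendered directly with the Jinja2 library. This is a sketch that uses Jinja2 on its own; the project's `render_greeting` may pass additional context or configure the environment differently, and `preview` is a hypothetical helper.

```python
from jinja2 import Template

GREETING = (
    "Hi {{ caller_name | default('there') }}, "
    "welcome to {{ institution_name | default('our service') }}."
)


def preview(template_source: str, context: dict) -> str:
    """Render a greeting template with the given context variables."""
    return Template(template_source).render(**context)


if __name__ == "__main__":
    # Both variables supplied.
    print(preview(GREETING, {"caller_name": "John", "institution_name": "Contoso Bank"}))
    # Nothing supplied: the default() fallbacks are used.
    print(preview(GREETING, {}))
    # Variable present but None: default() does not apply, so "None" is rendered.
    print(preview(GREETING, {"caller_name": None}))
```

Jinja2's `default` filter only replaces undefined variables. If the context may carry `None` or an empty string, `default('there', true)` also covers those falsy values.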

Handoff Configuration¶

Agent-Level Handoff¶

Define how other agents route TO this agent:

# fraud_agent.yaml
handoff:
  trigger: handoff_fraud         # Tool name
  is_entry_point: false          # Can't be starting agent

Scenario-Level Handoff¶

Define handoff behavior in scenario YAML:

# registries/scenariostore/scenarios/banking.yaml
handoffs:
  - from_agent: Concierge
    to_agent: FraudAgent
    tool: handoff_fraud
    type: announced              # Play greeting on switch
    share_context: true          # Pass conversation history

  - from_agent: FraudAgent
    to_agent: Concierge
    tool: handoff_concierge
    type: discrete               # Silent switch (no greeting)
    share_context: false         # Fresh context

Handoff Types¶

Type	Greeting	Use When
`announced`	Yes	User should know they're being transferred
`discrete`	No	Seamless specialist routing
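
Taken together, the scenario entries act as a lookup from the active agent and the tool it called to a target agent, a handoff type, and a context-sharing choice. The sketch below models that lookup with the two entries from the banking example; the `Handoff` dataclass, `find_handoff`, and `plays_greeting` are hypothetical names, not the project's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Handoff:
    from_agent: str
    to_agent: str
    tool: str
    type: str  # "announced" or "discrete"
    share_context: bool


# The two entries from the banking scenario example.
HANDOFFS = [
    Handoff("Concierge", "FraudAgent", "handoff_fraud", "announced", True),
    Handoff("FraudAgent", "Concierge", "handoff_concierge", "discrete", False),
]


def find_handoff(handoffs: list[Handoff], active_agent: str, tool: str) -> Optional[Handoff]:
    """Return the handoff matching the active agent and the tool it invoked."""
    for handoff in handoffs:
        if handoff.from_agent == active_agent and handoff.tool == tool:
            return handoff
    return None


def plays_greeting(handoff: Handoff) -> bool:
    """Announced handoffs greet the caller; discrete handoffs switch silently."""
    return handoff.type == "announced"


if __name__ == "__main__":
    match = find_handoff(HANDOFFS, "Concierge", "handoff_fraud")
    if match is not None:
        print(match.to_agent, plays_greeting(match), match.share_context)
```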

Mode-Specific Configuration¶

The agent YAML supports separate model configs for each mode. The system picks the right one at runtime.

Cascade Mode¶

Uses separate Azure services: Speech SDK (STT) → Azure OpenAI (LLM) → Azure TTS

# Primary model config (used by cascade if cascade_model not specified)
model:
  deployment_id: gpt-4o
  temperature: 0.7
  max_tokens: 2048

# Optional: cascade-specific override
cascade_model:
  deployment_id: gpt-4o-mini     # Use cheaper model for cascade
  temperature: 0.5               # More deterministic

# Azure TTS voice (only used by cascade)
voice:
  name: en-US-JennyNeural
  rate: "-5%"
  style: customerservice

VoiceLive Mode¶

Uses Azure VoiceLive SDK — audio-in/audio-out over a single WebSocket:

# Primary model config (fallback)
model:
  deployment_id: gpt-4o
  temperature: 0.7

# VoiceLive-specific config (required for realtime)
voicelive_model:
  deployment_id: gpt-4o-realtime # Must be a realtime deployment
  temperature: 0.7
  voice: alloy                   # Voice selection

# Note: voice: block (Azure TTS) may be IGNORED in VoiceLive mode
# See Azure VoiceLive docs for current voice configuration options
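
The fallback described in the comments above (use the mode-specific block when present, otherwise the primary `model` block) can be pictured as a small selection function. This is a sketch under that reading: `select_model_config` and the mode names `"cascade"` and `"voicelive"` are hypothetical, and the real runtime may merge the blocks rather than choosing one.

```python
MODE_KEYS = {
    "cascade": "cascade_model",
    "voicelive": "voicelive_model",
}


def select_model_config(agent: dict, mode: str) -> dict:
    """Pick the mode-specific model block, falling back to the primary one."""
    if mode not in MODE_KEYS:
        raise ValueError(f"unknown mode: {mode!r}")
    specific = agent.get(MODE_KEYS[mode])
    if specific:
        return specific
    return agent.get("model", {})


if __name__ == "__main__":
    agent = {
        "model": {"deployment_id": "gpt-4o", "temperature": 0.7},
        "voicelive_model": {"deployment_id": "gpt-4o-realtime", "voice": "alloy"},
    }
    print(select_model_config(agent, "cascade"))    # no cascade_model, so model is used
    print(select_model_config(agent, "voicelive"))  # the voicelive_model block is used
```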

Dual-Mode Agent Example¶

An agent that works in both modes:

name: BankingConcierge
display_name: "Banking Assistant"

# Cascade uses these
cascade_model:
  deployment_id: gpt-4o
  temperature: 0.7
  max_tokens: 2048

voice:
  name: en-US-AriaNeural
  rate: "-5%"
  style: chat

# VoiceLive uses these
voicelive_model:
  deployment_id: gpt-4o-realtime
  temperature: 0.7
  voice: echo

# Shared
tools:
  - verify_client_identity
  - check_account_balance

Complete Example: Fraud Agent¶

# registries/agentstore/agents/fraud_agent.yaml

name: FraudAgent
display_name: "Fraud Specialist"
description: "Handles fraud detection and card security"

greeting: |
  {% if caller_name %}
  Hi {{ caller_name }}, I'm a fraud specialist at {{ institution_name | default('your bank') }}.
  {% else %}
  Hi, I'm a fraud specialist here to help.
  {% endif %}
  I understand you may have concerns about unauthorized activity. Can you tell me what happened?

return_greeting: |
  Welcome back. Let's continue looking into your fraud concern.

handoff:
  trigger: handoff_fraud
  is_entry_point: false

model:
  deployment_id: gpt-4o
  temperature: 0.3               # Low creativity for security topics
  max_tokens: 1024

voice:
  name: en-US-JennyNeural        # Professional voice
  rate: "-5%"
  style: empathetic              # Supportive tone

tools:
  - verify_client_identity
  - check_recent_transactions
  - block_card
  - report_fraud
  - handoff_concierge

Validating Your Configuration¶

After creating/editing an agent YAML:

# 1. Check YAML syntax
python -c "import yaml; yaml.safe_load(open('registries/agentstore/agents/my_agent.yaml'))"

# 2. Run agent discovery to validate
python -c "from apps.artagent.backend.registries.agentstore.loader import discover_agents; print(discover_agents().keys())"

# 3. Test greeting rendering
python -c "
from apps.artagent.backend.registries.agentstore.loader import discover_agents
agents = discover_agents()
print(agents['MyAgent'].render_greeting({'caller_name': 'Test User'}))
"

Programmatic Config Access¶

In code, always import from config.settings — never use os.getenv() directly:

# ✅ Correct - uses settings module
from config.settings import (
    AZURE_OPENAI_ENDPOINT,
    AZURE_SPEECH_REGION,
    ACS_STREAMING_MODE,
)

# ❌ Wrong - bypasses config loading
import os
endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")  # Don't do this

Why?¶

The settings module:

Loads .env.local and .env files
Connects to Azure App Configuration
Applies key mappings and type conversions
Respects override priorities
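
Type conversion matters because every environment value arrives as a string, while settings such as pool sizes, temperatures, and language lists are consumed as numbers or lists. The helpers below sketch what such conversions might look like; `env_int`, `env_float`, and `env_list` are hypothetical and are not taken from config.settings.

```python
from typing import Mapping, Sequence


def env_int(environ: Mapping[str, str], name: str, default: int) -> int:
    """Read an integer setting such as POOL_SIZE_TTS."""
    raw = environ.get(name, "").strip()
    return int(raw) if raw else default


def env_float(environ: Mapping[str, str], name: str, default: float) -> float:
    """Read a float setting such as DEFAULT_TEMPERATURE."""
    raw = environ.get(name, "").strip()
    return float(raw) if raw else default


def env_list(environ: Mapping[str, str], name: str, default: Sequence[str]) -> list[str]:
    """Read a comma-separated setting such as RECOGNIZED_LANGUAGE."""
    raw = environ.get(name)
    if raw is None:
        return list(default)
    return [item.strip() for item in raw.split(",") if item.strip()]


if __name__ == "__main__":
    env = {
        "POOL_SIZE_TTS": "50",
        "DEFAULT_TEMPERATURE": "0.7",
        "RECOGNIZED_LANGUAGE": "en-US,es-ES,fr-FR",
    }
    print(env_int(env, "POOL_SIZE_TTS", 10))
    print(env_float(env, "DEFAULT_TEMPERATURE", 1.0))
    print(env_list(env, "RECOGNIZED_LANGUAGE", ["en-US"]))
```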

Checking Loaded Config¶

# Debug: see what config was loaded
from config.settings import get_loaded_config
config = get_loaded_config()
print(config)