# Voice Debugging Guide
This guide helps you troubleshoot common voice issues. You'll learn:
- How to read voice logs
- Common problems and solutions
- Debugging tools and techniques
- Performance optimization
## Quick Diagnostics

### 1. Check if Voice Handler Started
Look for these log messages:
✅ Good: "[abc12345] Speech cascade handler started"
✅ Good: "[abc12345] Speech recognizer started"
❌ Bad: "[abc12345] Failed to start recognizer: ..."
### 2. Check Audio Flow
✅ Good: "[abc12345] Partial speech: 'hello' (en-US)"
✅ Good: "[abc12345] Speech: 'check my balance' (en-US)"
❌ Bad: No "Partial speech" logs = audio not reaching STT
### 3. Check Agent Response
✅ Good: "[abc12345] Enqueued speech event type=final"
✅ Good: "[abc12345] TTS response processed: Your balance is..."
❌ Bad: "[abc12345] Orchestrator processing cancelled"
## Common Issues

### Issue: No Audio Recognition
Symptoms: No "Partial speech" or "Speech" logs
Checklist:

- Verify audio is being sent
- Check Speech SDK initialization
- Verify the audio format
Solution:

```python
# Ensure push_stream is created before audio arrives
if hasattr(self.recognizer, "push_stream") and self.recognizer.push_stream is None:
    self.recognizer.create_push_stream()
```
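One quick way to check the third item above (audio format) is to inspect the file header directly. This is a stdlib sketch, and the 16 kHz / 16-bit / mono expectation is an assumption based on typical cascade STT input — substitute whatever format your pipeline actually requires:

```python
import wave

def check_wav_format(path, expected_rate=16000, expected_channels=1, expected_width=2):
    """Return a list of mismatches between a WAV file and the expected PCM format."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != expected_rate:
            problems.append(f"sample rate {wav.getframerate()} != {expected_rate}")
        if wav.getnchannels() != expected_channels:
            problems.append(f"channels {wav.getnchannels()} != {expected_channels}")
        if wav.getsampwidth() != expected_width:
            problems.append(f"sample width {wav.getsampwidth()} != {expected_width} bytes")
    return problems  # empty list means the format matches
```

An empty return means the capture format matches what the recognizer expects; any entry points at the exact mismatch.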
### Issue: Barge-In Not Working
Symptoms: Can't interrupt the assistant while speaking
Checklist:

- Check barge-in suppression
- Check the partial transcript threshold
- Verify TTS is cancelable

Solution:
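The checklist items interact: suppression wins over the threshold, and neither matters if TTS is not playing. A minimal sketch of that decision logic — the names `BargeInGate`, `partial_min_chars`, and `suppressed` are illustrative, not the project's actual settings:

```python
class BargeInGate:
    """Decide whether a partial transcript should interrupt TTS playback.

    `partial_min_chars` filters out short noise blips; `suppressed` mirrors
    the "Barge-in skipped (suppressed)" state during handoffs/greetings.
    """

    def __init__(self, partial_min_chars=3):
        self.partial_min_chars = partial_min_chars
        self.suppressed = False  # set True while a greeting/handoff plays

    def should_cancel_tts(self, partial_text, tts_playing):
        # Nothing to interrupt, or interruption is deliberately disabled
        if not tts_playing or self.suppressed:
            return False
        # Only treat substantial partials as a real interruption
        return len(partial_text.strip()) >= self.partial_min_chars
```

If barge-in never fires, check which branch returns `False`: a suppression flag stuck on `True` and a threshold set too high produce identical symptoms.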
### Issue: Handoff Fails
Symptoms: "No target agent configured for handoff tool: X"
Checklist:

- Verify the tool is registered
- Check the handoff map
- Verify the target agent exists
Solution:

```yaml
# In agent YAML, ensure the handoff trigger matches the tool name:
handoff:
  trigger: handoff_fraud  # Must match the tool name exactly
```
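The three checklist items correspond to two lookups: tool name → target agent name, then target agent name → registered agent. A sketch of that resolution, with error messages mirroring the logs quoted in this guide (the function name and dict shapes are assumptions, not the project's actual API):

```python
def resolve_handoff(tool_name, handoff_map, agents):
    """Map a handoff tool call to a target agent.

    `handoff_map` maps tool names to target agent names (built from the
    agent YAML); `agents` is the agent registry.
    """
    target = handoff_map.get(tool_name)
    if target is None:
        # First failure mode: the trigger in YAML doesn't match the tool name
        raise LookupError(f"No target agent configured for handoff tool: {tool_name}")
    if target not in agents:
        # Second failure mode: the map points at an agent that was never registered
        raise LookupError(f"Target agent '{target}' not found")
    return agents[target]
```

When the symptom is "No target agent configured for handoff tool: X", the first lookup failed — compare the YAML `trigger` against the tool name character for character.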
### Issue: Greeting Not Playing
Symptoms: Agent switches silently (when it shouldn't)
Checklist:

- Check the handoff type in the scenario
- Check the greeting template
- Look for greeting selection logs

Solution:
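A silent switch has two common causes: the handoff type is deliberately silent, or no greeting template resolves for the target agent. A sketch of that selection logic — the `"silent"` / `"announced"` values and the template dict are illustrative assumptions:

```python
def select_greeting(handoff_type, templates, target_agent):
    """Pick a greeting for an announced handoff; silent handoffs return None."""
    if handoff_type == "silent":
        return None  # intentional: scenario asked for a silent switch
    template = templates.get(target_agent, templates.get("default"))
    if template is None:
        return None  # unintentional: no template configured -> silent switch
    return template.format(agent=target_agent)
```

Both `None` paths look identical to the caller, which is why the greeting selection logs matter: they distinguish "silent by design" from "no template found".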
### Issue: High Latency
Symptoms: Long delay between user speech and assistant response
Debugging Steps:

- Enable telemetry timing
- Check queue depth
- Profile each phase using telemetry:
    - STT latency varies by utterance length and language
    - LLM latency varies by prompt size and model
    - TTS latency varies by text length (streaming reduces perceived latency)
Use the turn metrics emitted to Application Insights to measure actual latencies in your environment.
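If telemetry is not yet wired up, a context manager around each phase gives a first approximation of where a turn spends its time. This is a stdlib sketch, not the project's telemetry API:

```python
import time
from contextlib import contextmanager

@contextmanager
def phase_timer(metrics, name):
    """Accumulate wall-clock duration (in ms) per pipeline phase into `metrics`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        metrics[name] = metrics.get(name, 0.0) + (time.perf_counter() - start) * 1000

# Usage per turn (phase bodies are placeholders):
# metrics = {}
# with phase_timer(metrics, "stt"):
#     ...run recognition...
# with phase_timer(metrics, "llm"):
#     ...call the model...
# with phase_timer(metrics, "tts"):
#     ...synthesize...
# print(metrics)  # e.g. which phase dominates the turn
```

Comparing the three accumulated values immediately tells you which row of the optimization table below to focus on.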
Solutions:
| Phase | Optimization |
|---|---|
| STT | Use semantic segmentation: `use_semantic_segmentation=True` |
| LLM | Use `gpt-4o-mini` for simple tasks |
| TTS | Ensure streaming TTS is enabled |
## VoiceLive-Specific Issues
VoiceLive uses the OpenAI Realtime API, so debugging differs from Cascade.
### Issue: VoiceLive Connection Fails
Symptoms: "Failed to connect to OpenAI Realtime API"
Checklist:

- Verify the deployment exists
- Check the endpoint configuration
- Look for WebSocket errors

Solution:
### Issue: VoiceLive VAD Not Detecting Speech End
Symptoms: Agent waits too long after user stops talking
Context: VoiceLive uses server-side VAD (Voice Activity Detection), so unlike Cascade you cannot tune segmentation client-side.

What you CAN adjust:
```python
# In the VoiceLive handler, the session config includes:
{
    "turn_detection": {
        "type": "server_vad",
        "threshold": 0.5,            # Sensitivity (0-1)
        "prefix_padding_ms": 300,    # Audio to keep before speech
        "silence_duration_ms": 500   # How long silence = end of turn
    }
}
```
Logs to look for:

```
"[abc12345] input_audio_buffer.speech_started"   # User started talking
"[abc12345] input_audio_buffer.speech_stopped"   # User stopped
"[abc12345] conversation.item.input_audio_transcription.completed"  # STT done
```
### Issue: VoiceLive Tool Calls Not Working
Symptoms: Agent says "I'll check that" but tool never executes
Checklist:

- Verify tools are in the session config
- Check the tool response format
- Look for tool call events
Solution:

```python
# Ensure tool execution sends the result back:
await realtime_client.send({
    "type": "conversation.item.create",
    "item": {
        "type": "function_call_output",
        "call_id": tool_call_id,
        "output": json.dumps(result)
    }
})
await realtime_client.send({"type": "response.create"})
```
### Issue: VoiceLive Audio Quality Issues
Symptoms: Choppy audio, echoes, or distortion
Checklist:

- Check audio format conversion
- Look for buffer underruns
- Verify WebSocket throughput

Solution:

- Use Azure regions close to the OpenAI endpoints
- Ensure sufficient WebSocket buffer sizes
- Consider Cascade mode on unreliable networks
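Sample-rate conversion bugs (see the 16 kHz ↔ 24 kHz item in the checklist above) are a frequent source of "chipmunk" or slowed-down audio. A pure-Python linear-interpolation resampler useful for debugging format issues — production code should use a proper resampler with anti-aliasing:

```python
def resample_pcm16(samples, src_rate, dst_rate):
    """Linearly resample a list of 16-bit PCM sample values between rates."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    out_len = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(out_len):
        # Map each output index to a fractional position in the input
        pos = i * (len(samples) - 1) / max(out_len - 1, 1)
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        out.append(round(samples[lo] * (1 - frac) + samples[hi] * frac))
    return out
```

If output produced this way sounds correct but the pipeline's audio does not, the pipeline's converter (or a mismatched declared sample rate) is the likely culprit.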
## Debugging Tools

### 1. Enable Debug Logging

```bash
# In your .env or environment:
LOG_LEVEL=DEBUG
```

Or per module:

```python
import logging

logging.getLogger("voice.shared.handoff_service").setLevel(logging.DEBUG)
```
### 2. Use the REST API Test Client

```bash
# Test the agent directly (no voice):
curl -X POST http://localhost:8000/api/v1/agents/FraudAgent/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "I think my card was stolen"}'
```
### 3. Inspect Session State

```python
# During a call, dump the memory manager state:
from pprint import pprint

pprint(memo_manager.get_all_corememory())
```
### 4. Use OpenTelemetry Traces

```bash
# Start Jaeger for trace visualization:
docker run -d -p 16686:16686 -p 6831:6831/udp jaegertracing/all-in-one
# View traces at http://localhost:16686
```
## Log Message Reference

### Speech Recognition (Cascade)

| Log | Meaning |
|---|---|
| `Partial speech: 'X'` | Interim transcription (may change) |
| `Speech: 'X'` | Final transcription (sent to LLM) |
| `Speech error: X` | STT failed |
| `Barge-in skipped (suppressed)` | User spoke during handoff/greeting |
### VoiceLive Events

| Log | Meaning |
|---|---|
| `session.created` | Realtime WebSocket connected |
| `input_audio_buffer.speech_started` | User started talking |
| `input_audio_buffer.speech_stopped` | Server VAD detected end of speech |
| `conversation.item.input_audio_transcription.completed` | STT done |
| `response.audio.delta` | Audio chunk received from the API |
| `response.function_call_arguments.done` | Tool call ready to execute |
| `response.done` | Turn completed |
### Handoff (Both Modes)

| Log | Meaning |
|---|---|
| `Handoff resolved \| A → B` | Successful handoff routing |
| `Generic handoff denied` | Generic handoff not allowed |
| `Target agent 'X' not found` | Agent not in registry |
### TTS (Cascade Only)

| Log | Meaning |
|---|---|
| `TTS response processed: X...` | Text sent to TTS |
| `Barge-in: cancelling TTS` | User interrupted |
| `Queue full, dropping PARTIAL` | System overloaded |
## Performance Checklist

Before going to production:

### Cascade Mode

- [ ] Use `gpt-4o` or `gpt-4o-mini` (not `gpt-4`)
- [ ] Enable streaming TTS
- [ ] Set an appropriate `vad_silence_timeout_ms` (800 ms default)
- [ ] Monitor queue depth (should stay < 3)
- [ ] Test with realistic audio (not just text input)
- [ ] Verify barge-in works end-to-end
- [ ] Check handoff greeting timing

### VoiceLive Mode

- [ ] Use a `gpt-4o-realtime` deployment
- [ ] Verify WebSocket latency < 100 ms to the endpoint
- [ ] Test VAD sensitivity for your audio environment
- [ ] Validate tool execution round-trips
- [ ] Confirm audio format conversion works (16 kHz ↔ 24 kHz)
- [ ] Test interruption behavior (server-side VAD)
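For the Cascade "monitor queue depth" item, a tiny check run on each turn is enough to surface backpressure before it becomes audible latency. A sketch assuming the speech event queue is an `asyncio.Queue` (adjust the threshold and logging to your setup):

```python
import asyncio

def queue_depth_warning(queue, warn_at=3):
    """Return a warning string when the speech event queue backs up, else None.

    A sustained depth at or above `warn_at` means the downstream pipeline
    (LLM/TTS) is not draining events as fast as STT produces them.
    """
    depth = queue.qsize()
    if depth >= warn_at:
        return f"queue depth {depth} >= {warn_at}: pipeline falling behind"
    return None
```

Logging this once per turn (or on a timer) turns the "should stay < 3" guideline above into an observable signal rather than a post-mortem finding.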
## Getting Help

1. Check logs first - 90% of issues show up in the logs
2. Reproduce minimally - isolate the failing component
3. Check this guide - most issues are documented here
4. Ask in Teams/Slack - share the connection_id and relevant logs
## See Also
- Voice Architecture Overview - How voice works
- Voice Configuration Guide - Agent setup
- Telemetry Guide - Tracing setup