
# Testing & Quality Assurance

Comprehensive testing, evaluation, and quality-assurance framework for the ART Voice Agent Accelerator.


## Testing Strategy Overview

The ART Voice Agent Accelerator provides a multi-layered testing strategy to ensure quality at every level:

```mermaid
flowchart TB
    subgraph Quality["Quality Assurance Layers"]
        direction TB
        Unit["Unit Tests<br/>Component isolation"]
        Integration["Integration Tests<br/>Cross-component flows"]
        Evaluation["Agent Evaluation<br/>LLM quality metrics"]
        Load["Load Testing<br/>Performance & scale"]
    end
    Unit --> Integration --> Evaluation --> Load

    subgraph Execution["Execution Methods"]
        Local["🖥️ Local<br/>pytest / CLI"]
        Pipeline["⚙️ CI/CD<br/>GitHub Actions"]
        Foundry["☁️ Azure AI Foundry<br/>Cloud evaluation"]
    end
    Evaluation --> Local
    Evaluation --> Pipeline
    Evaluation --> Foundry
```

## Quick Navigation

- **Unit & Integration Tests**: fast, isolated tests for core components and event handling. See Testing Framework.
- **Agent Evaluation**: measure LLM quality (tool precision, groundedness, latency). See Evaluation Framework.
- **Load Testing**: WebSocket performance testing with Locust. See Load Testing.


## Test Categories at a Glance

| Category | Purpose | Tools | Run Time |
|---|---|---|---|
| Unit Tests | Component isolation | pytest | Seconds |
| Integration Tests | Cross-component flows | pytest | Seconds |
| Agent Evaluation | LLM quality metrics | Evaluation framework | Minutes |
| Load Tests | Performance at scale | Locust | Minutes to hours |

## Evaluation Framework Highlights

The evaluation framework measures agent quality across multiple dimensions:

**Tool accuracy**

| Metric | Description |
|---|---|
| Precision | Fraction of tool calls that were correct |
| Recall | Fraction of expected tools that were called |
| Efficiency | Avoidance of redundant tool calls |

**Response quality**

| Metric | Description |
|---|---|
| Groundedness | Response accuracy against evidence |
| Verbosity | Token usage and conciseness |
| Handoff Accuracy | Correct agent routing |

**Performance and cost**

| Metric | Description |
|---|---|
| E2E Latency | End-to-end response time |
| TTFT | Time to first token |
| Cost | Token usage and estimated USD |
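
As an illustration of how the tool-accuracy metrics relate, here is a minimal sketch of computing precision, recall, and efficiency from the expected and actual tool calls of a single turn. All names are illustrative, not the accelerator's actual API:

```python
# Hedged sketch: tool-call precision, recall, and efficiency for one
# evaluated turn. Names are illustrative, not the accelerator's API.

def tool_call_metrics(expected: list[str], actual: list[str]) -> dict[str, float]:
    """Compare expected vs. actual tool calls for a single turn."""
    expected_set = set(expected)
    correct = [t for t in actual if t in expected_set]
    # Precision: fraction of issued calls that were correct.
    precision = len(correct) / len(actual) if actual else 1.0
    # Recall: fraction of expected tools that were actually called.
    recall = len(expected_set & set(actual)) / len(expected_set) if expected_set else 1.0
    # Efficiency: redundant duplicate calls lower the score.
    efficiency = len(set(correct)) / len(actual) if actual else 1.0
    return {"precision": precision, "recall": recall, "efficiency": efficiency}


m = tool_call_metrics(
    expected=["lookup_balance", "verify_identity"],
    actual=["verify_identity", "lookup_balance", "lookup_balance"],
)
# precision 1.0 (every call was expected), recall 1.0 (both expected
# tools were called), efficiency ~0.67 (one redundant duplicate call)
```

Note how precision and recall can both be perfect while efficiency still flags the redundant duplicate call.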

## Running Tests

### Quick Commands

```shell
# Unit tests
pytest tests/ -v

# Evaluation scenarios
pytest tests/evaluation/test_scenarios.py -v

# Load tests
make run_load_test_acs_media
```

### Execution Methods

Run evaluations directly on your machine:

```shell
# Interactive CLI (recommended for exploration)
make eval

# Single scenario
make eval-run SCENARIO=tests/evaluation/scenarios/session_based/banking_multi_agent.yaml

# Via pytest
pytest tests/evaluation/test_scenarios.py -v
```

Run evaluations in GitHub Actions:

```yaml
- name: Run Evaluation Tests
  run: pytest tests/evaluation/test_scenarios.py -v -m evaluation
```

Submit results to Azure AI Foundry for cloud evaluation with additional AI-powered metrics:

```shell
pytest tests/evaluation/test_scenarios.py --submit-to-foundry

# Or via Python directly
python tests/evaluation/foundry_exporter.py \
    --data runs/my_run/foundry_eval.jsonl \
    --endpoint "$AZURE_AI_FOUNDRY_PROJECT_ENDPOINT"
```

## Documentation Structure

```text
docs/testing/
├── index.md        # This overview page
└── evaluation.md   # Evaluation framework guide

docs/operations/
├── testing.md      # Unit & integration tests
└── load-testing.md # Locust load testing
```

## Getting Started Paths

- **I want to run unit tests**: start with the Testing Framework, then run `pytest tests/ -v`
- **I want to evaluate agent quality**: read the Evaluation Framework
- **I want to load test my deployment**: follow the Load Testing Guide, then run `make run_load_test_acs_media`
- **I want to create custom scenarios**: check the scenario examples in the Evaluation Framework