MCP Integration Guide¶
- **What is MCP?** Model Context Protocol (MCP) is an open standard for connecting AI applications to external tools, resources, and context sources dynamically.
- **Context is Key.** Agents are only as effective as the context they have. MCP provides a standardized way to deliver the right information at the right time.
- **Why for Voice?** Voice agents need instant access to context to respond naturally. MCP enables real-time context retrieval without code changes.
- **Dynamic Extension.** Add new tools and data sources at runtime. Agents gain capabilities without restarts or redeployments.
Overview¶
MCP provides a standardized way to expose tools, resources, and prompts from external servers. In real-time voice orchestration, MCP allows you to dynamically extend agent capabilities while the system is running.
Key Components¶
MCP Client
: Manages connections to MCP servers
: Location: `toolstore/mcp/client.py`

MCP Adapter
: Bridges MCP tools to the unified registry
: Location: `toolstore/mcp/adapter.py`

Session Manager
: Handles MCP server lifecycle and health
: Location: `toolstore/mcp/session_manager.py`

Runtime API
: REST endpoints for dynamic server management
: Location: `api/v1/endpoints/mcp.py`
Context Management for Agents¶
Why Context Matters
In agentic systems, context is everything. An agent is only as effective as the information it has access to. MCP provides a standardized way to deliver the right context to agents at the right time—enabling them to make informed decisions and execute tasks successfully.
The Context Challenge¶
Voice agents face unique context challenges: sub-second latency budgets, finite LLM context windows, and the need to surface relevant information without multiple network round-trips.
MCP Context Primitives¶
MCP provides three mechanisms for delivering context to agents:
| Primitive | Purpose | When to Use |
|---|---|---|
| Tools | Execute actions, retrieve dynamic data | Real-time lookups, transactions, API calls |
| Resources | Expose static or semi-static content | Documentation, policies, reference data |
| Prompts | Provide templated instructions | Specialized workflows, domain expertise |
Context Flow in Agentic Workflows¶
Consider a customer calling about a declined credit card: the agent looks up the decline code via an MCP tool, pulls the matching policy and resolution options, and responds with a suggested script, all within the voice latency budget.
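One way to picture that flow as code, with stubbed data standing in for the MCP tool calls (the tool data and function name are illustrative, not part of the framework):

```python
# Stub of what an MCP decline-code lookup might return (illustrative data).
DECLINE_CODES = {
    "05": {"description": "Do not honor", "requires_escalation": False},
}

def handle_decline_inquiry(code: str) -> str:
    """Assemble a spoken response from retrieved context (sketch)."""
    ctx = DECLINE_CODES.get(code)
    if ctx is None:
        # No context available: fall back to escalation
        return "I couldn't find details for that decline code; let me escalate this."
    return f"I see your card was declined with a '{ctx['description']}' response."
```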
Designing Context-Aware MCP Servers¶
Design Principle
Build MCP servers that anticipate what agents need. Return comprehensive context in single calls rather than requiring multiple round-trips.
Pattern: Rich Context Responses
Instead of a minimal response that returns only the raw code and description, return actionable context:
```json
{
  "code": "05",
  "description": "Do not honor",
  "category": "issuer_decline",
  "common_causes": [
    "Unusual transaction location",
    "Velocity limit exceeded",
    "Account restrictions"
  ],
  "resolution_options": [
    {"action": "verify_identity", "description": "Verify customer and retry"},
    {"action": "contact_issuer", "description": "Escalate to issuing bank"},
    {"action": "alternative_payment", "description": "Suggest different payment method"}
  ],
  "agent_script": "I see your card was declined with a 'Do not honor' response...",
  "requires_escalation": false
}
```
Context Strategies by Use Case¶
**Banking & Card Services**

Key Context Needs:
- Account balances and transaction history
- Fraud detection signals
- Customer preferences and history
- Regulatory compliance rules
MCP Server Design:
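A sketch of what such a server might expose, mirroring the shape of the insurance example in this section (tool and resource names are hypothetical):

```yaml
# banking-context-mcp (hypothetical)
tools:
  - get_account_context      # Balance + recent transactions + flags
  - get_fraud_context        # Fraud signals + risk indicators
  - get_customer_context     # Preferences + interaction history
resources:
  - compliance_rules         # Regulatory requirements
```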
**Insurance**

Key Context Needs:
- Policy details and coverage
- Claims history
- Provider networks
- Regulatory requirements by state
MCP Server Design:
```yaml
# insurance-context-mcp
tools:
  - get_policy_context       # Policy + coverage + exclusions
  - get_claims_context       # Claim status + history + documents
  - get_provider_context     # In-network providers + ratings
resources:
  - coverage_explanations    # Plain-language coverage docs
  - state_regulations        # State-specific rules
```
**Customer Support**

Key Context Needs:
- Product documentation
- Troubleshooting guides
- Customer interaction history
- Escalation procedures
MCP Server Design:
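A sketch of what such a server might expose, following the same shape as the insurance example (names are hypothetical):

```yaml
# support-context-mcp (hypothetical)
tools:
  - get_product_context      # Docs + known issues for a product
  - get_customer_context     # Prior tickets + interaction history
resources:
  - troubleshooting_guides   # Step-by-step runbooks
  - escalation_procedures    # When and how to hand off
```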
Context Window Management¶
LLM Context Limits
LLMs have finite context windows. MCP servers should return focused, relevant context rather than dumping everything available.
Strategies for Efficient Context:
- Summarize when possible - Return summaries for historical data
- Prioritize recency - Recent transactions matter more than old ones
- Filter by relevance - Only return context related to the current query
- Use pagination - Allow agents to request more if needed
```python
# Example: Context-aware tool response
async def get_transaction_context(args: dict) -> dict:
    account_id = args["account_id"]
    query_intent = args.get("intent", "general")  # What is the agent trying to do?

    # Fetch base data
    transactions = await db.get_recent_transactions(account_id, limit=10)

    # Enrich based on intent
    if query_intent == "dispute":
        # Include merchant details, dispute eligibility
        transactions = enrich_for_dispute(transactions)
    elif query_intent == "budget":
        # Include category summaries
        transactions = enrich_with_categories(transactions)

    return {
        "transactions": transactions,
        "summary": generate_summary(transactions),
        "suggested_actions": get_actions_for_intent(query_intent),
    }
```
Tool Execution: Native vs MCP¶
The framework supports two complementary approaches for tool execution. Understanding when to use each is essential for building responsive voice applications.
Side-by-Side Comparison¶
Native tools are Python functions registered directly with the tool registry. They execute in-process with minimal overhead.
```python
async def get_account_balance(args: dict) -> dict:
    """Native tool - executes in-process."""
    client_id = args.get("client_id")
    balance = await db.get_balance(client_id)
    return {"success": True, "balance": balance}

register_tool("get_account_balance", schema, get_account_balance)
```
Advantages
| Aspect | Benefit |
|---|---|
| Latency | ~1-5ms overhead |
| Control | Full implementation control |
| Type Safety | Python static analysis |
Trade-offs
| Aspect | Limitation |
|---|---|
| Deployment | Requires code deployment for changes |
| Coupling | Tightly coupled to application |
MCP tools execute on remote servers via HTTP or SSE transport. They're discovered automatically at runtime.
```json
{
  "name": "lookup_decline_code",
  "description": "Look up card decline code policy",
  "input_schema": {
    "type": "object",
    "properties": {
      "code": {"type": "string"}
    }
  }
}
```
Advantages
| Aspect | Benefit |
|---|---|
| Discovery | Tools appear without code changes |
| Polyglot | Servers can be any language |
| Independence | Deploy tools separately |
Trade-offs
| Aspect | Limitation |
|---|---|
| Latency | ~50-200ms network overhead |
| Reliability | Network/server failure modes |
Latency Considerations for Voice¶
Real-Time Constraint
Voice interactions target < 1 second end-to-end latency. Tool execution should stay under 200ms when possible. MCP adds network overhead that can impact perceived responsiveness.
Latency Comparison¶
| Scenario | Native Tool | MCP (HTTP) | MCP (SSE) | Voice Budget (≤ 200ms) |
|---|---|---|---|---|
| Simple lookup | 2-5ms | 50-150ms | 30-100ms | Fits |
| Database query | 10-50ms | 60-200ms | 40-150ms | Fits |
| External API | 100-500ms | 150-600ms | 120-550ms | May exceed |
| Multiple chained | 50-100ms | 300-800ms | 200-600ms | MCP variants exceed |
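The budget math above can be checked mechanically. A small sketch, assuming the 200ms per-turn tool budget stated earlier (function name is illustrative):

```python
VOICE_TOOL_BUDGET_MS = 200  # target from the real-time constraint above

def fits_voice_budget(latencies_ms: list) -> bool:
    """True if a chain of sequential tool calls stays within the budget."""
    return sum(latencies_ms) <= VOICE_TOOL_BUDGET_MS

# A single MCP HTTP lookup (~100ms) fits; three chained ones do not.
```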
Decision Matrix¶
Use this guide to choose the right approach: keep latency-critical, voice-path operations as native tools, and use MCP for capabilities that change frequently, live in other services, or are owned by other teams.
Recommended Architecture¶
Combine both approaches for optimal performance: native tools serve the latency-critical voice path, while MCP servers supply extended capabilities that can be added at runtime.
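A minimal sketch of that hybrid routing, assuming a hypothetical registry where native handlers are checked first and `{server}_{tool}` names fall through to MCP:

```python
# Hypothetical registries: native tools run in-process, MCP tools are
# identified by their "{server}_{tool}" name prefix.
NATIVE_TOOLS = {"verify_client_identity", "get_recent_transactions"}
MCP_SERVERS = {"cardapi", "knowledge"}

def resolve_tool(name: str) -> str:
    """Classify a tool name as 'native' (hot path) or 'mcp' (extended)."""
    if name in NATIVE_TOOLS:
        return "native"  # in-process, ~1-5ms
    server = name.split("_", 1)[0]
    if server in MCP_SERVERS:
        return "mcp"     # remote, ~50-200ms
    raise KeyError(f"unknown tool: {name}")
```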
Startup Behavior: Deferred MCP Initialization¶
Non-Blocking MCP Startup
MCP server validation and tool registration runs as a deferred startup task. This means the application starts accepting HTTP requests immediately while MCP connections are established in the background.
Why Deferred?¶
Real-time voice applications prioritize fast startup. MCP server validation involves network calls that could delay the /health endpoint from responding, which would cause load balancers to mark the instance as unhealthy.
Readiness Endpoints¶
| Endpoint | Purpose | When to Use |
|---|---|---|
| `/api/v1/health` | Liveness check | Load balancer probes |
| `/api/v1/ready` | Full readiness including MCP | Traffic routing decisions |
| `/api/v1/readiness` | Comprehensive dependency check | Debugging, detailed status |
The `/ready` endpoint returns `ready: true` only when:

- `deferred_startup_complete` - All deferred tasks finished
- `warmup_completed` - OpenAI/Speech warmup done
- `mcp_ready` - MCP servers validated and tools registered
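That aggregation is straightforward to express; a sketch using the flag names from the list above (the function name is hypothetical):

```python
READINESS_FLAGS = ("deferred_startup_complete", "warmup_completed", "mcp_ready")

def compute_ready(status: dict) -> bool:
    """/ready-style aggregation: every flag must be present and true."""
    return all(status.get(flag, False) for flag in READINESS_FLAGS)
```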
Required vs Optional MCP Servers¶
Configure `MCP_REQUIRED_SERVERS` to list servers that must be healthy:

```bash
MCP_ENABLED_SERVERS=cardapi,knowledge,analytics
MCP_REQUIRED_SERVERS=cardapi  # Only cardapi is critical
```
- Required servers - Failures are logged as errors (but don't block startup)
- Optional servers - Failures are logged as warnings
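This policy reduces to choosing a log level per failed server; a sketch (helper name is hypothetical):

```python
def failure_log_level(server: str, required_csv: str) -> str:
    """Return 'error' for required servers, 'warning' for optional ones."""
    required = {s.strip() for s in required_csv.split(",") if s.strip()}
    return "error" if server in required else "warning"
```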
Configuration¶
Environment Variables¶
Configure MCP servers using environment variables for automatic loading at startup.
```bash
# Enable/disable MCP server auto-loading
MCP_ENABLED_SERVERS=cardapi,knowledge

# Server-specific configuration
MCP_SERVER_CARDAPI_URL=http://cardapi-mcp:8080
MCP_SERVER_CARDAPI_TRANSPORT=sse
MCP_SERVER_CARDAPI_TIMEOUT=30

MCP_SERVER_KNOWLEDGE_URL=http://kb-mcp:8080
MCP_SERVER_KNOWLEDGE_TRANSPORT=http
```
Legacy CardAPI variables removed
Use only `MCP_SERVER_*` variables for CardAPI tooling.

- ✅ Set: `MCP_SERVER_CARDAPI_URL` (and optional `MCP_SERVER_CARDAPI_TRANSPORT`)
- ❌ Do not use: `CARDAPI_URL`, `CARDAPI_MCP_URL` (ignored by the backend)
Settings Reference¶
| Variable | Default | Description |
|---|---|---|
| `MCP_ENABLED_SERVERS` | `""` | Comma-separated list of servers to auto-load |
| `MCP_REQUIRED_SERVERS` | `""` | Comma-separated list of servers that must be healthy (errors logged but non-blocking) |
| `MCP_SERVER_{NAME}_URL` | — | Base URL for the MCP server |
| `MCP_SERVER_{NAME}_TRANSPORT` | `streamable-http` | Transport type: `streamable-http`, `sse`, `http`, or `stdio` |
| `MCP_SERVER_{NAME}_TIMEOUT` | `30` | Connection timeout in seconds |
| `MCP_SERVER_{NAME}_AUTH_ENABLED` | `false` | Whether EasyAuth authentication is enabled |
| `MCP_SERVER_{NAME}_APP_ID` | `""` | Azure AD App ID for EasyAuth token acquisition |
| `MCP_SERVER_TIMEOUT` | `30` | Global default timeout |
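A sketch of how these variables might be resolved into per-server configs, applying the documented defaults (the function is illustrative, not the framework's actual loader):

```python
def load_mcp_config(env: dict) -> dict:
    """Resolve MCP_SERVER_{NAME}_* variables with documented defaults."""
    global_timeout = int(env.get("MCP_SERVER_TIMEOUT", "30"))
    servers = {}
    for name in env.get("MCP_ENABLED_SERVERS", "").split(","):
        name = name.strip()
        if not name:
            continue
        key = name.upper()
        servers[name] = {
            "url": env.get(f"MCP_SERVER_{key}_URL"),
            "transport": env.get(f"MCP_SERVER_{key}_TRANSPORT", "streamable-http"),
            "timeout": int(env.get(f"MCP_SERVER_{key}_TIMEOUT", global_timeout)),
        }
    return servers
```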
Agent YAML Configuration¶
Assign MCP tools to agents using the prefixed tool name format:
```yaml
name: DeclineSpecialist
description: Specialist for card decline inquiries
model:
  deployment_id: gpt-4o
  temperature: 0.7
tools:
  # Native tools (fast, in-process)
  - verify_client_identity
  - get_recent_transactions
  # MCP tools (prefixed with server name)
  - cardapi_lookup_decline_code  # (1)!
  - cardapi_search_decline_codes
voice:
  current_voice: en-US-AndrewNeural
```

1. MCP tools use the format `{server_name}_{tool_name}`. The `cardapi` prefix indicates this tool comes from the `cardapi` MCP server.
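The prefix convention can be resolved with a simple split; a sketch (helper name is hypothetical):

```python
def split_mcp_tool(name: str, known_servers: set):
    """Split '{server}_{tool}'; return None for native (unprefixed) tools."""
    server, _, tool = name.partition("_")
    if server in known_servers and tool:
        return server, tool
    return None
```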
Runtime MCP Management¶
No Restart Required
The runtime API allows you to add, test, and remove MCP servers while the application is running. Newly registered tools become immediately available to agents.
API Endpoints Overview¶
| Endpoint | Method | Purpose |
|---|---|---|
| `/api/v1/mcp/servers` | GET | List all servers with status |
| `/api/v1/mcp/servers` | POST | Add and register a new server |
| `/api/v1/mcp/servers/test` | POST | Test connection without registering |
| `/api/v1/mcp/servers/{name}` | DELETE | Remove server and unregister tools |
| `/api/v1/mcp/tools` | GET | List all registered MCP tools |
For complete API documentation, see the MCP API Reference.
Quick Start Example¶
```bash
# Step 1: Test the connection
curl -X POST http://localhost:8000/api/v1/mcp/servers/test \
  -H "Content-Type: application/json" \
  -d '{"name": "cardapi", "url": "http://cardapi-mcp:8080"}' | jq

# Step 2: Register the server
curl -X POST http://localhost:8000/api/v1/mcp/servers \
  -H "Content-Type: application/json" \
  -d '{"name": "cardapi", "url": "http://cardapi-mcp:8080"}' | jq
```
```python
import httpx

async def setup_mcp_server():
    async with httpx.AsyncClient() as client:
        # Step 1: Test the connection
        response = await client.post(
            "http://localhost:8000/api/v1/mcp/servers/test",
            json={"name": "cardapi", "url": "http://cardapi-mcp:8080"},
        )
        test = response.json()
        print(f"Found {test['tools_count']} tools")

        # Step 2: Register if healthy
        if test["connected"]:
            response = await client.post(
                "http://localhost:8000/api/v1/mcp/servers",
                json={"name": "cardapi", "url": "http://cardapi-mcp:8080"},
            )
            print(response.json()["message"])
```
Frontend Integration¶
The Agent Builder UI includes a built-in MCP management panel:
```text
+--------------------------------------------------+
| MCP Servers                                      |
+--------------------------------------------------+
| [*] cardapi              4 tools       [Remove]  |
|     http://cardapi-mcp:8080                      |
|                                                  |
| [ ] knowledge (error)    0 tools       [Remove]  |
|     Connection refused                           |
|                                                  |
| [+ Add MCP Server]                               |
+--------------------------------------------------+
```
Authentication Patterns¶
MCP servers often require authentication. The framework supports multiple methods.
Bearer Token¶
For servers accepting static API keys or service tokens.
```python
response = httpx.post(
    "/api/v1/mcp/servers",
    json={
        "name": "secure-mcp",
        "url": "https://api.example.com/mcp",
        "auth_token": "sk-abc123...",  # (1)!
    },
)
```

1. The token is automatically added as an `Authorization: Bearer sk-abc123...` header on all requests.
Custom Headers¶
For non-standard authentication schemes.
```python
response = httpx.post(
    "/api/v1/mcp/servers",
    json={
        "name": "custom-auth-mcp",
        "url": "https://api.example.com/mcp",
        "headers": {
            "X-API-Key": "abc123",
            "X-Tenant-ID": "tenant-456",
        },
    },
)
```
OAuth 2.0 with PKCE¶
For servers requiring user-delegated authorization (On-Behalf-Of flows).
Complete OAuth Flow
Step 1: Initiate the flow
```python
start_response = httpx.post(
    "/api/v1/mcp/oauth/start",
    json={
        "name": "enterprise-mcp",
        "url": "https://mcp.enterprise.com",
        "oauth": {
            "client_id": "app-client-id",
            "auth_url": "https://login.microsoftonline.com/{tenant}/oauth2/v2.0/authorize",
            "token_url": "https://login.microsoftonline.com/{tenant}/oauth2/v2.0/token",
            "scope": "api://mcp-server/.default",
        },
        "redirect_uri": "https://yourapp.com/oauth/callback",
    },
)
auth_url = start_response.json()["auth_url"]
# Redirect user to auth_url in popup or new tab
```
Step 2: Exchange the code

After the user authorizes, the callback receives an authorization code. The backend exchanges it, together with the PKCE code verifier, at the token endpoint and uses the resulting token for the server registration.
PKCE Security
The OAuth flow automatically generates and validates PKCE code verifiers using the S256 challenge method. No additional configuration required.
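For reference, S256 verifier/challenge generation looks like this: a standalone sketch of RFC 7636, not the framework's internal code.

```python
import base64
import hashlib
import secrets

def make_pkce_pair():
    """Generate a PKCE code_verifier and its S256 code_challenge."""
    # RFC 7636: 32 random bytes -> 43-char base64url verifier (no padding)
    verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode("ascii")
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return verifier, challenge
```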
Transport Protocols¶
MCP supports multiple transport mechanisms, each suited to different use cases.
Comparison¶
| Transport | Best For | Connection | Latency |
|---|---|---|---|
| STREAMABLE_HTTP | Production deployments (recommended) | Per-request with streaming | Medium |
| SSE | Streaming, long-lived connections (legacy) | Persistent | Lower |
| HTTP | Simple requests, alias for streamable-http | Per-request | Medium |
| STDIO | Local CLI tools, development | Process | Lowest |
Per MCP Spec 2025-11-25
The streamable-http transport is now the recommended protocol for deployed MCP servers. SSE is still supported but considered legacy. The http type is an alias for streamable-http.
Streamable HTTP (Recommended)¶
Best for production deployments. Per MCP spec 2025-11-25, this is now the recommended transport.
SSE (Server-Sent Events) - Legacy¶
Still supported for streaming responses and long-lived connections.
HTTP Alias¶
The http transport type is an alias for streamable-http for backward compatibility.
STDIO (Local Process)¶
Development Only
STDIO transport spawns a local process and communicates via stdin/stdout. Use only for local development; production deployments should use HTTP or SSE.
Best Practices¶
Minimize MCP in Critical Path¶
```python
# Avoid: Chaining multiple MCP calls
async def handle_inquiry(args):
    code = await mcp.call("lookup_code", args)     # +100ms
    policy = await mcp.call("get_policy", code)    # +100ms
    script = await mcp.call("get_script", policy)  # +100ms
    return script  # Total: 300ms+ overhead

# Better: Single comprehensive call
async def handle_inquiry(args):
    # MCP server returns complete data in one call
    result = await mcp.call("lookup_code_complete", args)  # +100ms
    return result
```
Cache Reference Data¶
```python
from datetime import datetime, timedelta

class CachedMCPClient:
    def __init__(self, session):
        self.session = session
        self._cache = {}
        self._ttl = timedelta(minutes=5)

    async def lookup_decline_code(self, code: str) -> dict:
        cache_key = f"decline:{code}"
        cached = self._cache.get(cache_key)
        if cached and datetime.now() < cached["expires"]:
            return cached["data"]

        result = await self.session.call_tool(
            "lookup_decline_code",
            {"code": code},
        )
        self._cache[cache_key] = {
            "data": result,
            "expires": datetime.now() + self._ttl,
        }
        return result
```
Set Aggressive Timeouts¶
```bash
# Voice path: fail fast
MCP_SERVER_TIMEOUT=5

# Background jobs: more lenient
MCP_SERVER_BATCH_TIMEOUT=30
```
Implement Graceful Degradation¶
```python
import logging

logger = logging.getLogger(__name__)

async def get_decline_info(code: str) -> dict:
    try:
        return await mcp.call_tool("lookup_decline_code", {"code": code})
    except TimeoutError:
        logger.warning(f"MCP timeout for code {code}, using fallback")
        return {
            "success": True,
            "message": f"Decline code {code} - please hold while I look up details.",
            "partial": True,
        }
```
Monitor Performance¶
```python
import time

from opentelemetry import trace

tracer = trace.get_tracer(__name__)

async def call_mcp_traced(tool_name: str, args: dict) -> dict:
    with tracer.start_as_current_span(
        f"mcp.{tool_name}",
        attributes={
            "mcp.server": session.config.name,
            "mcp.tool": tool_name,
        },
    ) as span:
        start = time.perf_counter()
        try:
            result = await session.call_tool(tool_name, args)
            span.set_attribute("mcp.success", True)
            return result
        except Exception as e:
            span.set_attribute("mcp.success", False)
            span.record_exception(e)
            raise
        finally:
            latency_ms = (time.perf_counter() - start) * 1000
            span.set_attribute("mcp.latency_ms", latency_ms)
```
Troubleshooting¶
MCP server not discovered at startup
Checklist:

- Verify `MCP_ENABLED_SERVERS` includes your server name
- Check `MCP_SERVER_{NAME}_URL` is set correctly (case-sensitive)
- Ensure the MCP server is running and reachable
- Check application logs for connection errors
Tools show as unavailable in agent
Checklist:

- Verify server is healthy: `GET /api/v1/mcp/servers`
- Check tool names include the server prefix (e.g., `cardapi_lookup_code`)
- Ensure the tool is listed in the agent's `tools` array
- Verify no naming conflicts with native tools
High latency on MCP calls
Optimization steps:
- Check network path between agent and MCP server
- Colocate services (same region/VNet reduces ~50ms)
- Implement caching for read-heavy patterns
- Profile server-side execution time
- Consider switching from HTTP to SSE transport
Authentication errors (401/403)
Debug steps:
- Verify the token hasn't expired
- Check audience/scope configuration matches server expectations
- For OAuth: ensure `redirect_uri` exactly matches the registered callback
- Inspect server logs for the specific auth failure reason
- Test with cURL to isolate client vs server issues
Related Documentation¶
- **Tool Development**: Learn how to create native tools for the registry.
- **Agent Configuration**: Configure agents to use native and MCP tools.
- **API Reference**: Complete MCP API endpoint documentation.
- **Monitoring**: Set up observability for MCP tool calls.