Building a Multi-Agent A2A Architecture on Snowflake and Microsoft Fabric - Without Replacing Either
Every enterprise healthcare payer I work with has the same problem. They have years of investment in Snowflake - semantic models, claims analytics, carefully curated data products. They have Microsoft Fabric rolling out across their organization - lakehouses, Delta tables, real-time intelligence. They have Azure OpenAI licenses and ambitious AI roadmaps. And they're asking the same question: how do we put AI on top of all of this without ripping anything out?
This is the story of how I built HealthIQ - a unified healthcare intelligence platform that answers that question with a working, production-grade architecture.
The Problem With "AI on Your Data"
Most "AI on your data" demos show a chatbot connected to a single database. Ask it a question, it writes SQL, returns an answer. Clean. Simple. And completely inadequate for enterprise healthcare.
Real healthcare analytics doesn't live in one place. Claims financials live in Snowflake. Bed occupancy and staffing data live in Fabric lakehouses. Clinical policy documents live in document stores. Escalation workflows live in Logic Apps. Getting a complete picture of operational health requires crossing all of these - in a single, coherent answer.
The naive solution is to build one giant agent with every tool attached. I've seen this fail. At scale, a single agent juggling ten tools loses coherence. Routing degrades. Context windows fill up. The model gets confused about which tool to call when. There's a better pattern.
The Architecture: Five Tiers, Two Specialists, One Orchestrator
HealthIQ is built on a five-tier architecture with a multi-agent A2A (Agent-to-Agent) orchestration layer on top.
Tier 1 - Structured Analytics: Snowflake Cortex Agent
The foundation is a Snowflake semantic view (CLAIMS_SEMANTIC) that exposes curated claims metrics - total paid amounts, PMPM cost, denial rates by specialty, DRG utilization - as named business concepts rather than raw tables. On top of that semantic view sits a Snowflake Cortex Agent, exposed as a managed MCP (Model Context Protocol) server. This means any AI orchestrator can call it with natural language and get back structured analytics - without writing a line of SQL.
- Q: What were total claims paid in Q1 2026?
- A: $370,685,724.65
Tier 2 - Operational Intelligence: Microsoft Fabric Data Agent
Four Delta tables in a Fabric Lakehouse capture hospital operational data - bed occupancy, staffing ratios, patient volume, authorization status. A Fabric Data Agent (HospitalOpsAgentV2) sits on top of these tables and answers operational questions in natural language.
- Q: Which facility has the highest occupancy?
- A: Riverside Health (Southeast) at 92%
Tier 3 - Document Intelligence: Azure AI Search RAG
A RAG layer built on Azure AI Search indexes a curated set of clinical policy documents, CMS benchmark reports, and medical director summaries - included here to showcase the pattern. In practice, this layer can connect to your existing document stores: SharePoint libraries, Azure Blob Storage, OneLake files, or any indexed enterprise content. The point isn't the five documents I loaded - it's that your existing knowledge assets become part of the reasoning chain without any restructuring. When the AI needs to explain why a metric looks the way it does - not just what the number is - it pulls context from this layer.
Tier 4 - Orchestration: Azure AI Foundry
This is where the intelligence lives. Two specialist agents coordinate through an A2A protocol:
- ClaimsIntelligenceAgent - owns everything financial. It has access to the Snowflake MCP tool and the Policy Search RAG layer. It knows claims analytics the way a Senior Actuary knows claims analytics.
- HospitalOpsAgent - owns everything operational. It has access to the Fabric Data Agent and an escalation workflow. It knows clinical capacity the way a COO knows clinical capacity.
Above them sits HealthcareOrchestratorV3 - a master agent that does no data access directly. Its only job is to understand the question, route to the right specialist(s), and synthesize their responses into a single executive answer.
I chose GPT-4.1 for both the specialists and the synthesis step - the routing and cross-domain reasoning needed reliable structured output more than raw speed, and GPT-4.1 held up consistently across multi-turn agent calls without drifting off format.
Tier 5 - Action: Azure API Management + Logic App
This is where AI stops being a reporting layer and becomes an operational system. The scope of the action tier is set by your business case, not by the technology. Trigger a care management workflow. Update a claims record. Open a ServiceNow ticket. Push a notification to a clinical team. Invoke an RPA bot. Any system reachable via API becomes an action the AI can take.
For this showcase, I kept it simple: when occupancy hits 92% and the user asks to escalate, the orchestrator delegates to HospitalOpsAgent, which calls an APIM-proxied Logic App and delivers an escalation email to the care management team. The full loop - question to action - closes in one conversation. The email is illustrative. The pattern is production-grade.
The A2A Pattern: Why It Matters
The key architectural insight is the separation between specialists and orchestrators. Each specialist agent is small, focused, and owns its own tools. ClaimsIntelligenceAgent doesn't know anything about bed occupancy. HospitalOpsAgent doesn't know anything about PMPM cost. They're experts in one domain.
The master orchestrator doesn't touch data directly. It reasons about the question and delegates. For a cross-domain question like "which hospitals have the highest occupancy and how does that correlate with our denial rates?" - it calls both specialists in parallel, waits for both responses, and synthesizes them into one answer.
This is how enterprise AI scales. Not one monolithic agent that knows everything, but a network of specialists coordinated by an orchestrator. As your data estate grows, you add specialist agents - one per domain, owned by the team that knows that data best. The orchestrator barely changes.
HealthcareOrchestratorV3
├── ClaimsIntelligenceAgent → Snowflake + Policy RAG
└── HospitalOpsAgent → Fabric Lakehouse + Escalation
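The fan-out-and-synthesize flow described in this section can be sketched in a few lines of Python. This is a minimal illustration, not the HealthIQ code: the two specialist functions are hypothetical stubs that return canned text where the real system would make a remote A2A call, and synthesize is a placeholder for the GPT-4.1 synthesis step.

```python
import asyncio

# Hypothetical stand-ins for the two specialists. In a real deployment each
# would be a remote A2A call; here they just return placeholder text.
async def claims_agent(question: str) -> str:
    await asyncio.sleep(0.01)  # simulate network latency
    return "<answer from ClaimsIntelligenceAgent>"

async def ops_agent(question: str) -> str:
    await asyncio.sleep(0.01)  # simulate network latency
    return "<answer from HospitalOpsAgent>"

SPECIALISTS = {"claims": claims_agent, "ops": ops_agent}

def synthesize(question: str, answers: dict[str, str]) -> str:
    # Placeholder for the model-driven synthesis call: it only stitches
    # the specialist answers together in a stable order.
    parts = [f"[{name}] {text}" for name, text in sorted(answers.items())]
    return f"Q: {question}\n" + "\n".join(parts)

async def orchestrate(question: str, targets: list[str]) -> str:
    # Start every selected specialist at once and wait for all of them.
    calls = [SPECIALISTS[name](question) for name in targets]
    results = await asyncio.gather(*calls)
    return synthesize(question, dict(zip(targets, results)))
```

Calling asyncio.run(orchestrate(question, ["claims", "ops"])) runs both stubs concurrently, so the wait is close to the slower of the two calls rather than their sum. Adding a specialist in this sketch means adding one entry to SPECIALISTS, which is the sense in which the orchestrator barely changes.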
What This Enables for Healthcare Payers
For a managed care organization, this architecture answers the questions that actually matter:
- "What's driving our Cardiology denial rate, and how does it compare to CMS national benchmarks?" - Claims agent pulls the rate, RAG layer pulls the benchmark and the medical director's context.
- "Which facilities are at critical capacity right now?" - Ops agent returns live occupancy across all facilities, flags anything above 90%.
- "Send an escalation alert for Riverside Health" - Ops agent calls the escalation workflow, email goes to the care management team, confirmation comes back in the chat.
All of this in a single Teams conversation. No SQL. No dashboard hunting. No switching between systems.
The "Aha" Moment for Enterprise AI
The insight that makes this architecture resonate with enterprise clients is simple: You don't need to centralize your data to centralize your intelligence. Snowflake stays in Snowflake. Fabric stays in Fabric. Each system keeps its own governance, its own semantic layer, its own access controls. The AI orchestration layer sits on top and coordinates - it doesn't absorb. This is how you sell AI to an organization that has spent years building a data estate and isn't going to blow it up for a chatbot.
The Production Detail That Matters: APIM as a Proxy
Here's the real-world detail that demos never show you. Azure AI Foundry's OpenAPI tool redacts query parameters before making HTTP calls. If your API uses query string authentication - SAS tokens, API version parameters - they get stripped to ?REDACTED before the call goes out.
The fix: put Azure API Management in front. Foundry sends a clean call with just an Ocp-Apim-Subscription-Key header. An APIM inbound policy uses set-backend-service and rewrite-uri to reconstruct the full URL with all parameters before forwarding. Foundry never sees the sensitive parameters. The API works correctly. This is the kind of pattern that separates a demo from a production architecture.
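To make the rewrite concrete, here is a small Python simulation of what the gateway does to a request. It is not an APIM policy (those are written in XML) and it is not the policy used in HealthIQ; it only mirrors the logic. The backend host and the query parameter values are hypothetical placeholders, and in a real gateway the secret values would come from a secure store such as named values.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit

# Hypothetical placeholders: the real backend host and parameter values
# would live in the gateway's configuration, never in the agent.
BACKEND_BASE = "https://backend.example.com"
BACKEND_QUERY = {"api-version": "<api-version>", "sig": "<signature>"}

def rewrite_for_backend(incoming_url: str, headers: dict[str, str]) -> str:
    """Mimic the gateway: require the key header, then rebuild the backend URL."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    if "ocp-apim-subscription-key" not in {name.lower() for name in headers}:
        raise PermissionError("missing subscription key")
    # Keep only the path from the caller's request...
    path = urlsplit(incoming_url).path
    # ...and attach it to the backend host with the full query string.
    scheme, netloc, _, _, _ = urlsplit(BACKEND_BASE)
    return urlunsplit((scheme, netloc, path, urlencode(BACKEND_QUERY), ""))
```

The caller's URL carries no query string at all, so there is nothing for the OpenAPI tool to redact; the parameters only exist on the gateway-to-backend leg.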
There's a second reason APIM belongs in this architecture, and it matters more than the redaction workaround: identity and access control. No enterprise lets AI agents call backend systems anonymously, and an API gateway is exactly where you enforce that. APIM ties into your identity provider - Entra ID, OAuth, whatever your enterprise standardizes on - so every call from an agent carries a verifiable identity, not just a static key. You get centralized logging of who (or what agent) accessed which system and when, rate limiting per caller, and the ability to revoke access without touching the agent itself. For a healthcare payer handling PHI-adjacent operational and claims data, that audit trail isn't optional - it's the control that makes the rest of this architecture deployable in production.
Guardrails and Observability: The Parts That Make This Trustworthy
A working demo and a deployable system are different things. Two Foundry capabilities close that gap.
Guardrails. Azure AI Foundry lets you attach guardrails directly to each agent - content filtering, jailbreak detection, and groundedness checks that run before a response ever reaches the user. In a healthcare context this matters concretely: you don't want an agent confidently answering a clinical policy question from a hallucinated detail, and you don't want a claims agent exposed to prompt injection through a malformed query. Guardrails sit at the agent level, so each specialist enforces its own policy independent of how the orchestrator routes to it.
Tracing. Every agent call in Foundry - including the A2A hops between the orchestrator and each specialist - generates a trace. When a cross-domain question comes back wrong, or a tool call fails, the trace shows exactly which agent was invoked, what arguments it passed, what the tool returned, and how long each step took. This is the difference between debugging a black box and debugging a system. I used traces directly to diagnose a token-fetch failure in same-project A2A calls - without that visibility, it would have been a guessing exercise.
For any architecture handling claims or clinical data, guardrails and tracing aren't optional extras. They're what a security or compliance review will ask about first.
What I'd Harden Before Calling This Production-Ready
It's worth being honest about what a demo glosses over.
Error handling. Specialist agents occasionally hit transient failures - an MCP server timeout, a token refresh delay. Right now the orchestrator surfaces the error in its synthesis rather than retrying silently, which is the right behavior for a demo but needs a proper retry-with-backoff policy in production, plus a fallback message that doesn't expose internal error text to the end user.
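A retry-with-backoff wrapper of the kind described here might look like the sketch below. Everything in it is an assumption for illustration: TransientError is a hypothetical marker for retryable failures, the attempt count and delays are arbitrary, and jitter and logging are left out for brevity.

```python
import time

# Generic message shown to the user; internal error text stays out of it.
FALLBACK_MESSAGE = "That data source is temporarily unavailable. Please try again shortly."

class TransientError(Exception):
    """Hypothetical marker for retryable failures (timeouts, token refresh delays)."""

def call_with_retry(call, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Run call(); on TransientError wait 0.5s, 1s, 2s, ... and try again."""
    for attempt in range(attempts):
        try:
            return call()
        except TransientError:
            # A production version would log the error here for operators.
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    # All attempts failed: return the safe message, not the exception text.
    return FALLBACK_MESSAGE
```

The sleep parameter is injectable so the backoff schedule can be tested without actually waiting. Only TransientError is retried; any other exception propagates, on the assumption that non-transient failures should fail fast.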
Cost and latency. Every specialist agent call has its own cost and latency profile. A cross-domain question that fans out to two agents in parallel costs roughly double a single-domain question and adds the synthesis call on top. At scale, the routing logic should account for this - not every question needs both specialists, and the keyword-based router I built for this showcase is a placeholder for a more deliberate semantic routing decision in GPT-4.1 itself.
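For reference, a keyword-based router of the kind mentioned above can be as small as the following sketch. The keyword lists are hypothetical examples chosen for illustration, not the ones used in HealthIQ, and falling back to every specialist when nothing matches is an assumption that trades cost for coverage.

```python
import re

# Hypothetical keyword lists, one set per specialist.
ROUTES = {
    "claims": {"claims", "denial", "paid", "pmpm", "drg"},
    "ops": {"occupancy", "staffing", "capacity", "bed", "beds", "escalate", "escalation"},
}

def route(question: str) -> list[str]:
    """Return the specialists whose keywords appear in the question."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    matched = [name for name, keywords in ROUTES.items() if words & keywords]
    # No keyword hit: fan out to every specialist rather than guess.
    return matched or list(ROUTES)
```

With these lists, a claims-only question routes to ["claims"], an occupancy question routes to ["ops"], and a question mentioning both occupancy and denial rates routes to both. The weakness is visible in the fallback: any question phrased without a listed keyword pays for a full fan-out, which is why a semantic routing decision is the better long-term choice.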
Same-project A2A auth. As of this writing, Foundry's native A2A wiring for same-project agents has rough edges around managed identity token exchange. I worked around it with a Python orchestration layer calling each agent's A2A endpoint directly - which works well, but native in-Foundry publishing to Teams is the cleaner long-term path once that matures.
None of these are dealbreakers. They're the normal gap between "I proved the pattern works" and "this is hardened for production traffic" - and naming that gap honestly is part of doing this work seriously.
Tradeoffs I Considered and Rejected
Every architectural choice here had a simpler alternative I deliberately didn't take. Naming them - and why - matters more than the choice itself.
A2A specialists vs. one agent with many tools. The simpler path is a single agent with every tool attached: Snowflake MCP, Fabric Data Agent, RAG, escalation, all in one system prompt. I rejected this after watching it degrade in practice. Past roughly five or six tools, a single agent starts misrouting - calling the wrong tool, or calling the right tool with the wrong framing because it's reasoning about ten things at once instead of one. The single-agent model also has an organizational failure mode that matters more long-term than the technical one: it has no clean ownership boundary. If the claims team and the ops team are both editing the same agent's system prompt and tool list, you get merge conflicts in intent, not just in code. A2A specialists cost you orchestration overhead and an extra synthesis call. What you get back is a system where each team owns a contained blast radius, and a routing failure in one specialist doesn't take down the other. For two specialists, the overhead is a fair trade. For ten, it's close to mandatory.
API gateway vs. native Foundry connectors. Foundry's native OpenAPI and MCP tool support is the path of least resistance, and I started there. I moved to an APIM-fronted pattern for two reasons that aren't visible in a demo. First, the practical one: Foundry's OpenAPI tool redacts query parameters, which silently breaks any backend using query-string auth - there's no native workaround inside Foundry itself. Second, the more important one: native connectors authenticate with a static key the agent holds directly. That's acceptable for a prototype and not acceptable for a system touching PHI-adjacent data.