MCP Server Architecture for Platform Teams — Giving AI Live Access to Your Infrastructure
DEV Community Grade 8 3h ago


Pipeline & Prompts | Byte size guides on DevOps, Cloud and AI

AI in the Stack #3

⚡ Byte Size Summary

- MCP (Model Context Protocol) is the standard that lets AI agents interact with external systems (your cluster, your observability stack, your ticketing system) without bespoke integration code for every tool.
- MCP directly addresses AI hallucination and 2AM incident response by grounding AI answers in live system state. It does not solve tribal knowledge alone; that needs RAG alongside it.
- This article covers the production-grade architecture: what MCP servers are, how to design them for platform engineering use cases, and what you need to get right before running them anywhere near production.

In logistics, the hardest problems rarely come from missing data. They come from disconnected systems. The warehouse knows one thing. The transportation management system knows another. Inventory systems lag behind reality by hours. Operators work around the gaps manually: copying numbers between screens, making calls to confirm what the system should already know, carrying context in their heads because no single system has the full picture.

I spent years watching intelligent people solve problems that should not have existed, because the systems around them were designed to optimise locally rather than coordinate globally. The data was there. The capability was there. The coordination layer was not.

Modern infrastructure operations feel surprisingly similar. Your Kubernetes cluster knows the state of every pod. Your observability stack knows the error rates and latency trends. Your ticketing system knows what changes were deployed in the last 24 hours. Your CI/CD pipeline knows what is currently in flight. And your AI assistant, the tool you are increasingly asking to help you reason about incidents, knows none of it unless you paste it in manually.

Model Context Protocol is the coordination layer that changes this.
Not by giving AI access to everything at once, but by giving it a structured, auditable, controlled way to request the context it needs, from the systems that have it, at the moment it needs it. That is what this article is about.

What MCP Actually Is

Model Context Protocol (MCP) is an open standard, introduced by Anthropic, that defines how AI models communicate with external tools and data sources. Think of it as a common language that sits between an AI assistant and the systems it needs to interact with.

Before MCP, every AI integration was bespoke. You wanted your LLM to query your Kubernetes cluster? Write a custom function. You wanted it to check PagerDuty? Write another one. You wanted it to search your runbooks and open a Jira ticket? Three separate integrations, all maintained independently, all breaking in different ways when APIs change.

MCP replaces that with a standard. An MCP server exposes a set of tools (defined capabilities the AI can invoke) plus resources (data it can read). The AI client (Claude, Cursor, any MCP-compatible host) discovers what tools are available, decides which to call based on the user's question, calls them, and incorporates the results into its response.

The AI does not have direct access to your systems. It has access to an MCP server that mediates that access. That distinction matters enormously for security and governance, which is why this article spends as much time on architecture as on implementation.

Why Platform Engineers Should Care

The RAG pipeline from Article 02 was useful for static knowledge: runbooks, documentation, past incident reports. MCP is useful for live state. When an engineer asks "what is causing the latency spike in the payments service right now?", that is not a runbook question. It requires current pod status, recent deployment events, live error rates, and possibly the last three alerts that fired. None of that lives in a document. All of it lives in systems your MCP server can reach.
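To make the discover-then-call flow concrete, here is a minimal sketch of the messages an MCP client might send for the latency question above. MCP is built on JSON-RPC 2.0, and `tools/list` and `tools/call` are the protocol's method names. The tool names, the `payments` namespace, and the way the question is split into calls are illustrative assumptions, not something the protocol prescribes.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids; any value unique per request works


def list_tools_request() -> dict:
    """Ask the server which tools it exposes (MCP method "tools/list")."""
    return {"jsonrpc": "2.0", "id": next(_ids), "method": "tools/list"}


def call_tool_request(name: str, arguments: dict) -> dict:
    """Invoke one tool by name (MCP method "tools/call")."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Hypothetical decomposition of "what is causing the latency spike in payments?"
messages = [
    list_tools_request(),
    call_tool_request("list_failing_pods", {"namespace": "payments"}),
    call_tool_request("get_recent_events", {"namespace": "payments", "limit": 20}),
]

for message in messages:
    print(json.dumps(message))
```

In practice the host application and the MCP SDK build these messages for you. The point is that every request is a small, structured, loggable object rather than free-form access.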
Before you design anything, be clear about what MCP solves and what it does not.

AI hallucination: yes, directly. Hallucination happens when an LLM answers from training data instead of ground truth. MCP gives the AI a way to retrieve live, authoritative state before responding. It does not eliminate hallucination entirely, because an LLM can still misinterpret what it retrieves, but it directly attacks the root cause for infrastructure questions.

2AM incidents: yes, directly. This is the primary operational use case. Instead of an engineer manually checking five systems in sequence while half-asleep, an AI with MCP access can pull pod status, recent events, and active alerts in a single query and reason across all of it at once. Speed and context at the moment they are hardest to find.

Too many dashboards: partially. MCP does not reduce the number of dashboards in your environment. It gives an AI a way to query across the systems those dashboards represent, so an engineer asks one question instead of navigating five screens. The dashboards still exist. You stop having to drive them manually during an incident.

Tribal knowledge: not alone. MCP surfaces what your systems know. It does not surface what your team knows: the undocumented context that lives in people's heads, the runbook that exists nowhere in any system, the reason a service is named what it is. That is a RAG problem.

The combination of RAG (for historical and human knowledge) and MCP (for live system state) is where the tribal knowledge gap actually starts to close. Neither alone is sufficient. An AI that can read your runbooks and query your cluster simultaneously is a meaningful operational tool. An AI that can only do one of those things is a limited one.
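One way to picture the RAG-plus-MCP combination is as two clearly labelled sections in the context handed to the model. The sketch below is an assumption about how that could look. The function name, section headings, and sample data are hypothetical, and a real system would get the snippets from a retriever and the state from MCP tool results.

```python
import json


def build_incident_context(question: str, runbook_snippets: list[str], live_state: dict) -> str:
    """Combine static knowledge (RAG) and live state (MCP) into one prompt,
    keeping the two sources visibly separate so the model can weigh them."""
    parts = [f"Question: {question}", "", "## Retrieved knowledge (RAG, may be stale)"]
    parts += [f"- {snippet}" for snippet in runbook_snippets] or ["- (nothing retrieved)"]
    parts += ["", "## Live system state (MCP tool results)"]
    parts.append(json.dumps(live_state, indent=2, sort_keys=True))
    return "\n".join(parts)


prompt = build_incident_context(
    "Why is the payments service slow right now?",
    ["Payments latency usually follows connection pool exhaustion."],  # made-up runbook text
    {"failing_pods": [{"name": "payments-api-0", "phase": "Pending"}]},  # made-up tool result
)
print(prompt)
```

Labelling the retrieved text as possibly stale and the tool output as live is a design choice, not a requirement. It gives the model a hint about which source to trust when they disagree.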
MCP Server Architecture for Platform Engineering

A production-grade MCP server for a platform team has four layers:

Every tool invocation travels this path: the AI client sends a request, the Auth Gateway validates identity before anything reaches your infrastructure, the MCP server processes it through governance and audit controls, and the Kubernetes API Server enforces access policy independently of the application layer. Two enforcement gates, not one. That is the architecture the implementation sections below are built around.

The four layers in code:

Layer 1: Governance First

Before writing a single tool definition, decide and enforce these three things:

- Read-only by default. Every tool that touches production infrastructure should be read-only unless you have explicitly designed the write path with human approval steps. An MCP server that can kubectl delete anything is an incident waiting to happen. Start with read, earn trust, expand deliberately.
- Audit logging. Every tool call should be logged with: timestamp, tool name, input parameters, calling session identity, and response status. This is your audit trail when something goes wrong. It is also how you demonstrate to your security team that AI is not a black box.
- Rate limiting. An AI in an agentic loop can call tools hundreds of times in seconds. Without rate limiting, a runaway agent can exhaust your Kubernetes API quota, spam your ticketing system, or trigger alert storms in your observability stack. Set per-session and per-tool limits before you deploy.

Layer 2: Backend Clients

The MCP server needs clients for each system it connects to. Keep these thin. Their job is to call APIs and return structured data, not to contain business logic.
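Before the client code, here is a minimal sketch of the three Layer 1 controls wrapped around a single tool call. Everything in it is an assumption for illustration: the allowlist contents, the sliding-window limiter, the `guarded_call` helper, and the `log_tool_call` signature. The article's own `audit` module is not shown, so this is not its implementation.

```python
import json
import logging
import time
from collections import defaultdict, deque
from datetime import datetime, timezone

audit_logger = logging.getLogger("mcp.audit")

# Read-only by default: only tools named here may run (hypothetical allowlist).
READ_ONLY_TOOLS = {"get_pod_status", "list_failing_pods", "get_recent_events"}


class RateLimitExceeded(Exception):
    pass


class SlidingWindowLimiter:
    """Allow at most max_calls per window_seconds for each key."""

    def __init__(self, max_calls: int, window_seconds: float, clock=time.monotonic):
        self.max_calls = max_calls
        self.window = window_seconds
        self.clock = clock
        self._calls = defaultdict(deque)  # key -> timestamps of recent calls

    def check(self, key) -> None:
        now = self.clock()
        calls = self._calls[key]
        while calls and now - calls[0] >= self.window:
            calls.popleft()  # drop calls that fell out of the window
        if len(calls) >= self.max_calls:
            raise RateLimitExceeded(f"rate limit exceeded for {key!r}")
        calls.append(now)


def log_tool_call(tool: str, params: dict, session_id: str, status: str) -> dict:
    """Emit one audit record with the five fields listed above (assumed signature)."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "tool": tool,
        "params": params,
        "session": session_id,
        "status": status,
    }
    audit_logger.info(json.dumps(record, default=str))
    return record


def guarded_call(tool, params, session_id, handler, session_limiter, tool_limiter):
    """Run handler(**params) only if the allowlist and both limiters permit it.
    The audit record is written whether the call succeeds, fails, or is refused."""
    status = "error"
    try:
        if tool not in READ_ONLY_TOOLS:
            status = "denied"
            raise PermissionError(f"{tool} is not on the read-only allowlist")
        try:
            session_limiter.check(session_id)  # per-session limit
            tool_limiter.check(tool)           # per-tool limit
        except RateLimitExceeded:
            status = "rate_limited"
            raise
        result = handler(**params)
        status = "ok"
        return result
    finally:
        log_tool_call(tool, params, session_id, status)
```

Writing the audit record in a `finally` block means refused and failed calls leave a trail too, which is usually what a security review wants to see. The limiter state here lives in process memory, so a multi-replica deployment would need a shared store instead.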
For a Kubernetes-connected MCP server, using the official kubernetes Python client:

```python
# k8s_client.py
from datetime import datetime, timezone
from typing import Optional

from kubernetes import client, config


class KubernetesClient:
    """Thin, read-only wrapper that returns plain dicts for the tool layer."""

    def __init__(self, in_cluster: bool = False):
        # Inside the cluster, use the pod's service account; otherwise the local kubeconfig.
        if in_cluster:
            config.load_incluster_config()
        else:
            config.load_kube_config()
        self.v1 = client.CoreV1Api()
        self.apps_v1 = client.AppsV1Api()

    def get_pod_status(self, namespace: str, pod_name: str) -> dict:
        pod = self.v1.read_namespaced_pod(name=pod_name, namespace=namespace)
        return {
            "name": pod.metadata.name,
            "namespace": pod.metadata.namespace,
            "phase": pod.status.phase,
            "conditions": [
                {"type": c.type, "status": c.status, "reason": c.reason}
                for c in (pod.status.conditions or [])
            ],
            "container_statuses": [
                {
                    "name": cs.name,
                    "ready": cs.ready,
                    "restart_count": cs.restart_count,
                    "state": str(cs.state),
                }
                for cs in (pod.status.container_statuses or [])
            ],
        }

    def list_failing_pods(self, namespace: Optional[str] = None) -> list[dict]:
        if namespace:
            pods = self.v1.list_namespaced_pod(namespace=namespace)
        else:
            pods = self.v1.list_pod_for_all_namespaces()

        # "Failing" here means any phase other than Running or Succeeded.
        failing = []
        for pod in pods.items:
            if pod.status.phase not in ("Running", "Succeeded"):
                failing.append({
                    "name": pod.metadata.name,
                    "namespace": pod.metadata.namespace,
                    "phase": pod.status.phase,
                    "reason": pod.status.reason,
                })
        return failing

    def get_recent_events(self, namespace: str, limit: int = 20) -> list[dict]:
        # Note: limit caps the page the API server returns before sorting, so this is
        # a newest-first view of that page, not necessarily the newest events overall.
        events = self.v1.list_namespaced_event(namespace=namespace, limit=limit)

        # Events can lack last_timestamp. Falling back to "" would raise TypeError when
        # compared with a datetime, so use an aware minimum and sort those events last.
        # Assumes the client returns timezone-aware datetimes.
        oldest = datetime.min.replace(tzinfo=timezone.utc)
        return [
            {
                "type": e.type,
                "reason": e.reason,
                "message": e.message,
                "involved_object": e.involved_object.name,
                "count": e.count,
                "last_timestamp": str(e.last_timestamp),
            }
            for e in sorted(
                events.items,
                key=lambda x: x.last_timestamp or oldest,
                reverse=True,
            )
        ]
```

Layer 3: Tool Definitions

This is the layer the AI interacts with directly. Tool descriptions are not just documentation. They are what the LLM reads to decide whether to call the tool and how to format its inputs. Write them precisely.
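As an illustration of what "precisely" can mean, the sketch below contrasts a vague tool definition with a precise one, written as plain dictionaries in the shape MCP uses for tool listings (`name`, `description`, and a JSON Schema under `inputSchema`). The wording, the bounds on `limit`, and the `missing_required` helper are assumptions for the example, not the article's definitions.

```python
# Vague: the model cannot tell when to use this or what to pass.
vague_tool = {
    "name": "pods",
    "description": "Gets pod info.",
    "inputSchema": {"type": "object"},
}

# Precise: states what is returned, when to use it, and constrains the inputs.
precise_tool = {
    "name": "get_recent_events",
    "description": (
        "List recent Kubernetes events in one namespace, newest first. "
        "Use this to find scheduling failures, image pull errors, or OOM kills. "
        "Read-only; does not modify the cluster."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "namespace": {
                "type": "string",
                "description": "Exact namespace name, e.g. 'payments'.",
            },
            "limit": {
                "type": "integer",
                "minimum": 1,
                "maximum": 100,  # hypothetical cap to keep responses small
                "default": 20,
            },
        },
        "required": ["namespace"],
    },
}


def missing_required(tool: dict, arguments: dict) -> list[str]:
    """Return required argument names the caller did not supply."""
    required = tool["inputSchema"].get("required", [])
    return [name for name in required if name not in arguments]


print(missing_required(precise_tool, {"limit": 5}))
```

A schema with `required` fields and bounds also gives the server something to validate against before any backend is touched, which pairs naturally with the governance layer.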
```python
# tools.py
from mcp.server import Server
from mcp.types import Tool, TextContent
import json
import logging

from k8s_client import KubernetesClient
from audit import log_tool_call

logger = logging.getLogger(__name__)
k8s = KubernetesClient(in_cluster=False)  # Set True when running inside the cluster


def register_tools(server: Server):
    @server.list_tools()
    async def list_tools():
        return [
            Tool(
                name="get_pod_status",
                description=(
                    "Get the current status of a specific Kubernetes pod, including phase, "
                    "readiness conditions, container states, and restart
```
