From Idea to Infrastructure: Building a DevOps Copilot with AI Agents

A Journey into Multi-Agent Systems with LangGraph and AWS MCP Server

Jul 23, 2025

Introduction

In modern software delivery, DevOps tasks—from high‑level architecture design to infrastructure provisioning—often involve repetitive, error‑prone steps. What if you could hand off that entire workflow to a squad of AI agents? In this post, we explore the DevOps Multi‑Agent Copilot, a system that transforms a simple user request into a validated, production‑ready Terraform project—all in under a minute.

You’ll learn how we leverage LangChain, LangGraph, FastAPI, MCP and Streamlit, along with a network of specialized agents, to create an interactive DevOps assistant that:

- Understands your high‑level vision
- Plans the right AWS services and components
- Visualizes the architecture
- Generates and validates infrastructure‑as‑code

The Big Picture: System Architecture

Figure 1. Multi‑Agent DevOps Copilot Architecture

The system comprises four layers:

Agentic Core (LangGraph)
- Defined in graph.py
- Stateful, cyclical graph where AI agents collaborate
- Memory managed via MemorySaver for long‑running, context‑aware workflows
Backend (FastAPI)
- Exposes a /stream endpoint (server.py)
- Orchestrates graph execution, streams events to the UI
MCP Servers (Toolbelt Layer)
- Each specialized tool (diagram generation, Terraform validation, etc.) runs on its own MCP server
- Exposes HTTP/SSE APIs so agents can call tools remotely
- Decouples compute‑heavy or stateful operations from the main application
Frontend (Streamlit)
- Interactive chat UI (app.py)
- Streams agent “thinking” steps, renders diagrams, and shows code in real time
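
To make the streaming path between backend and frontend concrete, here is a minimal sketch of how agent events could be framed as Server-Sent Events before being written to the /stream response. It uses only the standard library. The event fields (`agent`, `type`, `content`) and the function names are assumptions for illustration, not the project's actual schema.

```python
import json
from typing import Iterable, Iterator


def to_sse_frame(event: dict) -> str:
    """Serialize one event as a Server-Sent Events frame.

    A frame is a "data:" line followed by a blank line. json.dumps escapes
    newlines inside strings, so the payload always fits on one line.
    """
    payload = json.dumps(event, ensure_ascii=False)
    return f"data: {payload}\n\n"


def stream_events(events: Iterable[dict]) -> Iterator[str]:
    """Yield one SSE frame per agent event, then a final 'done' marker."""
    for event in events:
        yield to_sse_frame(event)
    yield to_sse_frame({"type": "done"})


# Hypothetical events, shaped the way a chat UI might want to render them.
sample_events = [
    {"agent": "planning_agent", "type": "thought", "content": "Choosing services"},
    {"agent": "diagram_agent", "type": "tool_result", "content": "diagram.png"},
]
frames = list(stream_events(sample_events))
```

Because each frame is self-delimiting, the UI can render a step as soon as it arrives instead of waiting for the whole run to finish.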

Meet the Team of AI Agents

Each agent is a ReAct specialist created via create_react_agent, with its own system prompt and toolset hosted on a remote MCP server.

1. The Supervisor: Project Manager & Orchestrator

The supervisor_agent is the heart of the graph. Its job isn't to do the work but to delegate it. After each step, the conversation returns to the supervisor, which analyzes the current state and decides which agent to route the task to next. This routing logic is defined in its system prompt and allows it to orchestrate the entire workflow, from planning to FINISH.
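
The routing decision can be illustrated with a deterministic stand-in. In the system described here the supervisor is an LLM that routes according to its system prompt; the sketch below replaces that with plain rules so the control flow is visible. The state fields and the agent names other than FINISH are assumptions made for this example.

```python
from typing import TypedDict

FINISH = "FINISH"


class WorkflowState(TypedDict, total=False):
    request: str
    plan: str              # set by the planning agent
    diagram_code: str      # set by the diagram agent
    terraform_valid: bool  # set by the Terraform agent


def route(state: WorkflowState) -> str:
    """Return the next agent to run, or FINISH when nothing is missing."""
    if not state.get("plan"):
        return "planning_agent"
    if not state.get("diagram_code"):
        return "diagram_agent"
    if not state.get("terraform_valid"):
        return "terraform_agent"
    return FINISH


# Walk the happy path: after every step, control returns to route().
state: WorkflowState = {"request": "serverless Python app with a database"}
updates = {
    "planning_agent": {"plan": "API Gateway -> Lambda -> DynamoDB"},
    "diagram_agent": {"diagram_code": "# diagram source"},
    "terraform_agent": {"terraform_valid": True},
}
visited = []
while (next_agent := route(state)) != FINISH:
    visited.append(next_agent)
    state.update(updates[next_agent])  # stand-in for the agent doing its work
```

The important property is the shape of the loop: no agent hands off to another agent directly, and every transition passes back through the supervisor.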

2. The Planning Agent: Requirements Clarifier & Service Recommender

The planning agent acts as a solution architect. You give it a vague idea, and it turns that idea into a detailed technical plan: it applies best practices, suggests suitable AWS services, and produces a blueprint for the other agents to follow.

3. The Diagram Agent: Visual Architect

This agent takes the planner's blueprint and instantly renders it as a professional architecture diagram. It provides immediate visual feedback, ensuring the design is what you envisioned before a single line of infrastructure code is written.

4. The Terraform Agent: IaC Generator & Validator

The Terraform agent is the infrastructure-as-code expert. It takes the final architecture and writes a complete, multi-file Terraform project. More importantly, it does not stop at writing code: it writes, validates, and corrects the project in a loop until validation passes or the retry limit is reached, so the output is a validated configuration rather than a first draft.
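
The write, validate, correct loop can be sketched as follows. This is a minimal illustration, not the agent's real code: `generate` and `validate` are injected callables standing in for the LLM and the validation tool, and the stub validation rule is artificial and does not reflect how Terraform itself validates a project.

```python
from typing import Callable, List, Tuple

MAX_RETRIES = 3  # the workflow in this post allows up to 3 retries


def generate_until_valid(
    generate: Callable[[List[str]], str],
    validate: Callable[[str], List[str]],
    max_retries: int = MAX_RETRIES,
) -> Tuple[str, bool, int]:
    """Generate code, validate it, and feed errors back until it passes.

    Makes one initial attempt plus up to max_retries corrections and
    returns (code, is_valid, attempts).
    """
    errors: List[str] = []
    code = ""
    for attempt in range(1, max_retries + 2):
        code = generate(errors)  # errors from the previous attempt guide the fix
        errors = validate(code)  # an empty list means validation passed
        if not errors:
            return code, True, attempt
    return code, False, max_retries + 1


# Stubs: the first draft lacks a provider block, the corrected draft adds it.
def fake_generate(errors: List[str]) -> str:
    body = 'resource "aws_dynamodb_table" "items" {}'
    return ('provider "aws" {}\n' + body) if errors else body


def fake_validate(code: str) -> List[str]:
    # Artificial rule for the stub only.
    return [] if 'provider "aws"' in code else ["missing provider block"]


code, ok, attempts = generate_until_valid(fake_generate, fake_validate)
```

Returning the attempt count and a success flag, rather than raising on failure, lets the caller decide whether to surface the last draft to the user or report the remaining errors.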

End‑to‑End Workflow

User Request

“Create a diagram for a serverless Python app on AWS with a database.”

Supervisor
- Parses intent
- Delegates to Planning Agent
Planning Agent
- Clarifies requirements
- Outputs: “API Gateway → Lambda → DynamoDB”
Supervisor → Diagram Agent
- generate_diagram tool produces image + raw Python code
Supervisor → Terraform Agent
- Translates diagram code to Terraform HCL
- Writes to disk, runs terraform_validate (up to 3 retries)
Supervisor
- On success, emits FINISH → user gets codebase + diagram

Result: From idea to validated Terraform code and architecture diagram in under a minute.

Under the Hood

1. LangGraph as the Brain

- Models the workflow as a graph of nodes (agents) and edges (state transfers)
- Ensures robust, stateful orchestration for multi‑step tasks

2. ReAct Agents as the Experts

- Reason-and-Act paradigm: each agent chooses a tool, observes the result, and iterates
- Built with the prebuilt create_react_agent helper for consistent behavior
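
The reason, act, observe cycle can be shown without any framework. The sketch below is a simplified stand-in for what a ReAct agent does internally: `decide` plays the role of the LLM, and the tool name `list_services` and its canned output are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# A decision is either ("call", tool_name, tool_input) or ("answer", text, "").
Decision = Tuple[str, str, str]


def react_loop(
    decide: Callable[[List[str]], Decision],
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_steps: int = 5,
) -> Tuple[str, List[str]]:
    """Alternate between reasoning (decide) and acting (tool calls)."""
    scratchpad = [f"Question: {question}"]
    for _ in range(max_steps):
        kind, name_or_text, tool_input = decide(scratchpad)  # reason
        if kind == "answer":
            return name_or_text, scratchpad
        observation = tools[name_or_text](tool_input)         # act
        scratchpad.append(f"Action: {name_or_text}({tool_input})")
        scratchpad.append(f"Observation: {observation}")      # observe
    return "Stopped: step limit reached", scratchpad


# Scripted stand-in for the LLM: call a tool once, then answer from the result.
def scripted_decide(scratchpad: List[str]) -> Decision:
    if not any(line.startswith("Observation:") for line in scratchpad):
        return ("call", "list_services", "serverless api with database")
    return ("answer", scratchpad[-1].split(": ", 1)[1], "")


tools = {"list_services": lambda query: "API Gateway, Lambda, DynamoDB"}
answer, trace = react_loop(scripted_decide, tools, "Which AWS services fit?")
```

The step limit matters in practice: without it, a model that keeps requesting tools would loop indefinitely.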

3. MCP Servers as the Tool‑belt

- Hosted microservices that expose tool APIs over HTTP
- Decouples agents from the underlying tool implementations
- Easier to scale and secure than tools launched over the stdio transport

Modernizing the MCP Servers

We adapted the original AWS Labs MCP (Model Context Protocol) servers:

- Transport moved from stdio to HTTP/SSE for network‑native tool access
- Dependencies upgraded for better performance and security
- Responses made streamable to support real‑time UIs

This refactor allows seamless, scalable interactions between agents and their specialized tools.
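
For a sense of what travels over the wire, the sketch below builds an MCP `tools/call` request (MCP messages are JSON-RPC 2.0) and parses a reply delivered as an SSE frame. It runs offline against a canned response. The `working_directory` argument and the reply text are assumptions for illustration; the real tool's parameters and output may differ.

```python
import json
from typing import Any, Dict


def build_tool_call(request_id: int, tool: str, arguments: Dict[str, Any]) -> str:
    """Build the JSON-RPC 2.0 body for an MCP tools/call request."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


def parse_sse_result(raw: str) -> Dict[str, Any]:
    """Extract the JSON-RPC result from the 'data:' lines of one SSE frame."""
    data_lines = [
        line[len("data:"):].strip()
        for line in raw.splitlines()
        if line.startswith("data:")
    ]
    message = json.loads("\n".join(data_lines))
    if "error" in message:
        raise RuntimeError(message["error"].get("message", "tool call failed"))
    return message["result"]


# Hypothetical argument name; the tool name comes from the workflow above.
body = build_tool_call(1, "terraform_validate", {"working_directory": "./out"})

# A canned server reply, so the sketch runs without a network connection.
canned = (
    "event: message\n"
    'data: {"jsonrpc": "2.0", "id": 1, "result": '
    '{"content": [{"type": "text", "text": "validation passed"}], '
    '"isError": false}}\n\n'
)
result = parse_sse_result(canned)
```

In a real client the body would be sent with an HTTP POST to the server's MCP endpoint; that step is left out here because the endpoint path depends on how each server is deployed.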

Conclusion & Next Steps

By combining LangGraph orchestration with a squad of ReAct agents, we’ve built a DevOps Copilot that:

- Automates complex workflows end‑to‑end
- Remains highly extensible: add new agents (e.g., security scanner, cost estimator) by defining new graph nodes
- Powers a conversational interface that feels like chatting with your DevOps team

Ready to try it?
🔗 Explore the GitHub repo → DevOps‑agent
