
Building a Knowledge Agent Platform: Multi-Agent Architecture with RAG, MCP, and Orchestration

A complete guide to building production AI agent platforms—integrating agentic RAG, MCP tools, A2A communication, advanced reasoning, and multi-agent orchestration into a unified system.


Introduction

Building a truly capable AI agent requires more than a good prompt and a few tools. Production systems like Claude Code, Cursor, and Perplexity demonstrate what's possible when you combine multiple specialized components: intelligent retrieval, standardized tool protocols, multi-agent coordination, and sophisticated reasoning.

2025: the year of enterprise agentic architecture. Gartner predicts that 40% of enterprise applications will include integrated task-specific agents by 2026, up from less than 5% today. According to Bain & Company's 2025 Technology Report, 5-10% of technology spending over the next 3-5 years will be directed toward building foundational agent capabilities.

The knowledge layer is foundational: As InfoQ's architectural analysis notes, "without a knowledge layer, the agent is just doing things based on gut feeling. With it, the agent gains boundaries—what's true, what's allowed, what's relevant, what's outdated, what's sensitive." This is why knowledge agent platforms require such careful architecture.

Major platforms emerging in 2025: Microsoft Foundry combines a large model catalog, agent runtime, retrieval layer, and control plane into an enterprise agent platform. ServiceNow's AI Platform unifies intelligence, data, and orchestration with Knowledge Graphs and AI Agent Fabric. Meanwhile, Google's Agent2Agent (A2A) protocol enables agents to discover and collaborate across platforms.

This guide presents a complete architecture for building knowledge agent platforms—systems that can research, reason, execute, and learn across complex multi-step tasks. We'll use a practical example throughout: an Enterprise Knowledge Agent that helps organizations answer complex questions by searching internal documents, querying databases, browsing the web, and synthesizing findings.

What makes this different from simpler agent architectures?

Simple Agent                      Knowledge Agent Platform
------------------------------    ------------------------------------
Single LLM with tools             Multiple specialized agents
Basic RAG retrieval               Multi-strategy agentic RAG
Hardcoded tools                   Dynamic MCP tool discovery
Isolated execution                A2A agent communication
Stateless                         Persistent memory and context
Single reasoning pass             Iterative reasoning with reflection

Prerequisites: This post assumes familiarity with basic agent concepts. For foundations, see Building Agentic AI Systems and Building Production-Ready RAG Systems.


The Enterprise Knowledge Agent: Our Reference System

Throughout this guide, we'll build an Enterprise Knowledge Agent with these capabilities:

User query: "What were the key factors in our Q3 revenue decline, and how do they compare to competitor performance?"

What the system does:

  1. Understands the query requires financial data, internal documents, and competitive intelligence
  2. Plans a multi-step research approach
  3. Retrieves from internal wikis, financial databases, CRM notes, and web sources
  4. Delegates sub-tasks to specialized agents (financial analyst, competitive researcher)
  5. Synthesizes findings into a coherent analysis
  6. Cites all sources with confidence levels

This isn't a simple RAG query—it requires orchestrating multiple data sources, agents, and reasoning steps.


Part 1: System Architecture Overview

The Complete Stack

A knowledge agent platform has six distinct layers, each with specific responsibilities:

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                         USER INTERFACE LAYER                             │
│                    (Chat, API, Scheduled Tasks)                          │
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                         GATEWAY & SAFETY LAYER                           │
│              Input Validation │ Rate Limiting │ Guardrails               │
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                        ORCHESTRATION LAYER                               │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐    │
│  │   Query     │  │   Agent     │  │  Execution  │  │   Result    │    │
│  │  Analyzer   │→ │   Router    │→ │   Engine    │→ │ Synthesizer │    │
│  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘    │
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                    ┌───────────────┼───────────────┐
                    ▼               ▼               ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                          AGENT LAYER                                     │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐    │
│  │  Research   │  │  Analyst    │  │  Executor   │  │   Critic    │    │
│  │   Agent     │  │   Agent     │  │   Agent     │  │   Agent     │    │
│  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘    │
│                         ↑    A2A Protocol    ↑                          │
│                         └────────────────────┘                          │
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                    ┌───────────────┼───────────────┐
                    ▼               ▼               ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                        CAPABILITY LAYER                                  │
│  ┌─────────────────────┐  ┌─────────────────────┐                       │
│  │   Multi-Search RAG  │  │    MCP Tool Layer   │                       │
│  │  ┌───┐ ┌───┐ ┌───┐  │  │  ┌───┐ ┌───┐ ┌────┐ │                       │
│  │  │Vec│ │Key│ │Web│  │  │  │DB │ │API│ │File│ │                       │
│  │  └───┘ └───┘ └───┘  │  │  └───┘ └───┘ └────┘ │                       │
│  └─────────────────────┘  └─────────────────────┘                       │
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                         MEMORY & STATE LAYER                             │
│     Working Memory │ Session State │ Long-term Memory │ Audit Log        │
└─────────────────────────────────────────────────────────────────────────┘

Layer Responsibilities

Each layer has clear boundaries and interfaces:

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                        LAYER RESPONSIBILITIES                            │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  INTERFACE LAYER                                                         │
│  ├── Accept user input (text, files, structured queries)                 │
│  ├── Stream responses back to user                                       │
│  ├── Handle authentication and session management                        │
│  └── Support multiple channels (chat, API, webhooks)                     │
│                                                                          │
│  GATEWAY LAYER                                                           │
│  ├── Validate input format and size limits                               │
│  ├── Apply rate limiting per user/organization                           │
│  ├── Run guardrails (relevance, safety, jailbreak detection)             │
│  └── Log all requests for audit                                          │
│                                                                          │
│  ORCHESTRATION LAYER                                                     │
│  ├── Analyze query complexity and requirements                           │
│  ├── Route to appropriate agents                                         │
│  ├── Manage parallel vs. sequential execution                            │
│  └── Synthesize final response from agent outputs                        │
│                                                                          │
│  AGENT LAYER                                                             │
│  ├── Execute domain-specific tasks                                       │
│  ├── Use tools via MCP                                                   │
│  ├── Communicate with other agents via A2A                               │
│  └── Apply reasoning strategies (CoT, reflection)                        │
│                                                                          │
│  CAPABILITY LAYER                                                        │
│  ├── Multi-strategy retrieval (vector, keyword, graph)                   │
│  ├── Tool execution via MCP servers                                      │
│  └── External API integrations                                           │
│                                                                          │
│  MEMORY LAYER                                                            │
│  ├── Working memory (current task state)                                 │
│  ├── Session memory (conversation history)                               │
│  ├── Long-term memory (user preferences, learned facts)                  │
│  └── Audit log (all actions for compliance)                              │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘

Complete Request Flow

When a user asks our Enterprise Knowledge Agent a question, here's the complete flow:

Code
User Query: "What caused Q3 revenue decline vs competitors?"
                              │
                              ▼
┌──────────────────────────────────────────────────────────┐
│ 1. GATEWAY                                                │
│    • Input validation: ✓ valid text                       │
│    • Rate limit check: ✓ under quota                      │
│    • Guardrail check: ✓ business-relevant query           │
│    • User context loaded: Enterprise tier, Finance role   │
└──────────────────────────────────────────────────────────┘
                              │
                              ▼
┌──────────────────────────────────────────────────────────┐
│ 2. QUERY ANALYSIS                                         │
│    • Complexity: HIGH (multi-source, comparative)         │
│    • Intent: Analysis + Comparison                        │
│    • Required sources: Internal finance, CRM, Web         │
│    • Decomposition: 3 sub-questions identified            │
│    • Estimated tokens: ~15,000                            │
│    • Estimated latency: 8-12 seconds                      │
└──────────────────────────────────────────────────────────┘
                              │
                              ▼
┌──────────────────────────────────────────────────────────┐
│ 3. AGENT ROUTING                                          │
│    • Primary: Analyst Agent (synthesis role)              │
│    • Delegated: Research Agent (internal data)            │
│    • Delegated: Competitor Intel Agent (market data)      │
│    • Execution: Parallel (agents are independent)         │
└──────────────────────────────────────────────────────────┘
                              │
              ┌───────────────┼───────────────┐
              ▼               ▼               ▼
┌────────────────┐ ┌────────────────┐ ┌────────────────┐
│ Research Agent │ │ Research Agent │ │ Competitor     │
│ Sub-Q1: Q3     │ │ Sub-Q2: Root   │ │ Intel Agent    │
│ performance    │ │ cause factors  │ │                │
│                │ │                │ │                │
│ Tools:         │ │ Tools:         │ │ Tools:         │
│ • SQL Query    │ │ • CRM Search   │ │ • Web Search   │
│ • Doc Search   │ │ • Wiki Search  │ │ • News API     │
└────────────────┘ └────────────────┘ └────────────────┘
        │                   │                   │
        ▼                   ▼                   ▼
┌────────────────┐ ┌────────────────┐ ┌────────────────┐
│ Findings:      │ │ Findings:      │ │ Findings:      │
│ Revenue: $4.2M │ │ Lost 3 deals   │ │ Competitor A   │
│ Down 15% QoQ   │ │ to Acme Corp   │ │ grew 8%        │
│ Margin: 42%    │ │ Delayed renew- │ │ Competitor B   │
│ [confidence:   │ │ als in EMEA    │ │ flat           │
│  HIGH]         │ │ [confidence:   │ │ [confidence:   │
│                │ │  MEDIUM]       │ │  MEDIUM]       │
└────────────────┘ └────────────────┘ └────────────────┘
        │                   │                   │
        └───────────────────┼───────────────────┘
                            ▼
┌──────────────────────────────────────────────────────────┐
│ 4. SYNTHESIS                                              │
│    Analyst Agent receives all findings:                   │
│    • Merge findings from 3 sources                        │
│    • Resolve: CRM data vs Finance DB (use Finance)        │
│    • Identify: Our 15% decline vs industry 3% growth      │
│    • Structure: Executive summary + details               │
│    • Citations: 7 sources with confidence levels          │
└──────────────────────────────────────────────────────────┘
                            │
                            ▼
┌──────────────────────────────────────────────────────────┐
│ 5. QUALITY CHECK                                          │
│    Critic Agent reviews:                                  │
│    □ Answers original question? ✓                         │
│    □ All claims sourced? ✓ (7/7 cited)                    │
│    □ Logical gaps? ⚠ (EMEA detail sparse)                 │
│    □ Confidence calibrated? ✓                             │
│    → PASS with note: "EMEA analysis limited by CRM data"  │
└──────────────────────────────────────────────────────────┘
                            │
                            ▼
┌──────────────────────────────────────────────────────────┐
│ 6. RESPONSE                                               │
│    Formatted response with:                               │
│    • Executive summary (2 paragraphs)                     │
│    • Key findings (bulleted)                              │
│    • Competitor comparison table                          │
│    • Sources and confidence levels                        │
│    • Caveat about EMEA data limitations                   │
└──────────────────────────────────────────────────────────┘

Part 2: Intelligent Query Understanding

The first challenge is understanding what the user actually needs. Simple keyword matching fails for complex queries—you need semantic understanding and task decomposition.

Why query understanding is the foundation of intelligent agents: The difference between a demo and a production system often comes down to query understanding. A demo can assume well-formed queries: "What is the capital of France?" Production systems face real queries: "so what happened with that thing Sarah mentioned last quarter about the revenue stuff?" This query requires resolving "that thing Sarah mentioned" (conversation context), "last quarter" (temporal reasoning), and "revenue stuff" (semantic understanding) before any retrieval can happen.

The cost of misunderstanding: When you misclassify a query, every downstream step is wrong. If the system thinks "Compare our pricing with competitors" is a simple lookup rather than a comparison requiring multiple data sources, it might return a single pricing document rather than the competitive analysis the user needs. Even perfect retrieval and generation can't fix a fundamental misunderstanding of intent.

Query Classification System

Every incoming query is analyzed across multiple dimensions. This isn't a single classifier—it's a multi-head analysis that extracts several orthogonal properties simultaneously:

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                         QUERY CLASSIFICATION                             │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  Input: "What caused Q3 revenue decline vs competitors?"                 │
│                                                                          │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │ DIMENSION           │ CLASSIFICATION        │ IMPACT              │  │
│  ├─────────────────────┼───────────────────────┼─────────────────────┤  │
│  │ Complexity          │ HIGH                  │ Multi-agent needed  │  │
│  │ Intent              │ ANALYSIS + COMPARE    │ Synthesis required  │  │
│  │ Temporal scope      │ Q3 2024 (specific)    │ Filter retrieval    │  │
│  │ Data domains        │ Finance, CRM, Market  │ Multiple sources    │  │
│  │ Output format       │ Report with citations │ Structured output   │  │
│  │ Confidence need     │ HIGH (business)       │ Add critic review   │  │
│  │ User role           │ Finance (permitted)   │ Full data access    │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                          │
│  Routing Decision: Multi-agent parallel with synthesis                   │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘

Understanding each classification dimension:

Complexity determines resource allocation. A HIGH complexity query needs multiple agents working in parallel, extended time budgets, and possibly multiple retrieval rounds. A LOW complexity query (simple factual lookup) can be answered in a single retrieval pass with a single agent. Over-allocating resources to simple queries wastes compute and adds latency. Under-allocating to complex queries produces incomplete answers.

Intent shapes the output format and synthesis strategy. An ANALYSIS intent produces explanatory prose with supporting evidence. A COMPARE intent produces structured comparisons (tables, bullet points). Getting this wrong means returning a wall of text when the user wanted a simple comparison table, or vice versa.

Temporal scope is critical for enterprise data. "Q3 2024" must be translated into date filters (July 1 - September 30, 2024) applied to every data source. Without explicit temporal filtering, the system might return data from the wrong period, producing confidently wrong answers.

Data domains determine which retrieval systems and agents to invoke. Finance queries go to the SQL database and financial reports. CRM queries search support tickets and customer notes. Market queries need external web search. Multi-domain queries require coordination across all of these.

User role is often overlooked but essential for access control. A finance analyst can see detailed revenue breakdowns. A general employee might only see summary statistics. The classification system must know who's asking to return appropriate results.
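
A minimal sketch of what this multi-head classification can look like as a single structured LLM call. The schema fields mirror the dimensions above; the prompt wording, the call_llm hook, and all field names are assumptions for illustration.

Python
# Sketch of the multi-head query classifier: one structured LLM call
# fills every dimension at once. Schema and prompt are illustrative.
import json
from dataclasses import dataclass
from typing import Literal


@dataclass
class QueryClassification:
    complexity: Literal["LOW", "MEDIUM", "HIGH"]
    intent: list[str]              # e.g. ["ANALYSIS", "COMPARE"]
    temporal_scope: str | None     # e.g. "2024-07-01..2024-09-30"
    data_domains: list[str]        # e.g. ["finance", "crm", "market"]
    output_format: str             # e.g. "report_with_citations"
    confidence_need: Literal["LOW", "HIGH"]


CLASSIFY_PROMPT = """Classify the query along these dimensions and return JSON
with keys: complexity, intent, temporal_scope, data_domains, output_format,
confidence_need.

Query: {query}"""


def classify(query: str, call_llm) -> QueryClassification:
    """call_llm is any function that takes a prompt and returns a JSON string."""
    raw = call_llm(CLASSIFY_PROMPT.format(query=query))
    return QueryClassification(**json.loads(raw))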

Query Decomposition

Complex queries must be broken into answerable sub-questions. The decomposer identifies dependencies between sub-questions. This is one of the most important capabilities distinguishing simple RAG from agentic systems—the ability to plan multi-step research:

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                       QUERY DECOMPOSITION                                │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  Original: "What caused Q3 revenue decline vs competitors?"              │
│                                                                          │
│                         ┌─────────────────┐                              │
│                         │   DECOMPOSER    │                              │
│                         └────────┬────────┘                              │
│                                  │                                       │
│         ┌────────────────────────┼────────────────────────┐              │
│         ▼                        ▼                        ▼              │
│  ┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐        │
│  │    SUB-Q1       │   │    SUB-Q2       │   │    SUB-Q3       │        │
│  │                 │   │                 │   │                 │        │
│  │ "What was our   │   │ "What specific  │   │ "How did key    │        │
│  │  Q3 revenue     │   │  factors drove  │   │  competitors    │        │
│  │  performance?"  │   │  the decline?"  │   │  perform in Q3?"│        │
│  │                 │   │                 │   │                 │        │
│  │ Sources:        │   │ Sources:        │   │ Sources:        │        │
│  │ • Finance DB    │   │ • CRM           │   │ • Web search    │        │
│  │ • Revenue table │   │ • Sales wiki    │   │ • News APIs     │        │
│  │                 │   │ • Support logs  │   │ • Analyst rpts  │        │
│  │                 │   │                 │   │                 │        │
│  │ Dependency:     │   │ Dependency:     │   │ Dependency:     │        │
│  │ None (start)    │   │ Needs Q1 result │   │ None (parallel) │        │
│  └─────────────────┘   └─────────────────┘   └─────────────────┘        │
│         │                        │                        │              │
│         │                        │                        │              │
│         └─────────── DEPENDENCY GRAPH ────────────────────┘              │
│                                                                          │
│  Execution Order:                                                        │
│  • Phase 1 (parallel): Sub-Q1 + Sub-Q3                                  │
│  • Phase 2 (sequential): Sub-Q2 (needs Sub-Q1 context)                  │
│  • Phase 3: Synthesis                                                    │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘

The dependency graph is crucial for efficiency: Notice that Sub-Q1 and Sub-Q3 have no dependencies—they can execute in parallel. Sub-Q2 depends on Sub-Q1's result because understanding "what specific factors drove the decline" requires first knowing what the decline looked like. Without dependency analysis, you'd either: (1) execute everything sequentially (slow), or (2) execute everything in parallel and miss critical context (incomplete).

Why sub-questions have explicit source assignments: Each sub-question specifies which data sources to query. This prevents the system from searching everywhere for everything (expensive and slow) and enables targeted retrieval. The source assignments come from learned patterns: financial performance questions → Finance DB; customer feedback → CRM; competitive intelligence → web search.

The execution plan trades off latency and accuracy: Phase 1 runs Sub-Q1 and Sub-Q3 in parallel (typically 2-3 seconds). Phase 2 runs Sub-Q2 with Sub-Q1's context (another 2-3 seconds). Phase 3 synthesizes everything (1-2 seconds). Total: ~6-8 seconds. Running everything sequentially would take 9-12 seconds; running everything in parallel would produce an incomplete Sub-Q2 answer.
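
A sketch of how this dependency-aware plan can be scheduled: repeatedly run, in parallel, every sub-question whose dependencies are already answered. The run_subq callable and the sub-question dict shape are assumptions; a real system would also enforce time and token budgets per phase.

Python
# Sketch of phase scheduling over the sub-question dependency graph.
import asyncio


async def execute_plan(sub_questions: dict[str, dict], run_subq) -> dict[str, str]:
    """sub_questions maps id -> {"deps": [ids...], ...}; run_subq answers one."""
    answers: dict[str, str] = {}
    pending = dict(sub_questions)
    while pending:
        # Everything whose dependencies are satisfied runs in parallel.
        ready = [qid for qid, q in pending.items()
                 if all(d in answers for d in q["deps"])]
        if not ready:
            raise ValueError("cyclic dependencies in decomposition")
        results = await asyncio.gather(
            *(run_subq(qid, pending[qid], answers) for qid in ready)
        )
        for qid, answer in zip(ready, results):
            answers[qid] = answer
            del pending[qid]
    return answers

With the decomposition above, Sub-Q1 and Sub-Q3 land in the first phase and Sub-Q2 in the second, matching the execution order in the diagram.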

Decomposition Strategies by Query Type

Different query types require different decomposition approaches:

Query Type      Example                            Decomposition Strategy
------------    -------------------------------    -----------------------------------------------------------------
Comparative     "A vs B vs C"                      Split into individual entity queries, then synthesize the comparison
Temporal        "Trend over Q1-Q4"                 Break into time periods, identify inflection points
Causal          "Why did X happen?"                Establish facts first (what), then investigate causes (why)
Multi-domain    "Technical and business impact"    Split by domain, then merge with a domain-expert synthesis
Conditional     "If X then what?"                  Establish a baseline, then model scenarios
Aggregation     "Summary of all projects"          Parallel retrieval, progressive summarization

Part 3: Multi-Strategy RAG Layer

Simple RAG (embed query → find similar chunks → return) fails for complex questions. Our architecture uses agentic RAG with multiple retrieval strategies, intelligent routing, and self-evaluation.

Search Strategy Router

Different queries need different retrieval approaches. The router analyzes the query and selects the optimal strategy:

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                      SEARCH STRATEGY ROUTER                              │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│                         Query Analysis                                   │
│                              │                                           │
│         ┌────────────────────┼────────────────────┐                      │
│         ▼                    ▼                    ▼                      │
│  ┌─────────────┐     ┌─────────────┐     ┌─────────────┐                │
│  │ Query Type  │     │ Data Type   │     │ Precision   │                │
│  │ Detection   │     │ Analysis    │     │ Requirement │                │
│  └─────────────┘     └─────────────┘     └─────────────┘                │
│         │                    │                    │                      │
│         └────────────────────┼────────────────────┘                      │
│                              ▼                                           │
│                    ┌─────────────────┐                                   │
│                    │ STRATEGY SELECT │                                   │
│                    └─────────────────┘                                   │
│                              │                                           │
│    ┌─────────────┬───────────┼───────────┬─────────────┐                │
│    ▼             ▼           ▼           ▼             ▼                │
│ ┌───────┐   ┌───────┐   ┌───────┐   ┌───────┐   ┌───────┐              │
│ │VECTOR │   │KEYWORD│   │HYBRID │   │GRAPH  │   │ WEB   │              │
│ │       │   │(BM25) │   │       │   │       │   │SEARCH │              │
│ │Concept│   │Exact  │   │Both   │   │Relat- │   │Real-  │              │
│ │match  │   │terms  │   │       │   │ions   │   │time   │              │
│ └───────┘   └───────┘   └───────┘   └───────┘   └───────┘              │
│                                                                          │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │ ROUTING RULES                                                      │  │
│  ├───────────────────────────────────────────────────────────────────┤  │
│  │ • Factual / exact match      → Keyword (BM25) primary             │  │
│  │ • Conceptual / semantic      → Vector primary                      │  │
│  │ • Mixed / uncertain          → Hybrid (RRF fusion)                 │  │
│  │ • Relationship queries       → Graph + Vector                      │  │
│  │ • Current events / real-time → Web search primary                  │  │
│  │ • High precision needed      → Multi-strategy + rerank             │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
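
The routing rules reduce to a small decision function, and the hybrid path needs a fusion step. Below is a hedged sketch: the feature flags are illustrative, and rrf_fuse implements standard reciprocal rank fusion (each document scores the sum of 1/(k + rank) across result lists, with k commonly set to 60).

Python
# Sketch of the routing rules plus RRF fusion for the hybrid path.
def select_strategy(features: dict) -> list[str]:
    if features.get("real_time"):
        return ["web"]                       # current events / real-time
    if features.get("relational"):
        return ["graph", "vector"]           # relationship queries
    if features.get("exact_terms") and not features.get("conceptual"):
        return ["keyword"]                   # factual / exact match
    if features.get("conceptual") and not features.get("exact_terms"):
        return ["vector"]                    # conceptual / semantic
    return ["keyword", "vector"]             # mixed/uncertain -> hybrid + RRF


def rrf_fuse(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked doc-id lists; RRF score = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)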

The Agentic RAG Loop

Unlike single-pass RAG, agentic RAG iterates until it has sufficient information. The agent evaluates results and decides whether to continue searching.

The fundamental limitation of single-pass RAG: Traditional RAG does one retrieval pass: embed the query, find similar chunks, stuff them into a prompt, generate a response. This works for simple queries where the first retrieval attempt finds relevant information. But complex queries often require multiple attempts: the first search might not use the right terminology, might miss a relevant data source, or might return partial information that reveals what else to look for.

How agentic RAG differs: The agent treats retrieval as a tool it can invoke multiple times with different strategies. After each retrieval, the agent evaluates: "Do I have enough information to answer? Are there gaps? Should I search differently?" This self-evaluation loop transforms RAG from a single function call into an iterative research process.

The cost of iteration: Each RAG loop iteration adds latency (200-500ms for retrieval, 500-2000ms for LLM evaluation). We typically limit iterations to 3-5 before forcing a response with whatever information is available. The system should also track token costs across iterations and have budget limits to prevent runaway queries.

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                         AGENTIC RAG LOOP                                 │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │ Query: "What specific factors drove Q3 decline?"                 │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                              │                                           │
│                              ▼                                           │
│                   ┌─────────────────────┐                                │
│                   │  1. INITIAL SEARCH  │                                │
│                   │  Strategy: Hybrid   │                                │
│                   │  Sources: CRM, Wiki │                                │
│                   └─────────────────────┘                                │
│                              │                                           │
│                              ▼                                           │
│                   ┌─────────────────────┐                                │
│                   │  2. RETRIEVE        │                                │
│                   │  Results: 12 docs   │                                │
│                   │  Top score: 0.82    │                                │
│                   └─────────────────────┘                                │
│                              │                                           │
│                              ▼                                           │
│                   ┌─────────────────────┐                                │
│                   │  3. SELF-EVALUATE   │◄─────────────────────┐        │
│                   │                     │                      │        │
│                   │  □ Answers query?   │                      │        │
│                   │    → Partial        │                      │        │
│                   │  □ Sufficient       │                      │        │
│                   │    detail?          │                      │        │
│                   │    → No (missing    │                      │        │
│                   │      EMEA data)     │                      │        │
│                   │  □ Sources          │                      │        │
│                   │    reliable?        │                      │        │
│                   │    → Yes            │                      │        │
│                   └─────────────────────┘                      │        │
│                              │                                 │        │
│                         [INSUFFICIENT]                         │        │
│                              │                                 │        │
│                              ▼                                 │        │
│                   ┌─────────────────────┐                      │        │
│                   │  4. REFORMULATE     │                      │        │
│                   │                     │                      │        │
│                   │  New query:         │                      │        │
│                   │  "Q3 EMEA sales     │                      │        │
│                   │   performance       │                      │        │
│                   │   decline factors"  │                      │        │
│                   │                     │                      │        │
│                   │  Add sources:       │                      │        │
│                   │  + Regional reports │                      │        │
│                   └─────────────────────┘                      │        │
│                              │                                 │        │
│                              └─────────────────────────────────┘        │
│                                                                          │
│                      [After 2nd iteration: SUFFICIENT]                   │
│                              │                                           │
│                              ▼                                           │
│                   ┌─────────────────────┐                                │
│                   │  5. RETURN RESULTS  │                                │
│                   │  18 docs, 3 sources │                                │
│                   │  Confidence: 0.85   │                                │
│                   └─────────────────────┘                                │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
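
A compact sketch of this loop, with the iteration cap and token budget described above. The retrieve, evaluate, and reformulate hooks stand in for real retrieval and LLM calls; their return shapes are assumptions for illustration.

Python
# Sketch of the agentic RAG loop with iteration and token-budget caps.
def agentic_rag(query: str, retrieve, evaluate, reformulate,
                max_iters: int = 4, token_budget: int = 20_000):
    docs, spent = [], 0
    current = query
    for _ in range(max_iters):
        results = retrieve(current)                 # one retrieval pass
        docs.extend(results.docs)
        spent += results.tokens_used
        verdict = evaluate(query, docs)             # LLM self-evaluation
        spent += verdict.tokens_used
        if verdict.sufficient or spent >= token_budget:
            break                                   # stop: enough info or budget hit
        current = reformulate(query, docs, verdict.gaps)  # e.g. add EMEA terms
    return docs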

Multi-Source Fusion

Real questions often require combining information from heterogeneous sources. The fusion engine handles deduplication, conflict resolution, and authority weighting:

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                       MULTI-SOURCE FUSION ENGINE                         │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  Query: "Q3 revenue performance"                                         │
│                                                                          │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐    │
│  │ Finance DB  │  │ CRM System  │  │ Sales Wiki  │  │ Exec Slides │    │
│  │             │  │             │  │             │  │             │    │
│  │ Revenue:    │  │ Revenue:    │  │ "Q3 was     │  │ Revenue:    │    │
│  │ $4.2M       │  │ $4.15M      │  │  challenging│  │ $4.2M       │    │
│  │ (precise)   │  │ (pipeline)  │  │  quarter"   │  │ (rounded)   │    │
│  │             │  │             │  │             │  │             │    │
│  │ Authority:  │  │ Authority:  │  │ Authority:  │  │ Authority:  │    │
│  │ CANONICAL   │  │ SECONDARY   │  │ CONTEXTUAL  │  │ SUMMARY     │    │
│  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘    │
│         │               │               │               │              │
│         └───────────────┴───────────────┴───────────────┘              │
│                                   │                                     │
│                                   ▼                                     │
│                    ┌─────────────────────────┐                          │
│                    │     FUSION ENGINE       │                          │
│                    ├─────────────────────────┤                          │
│                    │                         │                          │
│                    │ 1. DEDUPLICATE          │                          │
│                    │    Same fact from       │                          │
│                    │    Finance + Exec       │                          │
│                    │    → Keep one, note     │                          │
│                    │      multiple sources   │                          │
│                    │                         │                          │
│                    │ 2. RESOLVE CONFLICTS    │                          │
│                    │    Finance: $4.2M       │                          │
│                    │    CRM: $4.15M          │                          │
│                    │    → Use Finance DB     │                          │
│                    │      (canonical source) │                          │
│                    │    → Note: CRM shows    │                          │
│                    │      pipeline, not      │                          │
│                    │      closed revenue     │                          │
│                    │                         │                          │
│                    │ 3. MERGE CONTEXT        │                          │
│                    │    Wiki adds context    │                          │
│                    │    about challenges     │                          │
│                    │    → Append as          │                          │
│                    │      qualitative note   │                          │
│                    │                         │                          │
│                    │ 4. CITATION CHAIN       │                          │
│                    │    Revenue claim:       │                          │
│                    │    [Finance DB, Q3      │                          │
│                    │     report, row 42]     │                          │
│                    │                         │                          │
│                    └─────────────────────────┘                          │
│                                   │                                     │
│                                   ▼                                     │
│                    ┌─────────────────────────┐                          │
│                    │    UNIFIED CONTEXT      │                          │
│                    │                         │                          │
│                    │  Revenue: $4.2M         │                          │
│                    │  Source: Finance DB     │                          │
│                    │  Confidence: HIGH       │                          │
│                    │  Corroborated: 2 srcs   │                          │
│                    │  Context: "Challenging  │                          │
│                    │  quarter" per Wiki      │                          │
│                    └─────────────────────────┘                          │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
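
The conflict-resolution step is the part teams most often get wrong, so here is a minimal sketch of authority-weighted resolution. The authority tiers come from the diagram above; the claim dict shape is an assumption.

Python
# Sketch of authority-based conflict resolution: when sources disagree on
# the same fact, prefer the most authoritative and keep the rest as notes.
AUTHORITY_RANK = {"CANONICAL": 0, "SECONDARY": 1, "CONTEXTUAL": 2, "SUMMARY": 3}


def resolve(claims: list[dict]) -> dict:
    """claims: [{"value": ..., "source": ..., "authority": ...}, ...]"""
    ranked = sorted(claims, key=lambda c: AUTHORITY_RANK[c["authority"]])
    winner = ranked[0]
    corroborating = [c for c in ranked[1:] if c["value"] == winner["value"]]
    conflicting = [c for c in ranked[1:] if c["value"] != winner["value"]]
    return {
        "value": winner["value"],
        "source": winner["source"],
        "confidence": "HIGH" if corroborating else "MEDIUM",
        "notes": [f'{c["source"]} reports {c["value"]}' for c in conflicting],
    }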

Part 4: MCP Integration Layer

Model Context Protocol (MCP) provides a standardized way to give agents access to tools, data sources, and external capabilities. Instead of hardcoding tool integrations, MCP allows dynamic discovery and connection.

MCP Architecture

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                         MCP ARCHITECTURE                                 │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │                      AGENT RUNTIME                               │    │
│  │                                                                  │    │
│  │  ┌────────────────────────────────────────────────────────┐     │    │
│  │  │                    MCP CLIENT                           │     │    │
│  │  │                                                         │     │    │
│  │  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  │     │    │
│  │  │  │ Connection   │  │    Tool      │  │   Resource   │  │     │    │
│  │  │  │ Manager      │  │   Router     │  │   Resolver   │  │     │    │
│  │  │  └──────────────┘  └──────────────┘  └──────────────┘  │     │    │
│  │  │                                                         │     │    │
│  │  └────────────────────────────────────────────────────────┘     │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                                    │                                     │
│                     JSON-RPC / stdio / SSE                               │
│                                    │                                     │
│       ┌────────────────────────────┼────────────────────────────┐       │
│       ▼                            ▼                            ▼       │
│  ┌──────────────────┐    ┌──────────────────┐    ┌──────────────────┐  │
│  │  MCP SERVER:     │    │  MCP SERVER:     │    │  MCP SERVER:     │  │
│  │  Filesystem      │    │  Database        │    │  Web Search      │  │
│  ├──────────────────┤    ├──────────────────┤    ├──────────────────┤  │
│  │                  │    │                  │    │                  │  │
│  │  TOOLS:          │    │  TOOLS:          │    │  TOOLS:          │  │
│  │  • read_file     │    │  • query_sql     │    │  • search_web    │  │
│  │  • write_file    │    │  • list_tables   │    │  • fetch_url     │  │
│  │  • list_dir      │    │  • get_schema    │    │  • extract_text  │  │
│  │  • search_files  │    │  • explain_query │    │  • search_news   │  │
│  │                  │    │                  │    │                  │  │
│  │  RESOURCES:      │    │  RESOURCES:      │    │  RESOURCES:      │  │
│  │  • file://docs/* │    │  • db://sales    │    │  • web://        │  │
│  │  • file://code/* │    │  • db://finance  │    │  • news://       │  │
│  │                  │    │                  │    │                  │  │
│  │  PROMPTS:        │    │  PROMPTS:        │    │  PROMPTS:        │  │
│  │  • summarize_doc │    │  • analyze_table │    │  • research_topic│  │
│  │                  │    │                  │    │                  │  │
│  └──────────────────┘    └──────────────────┘    └──────────────────┘  │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
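
As a concrete example, here is a minimal version of the database server above using the official MCP Python SDK's FastMCP helper (the mcp package; verify the exact API against the SDK version you use). The database helper is a stub standing in for a real data layer.

Python
# Sketch of the "enterprise-database" MCP server with FastMCP.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("enterprise-database")


def run_readonly(database: str, query: str, limit: int) -> list[dict]:
    raise NotImplementedError("stub: execute against a read-only replica")


@mcp.tool()
def query_sql(query: str, database: str, limit: int = 100) -> list[dict]:
    """Execute read-only SQL queries against sales, finance, or hr."""
    if not query.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    return run_readonly(database, query, limit)


@mcp.tool()
def get_schema(table: str) -> dict:
    """Get table schema and relationships."""
    raise NotImplementedError("stub: introspect the database catalog")


if __name__ == "__main__":
    mcp.run()  # serves over stdio by default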

Tool Discovery and Capability Matching

Agents discover available tools at runtime and match them to task requirements:

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                    CAPABILITY DISCOVERY FLOW                             │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  STEP 1: SERVER ADVERTISEMENT                                            │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │ Server: "enterprise-database"                                      │  │
│  │ Version: "1.2.0"                                                   │  │
│  │                                                                    │  │
│  │ Capabilities:                                                      │  │
│  │   tools: true                                                      │  │
│  │   resources: true                                                  │  │
│  │   prompts: true                                                    │  │
│  │                                                                    │  │
│  │ Tools:                                                             │  │
│  │   - name: "query_sql"                                              │  │
│  │     description: "Execute read-only SQL queries"                   │  │
│  │     inputSchema:                                                   │  │
│  │       query: string (required)                                     │  │
│  │       database: enum [sales, finance, hr]                          │  │
│  │       limit: integer (default: 100)                                │  │
│  │                                                                    │  │
│  │   - name: "get_schema"                                             │  │
│  │     description: "Get table schema and relationships"              │  │
│  │     inputSchema:                                                   │  │
│  │       table: string (required)                                     │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                    │                                     │
│                                    ▼                                     │
│  STEP 2: AGENT CAPABILITY MATCHING                                       │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │ Agent task: "Find Q3 revenue breakdown by region"                  │  │
│  │                                                                    │  │
│  │ Required capabilities:                                             │  │
│  │   • Query structured data       ──► matches: query_sql             │  │
│  │   • Understand data structure   ──► matches: get_schema            │  │
│  │   • Filter by time period       ──► query_sql supports WHERE       │  │
│  │   • Group by dimension          ──► query_sql supports GROUP BY    │  │
│  │                                                                    │  │
│  │ Tool selection: query_sql on db://finance                          │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                    │                                     │
│                                    ▼                                     │
│  STEP 3: TOOL INVOCATION                                                 │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │ Call: query_sql                                                    │  │
│  │ Arguments:                                                         │  │
│  │   database: "finance"                                              │  │
│  │   query: "SELECT region, SUM(revenue) as total                     │  │
│  │           FROM quarterly_revenue                                   │  │
│  │           WHERE quarter = 'Q3' AND year = 2024                     │  │
│  │           GROUP BY region"                                         │  │
│  │   limit: 50                                                        │  │
│  │                                                                    │  │
│  │ Response:                                                          │  │
│  │   [                                                                │  │
│  │     {"region": "NA", "total": 2100000},                            │  │
│  │     {"region": "EMEA", "total": 1400000},                          │  │
│  │     {"region": "APAC", "total": 700000}                            │  │
│  │   ]                                                                │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
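
On the client side, discovery and invocation map onto a few SDK calls. A sketch using the MCP Python SDK's stdio transport (again, confirm names against your SDK version; the db_server.py script path is hypothetical):

Python
# Sketch of client-side tool discovery and invocation over stdio.
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main() -> None:
    params = StdioServerParameters(command="python", args=["db_server.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()        # STEP 1: advertisement
            print([t.name for t in tools.tools])      # e.g. ["query_sql", "get_schema"]
            result = await session.call_tool(         # STEP 3: invocation
                "query_sql",
                {"database": "finance",
                 "query": "SELECT region, SUM(revenue) AS total "
                          "FROM quarterly_revenue "
                          "WHERE quarter = 'Q3' AND year = 2024 "
                          "GROUP BY region",
                 "limit": 50},
            )
            print(result.content)


asyncio.run(main())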

Permission Scoping and Security

Different agents get different tool permissions based on their role and the task:

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                     PERMISSION SCOPING MODEL                             │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                    PERMISSION MATRIX                               │  │
│  ├───────────────────────────────────────────────────────────────────┤  │
│  │                     │ Research │ Analyst │ Executor │ Critic     │  │
│  │ TOOL                │ Agent    │ Agent   │ Agent    │ Agent      │  │
│  ├─────────────────────┼──────────┼─────────┼──────────┼────────────┤  │
│  │ filesystem.read     │ ✓ scoped │ ✓ scoped│ ✓ full   │ ✓ scoped   │  │
│  │ filesystem.write    │ ✗        │ ✗       │ ✓ scoped │ ✗          │  │
│  │ database.select     │ ✓        │ ✓       │ ✓        │ ✓          │  │
│  │ database.insert     │ ✗        │ ✗       │ ✓ +audit │ ✗          │  │
│  │ database.delete     │ ✗        │ ✗       │ ✗        │ ✗          │  │
│  │ web.search          │ ✓        │ ✓       │ ✓        │ ✓          │  │
│  │ web.fetch           │ ✓        │ ✓       │ ✓        │ ✓          │  │
│  │ email.send          │ ✗        │ ✗       │ ✓ +human │ ✗          │  │
│  │ api.external        │ ✓ r/o    │ ✓ r/o   │ ✓ full   │ ✓ r/o      │  │
│  └─────────────────────┴──────────┴─────────┴──────────┴────────────┘  │
│                                                                          │
│  SCOPING RULES:                                                          │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │ • "scoped" = limited to specific directories or tables             │  │
│  │ • "+audit" = all actions logged with user attribution             │  │
│  │ • "+human" = requires human approval before execution             │  │
│  │ • "readonly" = can query but not modify                           │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
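
Enforcement is simplest as a wrapper that every tool call must pass through. A sketch, where the matrix subset, the hook names (call, audit, approve), and the mode strings are all illustrative:

Python
# Sketch of enforcing the permission matrix as a tool-call wrapper.
PERMISSIONS = {
    "research": {"filesystem.read": "scoped", "database.select": "full",
                 "web.search": "full", "api.external": "readonly"},
    "executor": {"filesystem.write": "scoped", "database.insert": "audit",
                 "email.send": "human_approval"},
}


def invoke_tool(agent_role: str, tool: str, args: dict, call, audit, approve):
    mode = PERMISSIONS.get(agent_role, {}).get(tool)
    if mode is None:
        raise PermissionError(f"{agent_role} may not use {tool}")  # ✗ cells
    if mode == "human_approval" and not approve(agent_role, tool, args):
        raise PermissionError(f"{tool} requires human approval")   # "+human"
    if mode == "audit":
        audit(agent_role, tool, args)   # "+audit": log with attribution
    return call(tool, args)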

Part 5: Agent-to-Agent (A2A) Communication

Complex tasks require multiple specialized agents working together. A2A provides the protocol for agents to discover each other, delegate tasks, and share context.

A2A Architecture and Agent Registry

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                       A2A ARCHITECTURE                                   │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                      AGENT REGISTRY                                │  │
│  │                                                                    │  │
│  │  Stores agent cards describing capabilities, inputs, outputs       │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                    │                                     │
│            ┌───────────────────────┼───────────────────────┐            │
│            ▼                       ▼                       ▼            │
│  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐     │
│  │   AGENT CARD    │    │   AGENT CARD    │    │   AGENT CARD    │     │
│  ├─────────────────┤    ├─────────────────┤    ├─────────────────┤     │
│  │ Name: Research  │    │ Name: Analyst   │    │ Name: Competitor│     │
│  │       Agent     │    │       Agent     │    │       Intel     │     │
│  │                 │    │                 │    │                 │     │
│  │ Description:    │    │ Description:    │    │ Description:    │     │
│  │ Retrieves and   │    │ Synthesizes     │    │ Gathers market  │     │
│  │ summarizes info │    │ data into       │    │ intelligence on │     │
│  │ from internal   │    │ insights and    │    │ competitors     │     │
│  │ sources         │    │ recommendations │    │                 │     │
│  │                 │    │                 │    │                 │     │
│  │ Skills:         │    │ Skills:         │    │ Skills:         │     │
│  │ • RAG retrieval │    │ • Synthesis     │    │ • Web research  │     │
│  │ • Summarization │    │ • Reasoning     │    │ • News analysis │     │
│  │ • Citation      │    │ • Visualization │    │ • Comparison    │     │
│  │                 │    │                 │    │                 │     │
│  │ Input Schema:   │    │ Input Schema:   │    │ Input Schema:   │     │
│  │ • query: string │    │ • data: object  │    │ • companies:    │     │
│  │ • sources: list │    │ • task: string  │    │     string[]    │     │
│  │ • depth: enum   │    │ • format: enum  │    │ • aspects: list │     │
│  │                 │    │                 │    │                 │     │
│  │ Output Schema:  │    │ Output Schema:  │    │ Output Schema:  │     │
│  │ • findings: list│    │ • analysis: str │    │ • intel: object │     │
│  │ • sources: list │    │ • confidence:   │    │ • sources: list │     │
│  │ • confidence:   │    │     float       │    │ • freshness:    │     │
│  │     float       │    │ • citations:    │    │     datetime    │     │
│  │                 │    │     list        │    │                 │     │
│  └─────────────────┘    └─────────────────┘    └─────────────────┘     │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
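
A sketch of agent cards and registry lookup, loosely modeled on the A2A agent-card idea. The dataclass fields mirror the diagram above and are not the normative A2A schema.

Python
# Sketch of an agent card plus skill-based registry lookup.
from dataclasses import dataclass


@dataclass
class AgentCard:
    name: str
    description: str
    skills: tuple[str, ...]
    input_schema: dict
    output_schema: dict


class AgentRegistry:
    def __init__(self) -> None:
        self._cards: dict[str, AgentCard] = {}

    def register(self, card: AgentCard) -> None:
        self._cards[card.name] = card

    def find_by_skill(self, skill: str) -> list[AgentCard]:
        return [c for c in self._cards.values() if skill in c.skills]


registry = AgentRegistry()
registry.register(AgentCard(
    name="Research Agent",
    description="Retrieves and summarizes info from internal sources",
    skills=("rag_retrieval", "summarization", "citation"),
    input_schema={"query": "string", "sources": "list", "depth": "enum"},
    output_schema={"findings": "list", "sources": "list", "confidence": "float"},
))
print([c.name for c in registry.find_by_skill("rag_retrieval")])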

Communication Patterns

A2A supports several communication patterns depending on task requirements:

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                    A2A COMMUNICATION PATTERNS                            │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  PATTERN 1: DELEGATION (fire and wait)                                   │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                                                                    │  │
│  │  Orchestrator                                                      │  │
│  │       │                                                            │  │
│  │       │──── TASK: "Research Q3 revenue" ────►  Research Agent      │  │
│  │       │                                              │             │  │
│  │       │                                              │ (working)   │  │
│  │       │                                              │             │  │
│  │       │◄─── RESULT: {findings, sources} ────────────┘             │  │
│  │       │                                                            │  │
│  │       ▼                                                            │  │
│  │  Continue with results                                             │  │
│  │                                                                    │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                          │
│  PATTERN 2: CONSULTATION (quick question)                                │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                                                                    │  │
│  │  Analyst Agent (mid-task)                                          │  │
│  │       │                                                            │  │
│  │       │──── CONSULT: "Is 15% decline significant?" ───►           │  │
│  │       │                                              Domain Expert │  │
│  │       │◄─── OPINION: "Yes, 3x industry avg" ────────┘             │  │
│  │       │                                                            │  │
│  │       ▼                                                            │  │
│  │  Incorporate insight, continue task                                │  │
│  │                                                                    │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                          │
│  PATTERN 3: PARALLEL FAN-OUT (concurrent)                                │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                                                                    │  │
│  │                      Orchestrator                                  │  │
│  │                           │                                        │  │
│  │         ┌─────────────────┼─────────────────┐                      │  │
│  │         │                 │                 │                      │  │
│  │         ▼                 ▼                 ▼                      │  │
│  │    ┌─────────┐      ┌─────────┐      ┌─────────┐                  │  │
│  │    │Research │      │Research │      │Compet-  │                  │  │
│  │    │Agent #1 │      │Agent #2 │      │itor     │                  │  │
│  │    │(Finance)│      │(CRM)    │      │Intel    │                  │  │
│  │    └─────────┘      └─────────┘      └─────────┘                  │  │
│  │         │                 │                 │                      │  │
│  │         └─────────────────┼─────────────────┘                      │  │
│  │                           │                                        │  │
│  │                           ▼                                        │  │
│  │                     Aggregator                                     │  │
│  │                    (merge results)                                 │  │
│  │                                                                    │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                          │
│  PATTERN 4: PIPELINE (sequential handoff)                                │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                                                                    │  │
│  │  Research ──data──► Analyst ──draft──► Critic ──feedback──►       │  │
│  │    Agent             Agent              Agent                      │  │
│  │                                           │                        │  │
│  │                                           ▼                        │  │
│  │                                      [if issues]                   │  │
│  │                                           │                        │  │
│  │                      Analyst ◄────────────┘                        │  │
│  │                       Agent                                        │  │
│  │                      (revise)                                      │  │
│  │                                                                    │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
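
The fan-out pattern is the easiest to ground in code. Below is a minimal asyncio sketch: the `ResearchAgent` class and its `run` signature are illustrative assumptions rather than a prescribed A2A interface, and aggregation is reduced to a list merge.

Code
import asyncio


class ResearchAgent:
    """Illustrative stand-in for a specialized agent (not a formal A2A client)."""

    def __init__(self, name: str, domain: str):
        self.name, self.domain = name, domain

    async def run(self, task: str) -> dict:
        await asyncio.sleep(0.1)  # stands in for retrieval + LLM calls
        return {"agent": self.name, "findings": f"{self.domain} findings for: {task}"}


async def fan_out(task: str, agents: list[ResearchAgent]) -> list[dict]:
    """PATTERN 3: dispatch one task to all agents concurrently, then merge.

    return_exceptions=True keeps one failed agent from sinking the batch;
    failures are silently dropped here, but production code should log them.
    """
    results = await asyncio.gather(*(a.run(task) for a in agents),
                                   return_exceptions=True)
    return [r for r in results if isinstance(r, dict)]


async def main():
    agents = [ResearchAgent("research_1", "finance"),
              ResearchAgent("research_2", "crm"),
              ResearchAgent("competitor_intel", "web")]
    print(await fan_out("Investigate Q3 revenue decline", agents))


asyncio.run(main())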

Context Handoff Structure

When agents hand off tasks, they share structured context to maintain coherence:

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                      CONTEXT HANDOFF STRUCTURE                           │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                       HANDOFF PAYLOAD                              │  │
│  ├───────────────────────────────────────────────────────────────────┤  │
│  │                                                                    │  │
│  │  TASK CONTEXT                                                      │  │
│  │  ├── original_query: "What caused Q3 decline vs competitors?"      │  │
│  │  ├── current_subtask: "Analyze root cause factors"                 │  │
│  │  ├── requesting_agent: "orchestrator"                              │  │
│  │  └── priority: "high"                                              │  │
│  │                                                                    │  │
│  │  ACCUMULATED STATE                                                 │  │
│  │  ├── completed_subtasks:                                           │  │
│  │  │   └── [✓] "Q3 performance: $4.2M, down 15%"                     │  │
│  │  ├── pending_subtasks:                                             │  │
│  │  │   ├── [→] "Root cause analysis" (current)                       │  │
│  │  │   └── [ ] "Competitor comparison"                               │  │
│  │  └── working_hypotheses:                                           │  │
│  │      ├── "Enterprise deal losses (CRM data suggests)"              │  │
│  │      └── "EMEA underperformance (to verify)"                       │  │
│  │                                                                    │  │
│  │  RELEVANT DATA (pre-fetched)                                       │  │
│  │  ├── finance_summary: {revenue: 4.2M, margin: 42%, qoq: -15%}      │  │
│  │  └── source_docs: [doc_id_1, doc_id_2, doc_id_3]                   │  │
│  │                                                                    │  │
│  │  CONSTRAINTS                                                       │  │
│  │  ├── time_scope: "Q3 2024"                                         │  │
│  │  ├── data_access: ["finance", "crm", "wiki"]                       │  │
│  │  ├── max_tokens: 8000                                              │  │
│  │  └── deadline: "2024-01-04T15:00:00Z"                              │  │
│  │                                                                    │  │
│  │  EXPECTED OUTPUT                                                   │  │
│  │  ├── format: "structured_findings"                                 │  │
│  │  ├── required_fields: ["factors", "evidence", "confidence"]        │  │
│  │  └── citation_required: true                                       │  │
│  │                                                                    │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
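
The payload maps naturally onto a typed structure that a receiving agent can validate before starting work. A minimal sketch with field names taken from the diagram above; this is an illustration, not a formal A2A message schema:

Code
from dataclasses import dataclass, field


@dataclass
class HandoffPayload:
    # TASK CONTEXT
    original_query: str
    current_subtask: str
    requesting_agent: str
    priority: str = "normal"
    # ACCUMULATED STATE
    completed_subtasks: list[str] = field(default_factory=list)
    pending_subtasks: list[str] = field(default_factory=list)
    working_hypotheses: list[str] = field(default_factory=list)
    # RELEVANT DATA (pre-fetched so the receiver avoids redundant retrieval)
    relevant_data: dict = field(default_factory=dict)
    # CONSTRAINTS and EXPECTED OUTPUT
    constraints: dict = field(default_factory=dict)
    expected_output: dict = field(default_factory=dict)


payload = HandoffPayload(
    original_query="What caused Q3 decline vs competitors?",
    current_subtask="Analyze root cause factors",
    requesting_agent="orchestrator",
    priority="high",
    constraints={"time_scope": "Q3 2024", "max_tokens": 8000},
    expected_output={"format": "structured_findings", "citation_required": True},
)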

Part 6: Advanced Reasoning Engine

Complex queries require more than single-pass reasoning. The reasoning engine provides multiple strategies optimized for different problem types.

Reasoning Strategy Selection

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                    REASONING STRATEGY SELECTOR                           │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │ PROBLEM CHARACTERISTICS         │ OPTIMAL STRATEGY                 │  │
│  ├─────────────────────────────────┼─────────────────────────────────┤  │
│  │                                 │                                  │  │
│  │ Simple factual lookup           │ Direct retrieval                 │  │
│  │ "What was Q3 revenue?"          │ → Single RAG pass               │  │
│  │                                 │                                  │  │
│  │ Multi-step reasoning            │ Chain-of-Thought (CoT)           │  │
│  │ "Why did X cause Y?"            │ → Step-by-step reasoning        │  │
│  │                                 │                                  │  │
│  │ Exploration / uncertainty       │ Tree-of-Thought (ToT)            │  │
│  │ "What are possible causes?"     │ → Branch and evaluate paths     │  │
│  │                                 │                                  │  │
│  │ High-stakes decision            │ Self-consistency                 │  │
│  │ "Should we proceed with X?"     │ → Multiple paths, vote          │  │
│  │                                 │                                  │  │
│  │ Complex synthesis               │ Iterative refinement             │  │
│  │ "Comprehensive analysis of..."  │ → Generate, critique, improve   │  │
│  │                                 │                                  │  │
│  │ Planning / decomposition        │ Plan-then-execute                │  │
│  │ "How do we achieve X?"          │ → Create plan, execute steps    │  │
│  │                                 │                                  │  │
│  └─────────────────────────────────┴─────────────────────────────────┘  │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
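
A rule-based selector covering the table above is a reasonable starting point. The keyword heuristics below are purely illustrative; many production systems replace them with a cheap classifier model.

Code
def select_strategy(query: str, high_stakes: bool = False) -> str:
    """Heuristic mapping from problem characteristics to reasoning strategy."""
    q = query.lower()
    if high_stakes:
        return "self_consistency"          # multiple paths, then vote
    if q.startswith(("how do we", "how can we", "how should we")):
        return "plan_then_execute"         # decompose, then run the steps
    if "possible" in q or "what are" in q:
        return "tree_of_thought"           # branch and evaluate paths
    if "comprehensive" in q or "analysis" in q:
        return "iterative_refinement"      # generate, critique, improve
    if q.startswith("why") or "cause" in q:
        return "chain_of_thought"          # step-by-step reasoning
    return "direct_retrieval"              # simple factual lookup, single RAG pass


assert select_strategy("What was Q3 revenue?") == "direct_retrieval"
assert select_strategy("Why did revenue decline?") == "chain_of_thought"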

Chain-of-Thought with Grounded Verification

For our Enterprise Knowledge Agent, we use CoT with verification against retrieved sources:

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                   GROUNDED CHAIN-OF-THOUGHT                              │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  Query: "Why did revenue decline more than competitors?"                 │
│                                                                          │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │ STEP 1: ESTABLISH FACTS                                          │    │
│  │                                                                  │    │
│  │ Thought: "First, I need to quantify our decline and compare      │    │
│  │          it to the market."                                      │    │
│  │                                                                  │    │
│  │ Action: Retrieve from Finance DB + Industry reports              │    │
│  │                                                                  │    │
│  │ Observation:                                                     │    │
│  │   • Our decline: 15% QoQ [source: finance_db, confidence: HIGH]  │    │
│  │   • Industry avg: +3% QoQ [source: analyst_report, conf: MEDIUM] │    │
│  │   • Gap: 18 percentage points worse than market                  │    │
│  │                                                                  │    │
│  │ Verification: ✓ Numbers sourced, ✓ Sources authoritative         │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                              │                                           │
│                              ▼                                           │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │ STEP 2: INVESTIGATE CAUSES                                       │    │
│  │                                                                  │    │
│  │ Thought: "18pp gap is significant. I need to identify company-   │    │
│  │          specific factors that explain this divergence."         │    │
│  │                                                                  │    │
│  │ Action: Search CRM notes, support tickets, sales wiki            │    │
│  │                                                                  │    │
│  │ Observations:                                                    │    │
│  │   • Lost 3 enterprise deals to Acme Corp [source: CRM]           │    │
│  │   • EMEA renewal delays due to procurement [source: sales_wiki]  │    │
│  │   • Product gap: missing API feature [source: support_tickets]   │    │
│  │                                                                  │    │
│  │ Verification: ✓ Multiple corroborating sources                   │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                              │                                           │
│                              ▼                                           │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │ STEP 3: SYNTHESIZE                                               │    │
│  │                                                                  │    │
│  │ Thought: "I can now construct a causal narrative linking         │    │
│  │          the evidence to the outcome."                           │    │
│  │                                                                  │    │
│  │ Synthesis:                                                       │    │
│  │   Primary factor: Competitive losses (3 deals = ~$1.2M impact)   │    │
│  │   Secondary: EMEA delays (timing, not loss)                      │    │
│  │   Contributing: Product gap (enabled competitor win)             │    │
│  │                                                                  │    │
│  │ Confidence calibration:                                          │    │
│  │   • Competitive losses: HIGH (direct CRM evidence)               │    │
│  │   • EMEA delays: MEDIUM (wiki mentions, unquantified)            │    │
│  │   • Product gap: MEDIUM (support tickets, correlation)           │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
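
What distinguishes grounded CoT from plain CoT is the bookkeeping: every observation carries a source and a confidence, and a step is only accepted once its claims verify. A compact sketch of that structure, with field names assumed to mirror the diagram:

Code
from dataclasses import dataclass, field


@dataclass
class Observation:
    claim: str
    source: str          # e.g. "finance_db", "analyst_report"
    confidence: str      # "HIGH" | "MEDIUM" | "LOW"


@dataclass
class GroundedStep:
    thought: str
    observations: list[Observation] = field(default_factory=list)

    def verify(self) -> bool:
        """A step passes only if every claim names a source."""
        return bool(self.observations) and all(o.source for o in self.observations)


step = GroundedStep(
    thought="First, quantify our decline and compare it to the market.",
    observations=[
        Observation("Our decline: 15% QoQ", "finance_db", "HIGH"),
        Observation("Industry avg: +3% QoQ", "analyst_report", "MEDIUM"),
    ],
)
assert step.verify()  # unverified steps halt the chain instead of propagating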

Reflection and Self-Critique Loop

After generating an answer, the Critic Agent evaluates quality:

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                      SELF-CRITIQUE CHECKLIST                             │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  Generated Response: [Analysis of Q3 decline factors]                    │
│                                                                          │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │ COMPLETENESS                                                       │  │
│  │ □ Answers the original question?                    [✓ YES]        │  │
│  │ □ Addresses all sub-questions?                      [✓ 3/3]        │  │
│  │ □ Includes competitor comparison?                   [✓ YES]        │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                          │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │ ACCURACY                                                           │  │
│  │ □ All claims have citations?                        [✓ 7/7]        │  │
│  │ □ Numbers verified against sources?                 [✓ YES]        │  │
│  │ □ No unsupported speculation?                       [⚠ 1 flag]     │  │
│  │   → "Product gap likely contributed" - needs evidence              │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                          │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │ LOGIC                                                              │  │
│  │ □ Reasoning chain is valid?                         [✓ YES]        │  │
│  │ □ No logical leaps?                                 [✓ YES]        │  │
│  │ □ Alternative explanations considered?              [⚠ PARTIAL]    │  │
│  │   → Could mention macroeconomic factors                            │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                          │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │ CONFIDENCE CALIBRATION                                             │  │
│  │ □ Confidence levels appropriate?                    [✓ YES]        │  │
│  │ □ Uncertainties acknowledged?                       [✓ YES]        │  │
│  │ □ Limitations stated?                               [⚠ ADD]        │  │
│  │   → Should note EMEA data is limited                               │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                          │
│  VERDICT: PASS with minor revisions                                      │
│  ACTIONS:                                                                │
│    1. Add evidence for product gap claim OR soften language             │
│    2. Add caveat about EMEA data limitations                            │
│    3. Optional: mention macro factors as alternative hypothesis         │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
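
One simple way to make the checklist operational is to have the Critic Agent emit per-check results and compute the verdict mechanically, so "pass with minor revisions" is a deterministic outcome rather than a judgment call. A sketch under that assumption:

Code
def critic_verdict(checks: dict[str, str]) -> tuple[str, list[str]]:
    """checks maps check name -> 'pass' | 'warn' | 'fail'.
    Any fail forces a revision loop; warns pass with minor revisions."""
    failed = [name for name, r in checks.items() if r == "fail"]
    warned = [name for name, r in checks.items() if r == "warn"]
    if failed:
        return "REVISE", failed
    if warned:
        return "PASS_WITH_REVISIONS", warned
    return "PASS", []


verdict, actions = critic_verdict({
    "answers_question": "pass",
    "all_claims_cited": "pass",
    "no_unsupported_speculation": "warn",  # product-gap claim needs evidence
    "limitations_stated": "warn",          # EMEA data caveat missing
})
print(verdict, actions)
# PASS_WITH_REVISIONS ['no_unsupported_speculation', 'limitations_stated']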

Part 7: Conversation and Context Management

Agents must maintain coherent context across multi-turn conversations and complex multi-step tasks. This requires a hierarchical approach to memory and intelligent context window management.

Context Hierarchy

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                       CONTEXT HIERARCHY                                  │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │ LEVEL 1: SYSTEM CONTEXT (Persistent across all sessions)           │  │
│  │                                                                    │  │
│  │ • Agent identity, role, capabilities                               │  │
│  │ • Organization knowledge (policies, structure)                     │  │
│  │ • Tool permissions and access controls                             │  │
│  │ • User profile (role, preferences, history summary)                │  │
│  │                                                                    │  │
│  │ Token budget: ~2,000 tokens (always included)                      │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                              │                                           │
│                              ▼                                           │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │ LEVEL 2: SESSION CONTEXT (Per conversation)                        │  │
│  │                                                                    │  │
│  │ • Conversation history (summarized if long)                        │  │
│  │ • Current task state and progress                                  │  │
│  │ • Retrieved documents (this session)                               │  │
│  │ • Working hypotheses and intermediate findings                     │  │
│  │ • Agent handoff history                                            │  │
│  │                                                                    │  │
│  │ Token budget: ~20,000 tokens (managed dynamically)                 │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                              │                                           │
│                              ▼                                           │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │ LEVEL 3: TURN CONTEXT (Per interaction)                            │  │
│  │                                                                    │  │
│  │ • Current user query                                               │  │
│  │ • Immediate tool results                                           │  │
│  │ • Current reasoning trace                                          │  │
│  │ • Active sub-task context                                          │  │
│  │                                                                    │  │
│  │ Token budget: ~40,000 tokens (primary working space)               │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                              │                                           │
│                              ▼                                           │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │ OUTPUT RESERVATION                                                 │  │
│  │                                                                    │  │
│  │ Reserved for model response generation                             │  │
│  │                                                                    │  │
│  │ Token budget: ~30,000 tokens (reserved)                            │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                          │
│  TOTAL: ~92,000 tokens used of 128K context window                       │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
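
In code, the hierarchy becomes a budgeted assembly step: each level is included in priority order and trimmed to its budget before the prompt is built. A minimal sketch with token counting stubbed out; a real system should use the model's tokenizer, and trimming should be summarization rather than truncation:

Code
# Per-level budgets from the hierarchy above; ~30K stays reserved for output.
BUDGETS = {"system": 2_000, "session": 20_000, "turn": 40_000}


def count_tokens(text: str) -> int:
    return len(text) // 4  # crude estimate; swap in the model tokenizer


def assemble_context(levels: dict[str, str]) -> str:
    parts = []
    for name in ("system", "session", "turn"):  # priority order
        text = levels.get(name, "")
        if count_tokens(text) > BUDGETS[name]:
            # Placeholder: production systems summarize instead of truncating.
            text = text[: BUDGETS[name] * 4]
        parts.append(text)
    return "\n\n".join(parts)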

Context Overflow Strategies

When context exceeds budget, the system applies progressive compression:

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                   CONTEXT OVERFLOW MANAGEMENT                            │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  Context Usage: 95,000 / 92,000 tokens (OVERFLOW)                        │
│                                                                          │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │ STRATEGY 1: CONVERSATION SUMMARIZATION                             │  │
│  │                                                                    │  │
│  │ Before: Full conversation history (25,000 tokens)                  │  │
│  │ After:  Summarized history (5,000 tokens)                          │  │
│  │                                                                    │  │
│  │ Summary includes:                                                  │  │
│  │ • Key topics discussed                                             │  │
│  │ • Decisions made                                                   │  │
│  │ • Important facts established                                      │  │
│  │ • Current task status                                              │  │
│  │                                                                    │  │
│  │ Savings: 20,000 tokens                                             │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                          │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │ STRATEGY 2: DOCUMENT COMPRESSION                                   │  │
│  │                                                                    │  │
│  │ Before: Full retrieved documents (40,000 tokens)                   │  │
│  │ After:  Key excerpts only (15,000 tokens)                          │  │
│  │                                                                    │  │
│  │ Keeps:                                                             │  │
│  │ • Sentences containing query keywords                              │  │
│  │ • Surrounding context (1 sentence before/after)                    │  │
│  │ • Document titles and metadata                                     │  │
│  │                                                                    │  │
│  │ Savings: 25,000 tokens                                             │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                          │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │ STRATEGY 3: WORKING MEMORY OFFLOAD                                 │  │
│  │                                                                    │  │
│  │ Move to retrievable storage:                                       │  │
│  │ • Completed sub-task details (keep summaries)                      │  │
│  │ • Alternative hypotheses (keep top 2)                              │  │
│  │ • Verbose tool outputs (keep structured extracts)                  │  │
│  │                                                                    │  │
│  │ Can be retrieved if needed for follow-up questions                 │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                          │
│  New Context Usage: 50,000 / 92,000 tokens (OK)                          │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
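
The important property is that the strategies run in order of increasing information loss, and the loop stops as soon as the context fits. A sketch of that control flow; the lambda compressors stand in for real summarization, excerpting, and offload logic:

Code
def estimate_tokens(context: dict[str, str]) -> int:
    return sum(len(text) // 4 for text in context.values())  # crude estimate


def fit_context(context: dict[str, str], budget: int, strategies) -> dict[str, str]:
    """strategies: ordered (section, compress_fn) pairs, least lossy first.
    Stop compressing the moment the context fits the budget."""
    for section, compress in strategies:
        if estimate_tokens(context) <= budget:
            break
        context = {**context, section: compress(context[section])}
    return context


strategies = [
    ("history", lambda t: t[: 5_000 * 4]),     # stand-in for LLM summarization
    ("documents", lambda t: t[: 15_000 * 4]),  # stand-in for excerpt extraction
    ("working_memory", lambda t: ""),          # stand-in for offload to storage
]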

Part 8: Production Patterns

Observability Architecture

Comprehensive observability is essential for debugging multi-agent systems:

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                      OBSERVABILITY STACK                                 │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │ DISTRIBUTED TRACES                                                 │  │
│  │                                                                    │  │
│  │ request_id: abc-123                                                │  │
│  │ user_id: user_456                                                  │  │
│  │ total_duration: 8.2s                                               │  │
│  │ total_tokens: 14,200                                               │  │
│  │                                                                    │  │
│  │ ├── gateway (45ms)                                                 │  │
│  │ │   ├── input_validation (5ms)                                     │  │
│  │ │   └── guardrail_check (40ms)                                     │  │
│  │ │                                                                  │  │
│  │ ├── orchestrator (120ms)                                           │  │
│  │ │   ├── query_analysis (80ms) [tokens: 500]                        │  │
│  │ │   └── agent_routing (40ms)                                       │  │
│  │ │                                                                  │  │
│  │ ├── research_agent (3.2s) [parallel]                               │  │
│  │ │   ├── rag_retrieval (800ms)                                      │  │
│  │ │   │   ├── vector_search (200ms) [results: 15]                    │  │
│  │ │   │   ├── keyword_search (150ms) [results: 8]                    │  │
│  │ │   │   └── reranking (100ms) [final: 10]                          │  │
│  │ │   └── llm_reasoning (2.2s) [tokens: 4,200]                       │  │
│  │ │                                                                  │  │
│  │ ├── competitor_agent (3.5s) [parallel]                             │  │
│  │ │   ├── web_search (1.8s) [results: 12]                            │  │
│  │ │   └── llm_analysis (1.5s) [tokens: 3,100]                        │  │
│  │ │                                                                  │  │
│  │ ├── synthesis (1.8s)                                               │  │
│  │ │   └── llm_synthesis (1.8s) [tokens: 5,400]                       │  │
│  │ │                                                                  │  │
│  │ └── critic_review (0.6s)                                           │  │
│  │     └── llm_critique (0.6s) [tokens: 1,000]                        │  │
│  │                                                                    │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                          │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │ KEY METRICS                                                        │  │
│  │                                                                    │  │
│  │ Latency:                        Quality:                           │  │
│  │ • p50: 5.2s                     • task_success_rate: 94%           │  │
│  │ • p95: 12.1s                    • rag_precision@5: 0.78            │  │
│  │ • p99: 18.3s                    • citation_coverage: 97%           │  │
│  │                                                                    │  │
│  │ Cost:                           Reliability:                       │  │
│  │ • tokens_per_request: 14,200    • error_rate: 2.1%                 │  │
│  │ • cost_per_request: $0.18       • retry_rate: 5.3%                 │  │
│  │ • cache_hit_rate: 23%           • timeout_rate: 0.8%               │  │
│  │                                                                    │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
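
Traces like this can be produced with a few lines of nested-span bookkeeping; in production you would more likely emit them through OpenTelemetry, but the shape of the data is the same. A minimal homegrown sketch:

Code
import time
from contextlib import contextmanager

trace: list[dict] = []
_stack: list[str] = []


@contextmanager
def span(name: str, **attrs):
    """Record a timed span, nested under whichever span is currently open."""
    _stack.append(name)
    start = time.perf_counter()
    try:
        yield
    finally:
        trace.append({
            "span": "/".join(_stack),
            "duration_ms": round((time.perf_counter() - start) * 1000, 1),
            **attrs,  # e.g. tokens=4200, results=15
        })
        _stack.pop()


with span("orchestrator"):
    with span("query_analysis", tokens=500):
        time.sleep(0.01)  # stands in for the LLM call
    with span("agent_routing"):
        time.sleep(0.01)

for record in trace:
    print(record)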

Cost Optimization

In a multi-agent pipeline, token costs compound across every agent, retrieval pass, and critique loop, so cost control has to be designed in rather than bolted on:

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                      COST OPTIMIZATION STRATEGIES                        │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  1. MODEL ROUTING BY TASK COMPLEXITY                                     │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                                                                    │  │
│  │  Task Type              │ Model        │ Cost/1K tokens            │  │
│  │  ───────────────────────┼──────────────┼─────────────────────────  │  │
│  │  Query classification   │ Haiku        │ $0.00025                  │  │
│  │  Simple retrieval       │ Haiku        │ $0.00025                  │  │
│  │  Guardrail checks       │ Haiku        │ $0.00025                  │  │
│  │  Synthesis              │ Sonnet       │ $0.003                    │  │
│  │  Complex reasoning      │ Opus         │ $0.015                    │  │
│  │  Critic review          │ Sonnet       │ $0.003                    │  │
│  │                                                                    │  │
│  │  Blended average: ~$0.004/1K tokens (vs $0.015 all-Opus)           │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                          │
│  2. MULTI-LEVEL CACHING                                                  │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                                                                    │  │
│  │  Cache Layer          │ Hit Rate │ Latency Saved │ Cost Saved      │  │
│  │  ─────────────────────┼──────────┼───────────────┼───────────────  │  │
│  │  Exact query cache    │ 8%       │ 100%          │ 100%            │  │
│  │  Semantic query cache │ 15%      │ 80%           │ 80%             │  │
│  │  RAG result cache     │ 25%      │ 40%           │ 30%             │  │
│  │  Tool result cache    │ 35%      │ 20%           │ 15%             │  │
│  │                                                                    │  │
│  │  Combined savings: ~30% cost reduction                             │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                          │
│  3. EARLY TERMINATION                                                    │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                                                                    │  │
│  │  • Simple queries: Skip multi-agent orchestration                  │  │
│  │  • High-confidence answers: Skip critic review                     │  │
│  │  • Cached results valid: Skip full pipeline                        │  │
│  │  • User abandons: Stop in-progress agents                          │  │
│  │                                                                    │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
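
Strategy 1 often reduces to a lookup table with a safe default. A sketch using the tiers from the table above; the model names are Anthropic's tiers, and the prices in the table are illustrative, so check current rates before relying on them:

Code
# Cheapest tier known to handle each task type (from the routing table above).
MODEL_FOR_TASK = {
    "query_classification": "haiku",
    "simple_retrieval": "haiku",
    "guardrail_check": "haiku",
    "synthesis": "sonnet",
    "critic_review": "sonnet",
    "complex_reasoning": "opus",
}


def route_model(task_type: str) -> str:
    # Default to the mid-tier model for unknown tasks rather than the priciest.
    return MODEL_FOR_TASK.get(task_type, "sonnet")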

Graceful Degradation

When part of the pipeline fails, the system should step down through progressively simpler fallbacks rather than return a bare error:

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                      GRACEFUL DEGRADATION CHAIN                          │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  Primary Path: Full multi-agent pipeline                                 │
│                        │                                                 │
│                        ▼                                                 │
│              ┌──────────────────┐                                        │
│              │    SUCCESS?      │───Yes───► Return full response         │
│              └──────────────────┘                                        │
│                        │ No                                              │
│                        ▼                                                 │
│  Fallback 1: Retry with exponential backoff                              │
│              ┌──────────────────┐                                        │
│              │Retry (3 attempts)│───Success───► Return response          │
│              └──────────────────┘                                        │
│                        │ Fail                                            │
│                        ▼                                                 │
│  Fallback 2: Simplified single-agent                                     │
│              ┌──────────────────┐                                        │
│              │ Single agent with│───Success───► Return (note: simplified)│
│              │ basic RAG        │                                        │
│              └──────────────────┘                                        │
│                        │ Fail                                            │
│                        ▼                                                 │
│  Fallback 3: Return partial results                                      │
│              ┌──────────────────┐                                        │
│              │ Any completed    │───Has data──► Return partial           │
│              │ sub-tasks?       │               (note: incomplete)       │
│              └──────────────────┘                                        │
│                        │ No data                                         │
│                        ▼                                                 │
│  Fallback 4: Graceful error message                                      │
│              ┌──────────────────────────────────────────────────────┐   │
│              │ "I wasn't able to complete this analysis.            │   │
│              │  What I tried:                                       │   │
│              │    • Searched internal finance database              │   │
│              │    • Queried CRM for deal information                │   │
│              │  What failed:                                        │   │
│              │    • Web search timed out                            │   │
│              │  Suggested next steps:                               │   │
│              │    • Try again in a few minutes                      │   │
│              │    • Ask a more specific question about internal data│   │
│              │    • Contact support if issue persists"              │   │
│              └──────────────────────────────────────────────────────┘   │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
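
The chain maps onto an ordered-attempts loop: try each fallback in turn, return the first success, and keep any partial results as a last resort before the error message. A sketch with the pipeline functions left as placeholders:

Code
import time


def full_pipeline(query: str) -> str: ...     # placeholder: multi-agent run
def single_agent_rag(query: str) -> str: ...  # placeholder: Fallback 2


def with_retries(fn, query: str, attempts: int = 3, base_delay: float = 1.0):
    """Fallback 1: retry with exponential backoff."""
    for i in range(attempts):
        try:
            return fn(query)
        except Exception:
            if i == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** i)


def run_with_degradation(query: str, partial: dict | None = None) -> dict:
    attempts = [
        ("full", lambda: full_pipeline(query)),
        ("retried", lambda: with_retries(full_pipeline, query)),
        ("simplified", lambda: single_agent_rag(query)),
    ]
    for mode, attempt in attempts:
        try:
            return {"mode": mode, "answer": attempt()}
        except Exception:
            continue  # fall through to the next, simpler path
    if partial:
        return {"mode": "partial", "answer": partial, "note": "incomplete"}
    return {"mode": "error",
            "answer": "I wasn't able to complete this analysis; "
                      "please retry or narrow the question."}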

Part 9: Implementation Roadmap

Building this architecture incrementally:

Phase 1: Foundation

  • Single agent with basic RAG (one vector store)
  • 2-3 hardcoded tools
  • Simple conversation history
  • Basic logging

Outcome: Working prototype for simple, single-domain queries

Phase 2: Intelligence

  • Query analysis and complexity classification
  • Multi-strategy RAG (vector + keyword + hybrid)
  • Agentic RAG loop (iterative retrieval)
  • Chain-of-thought reasoning

Outcome: Handles complex single-domain queries

Phase 3: Multi-Agent

  • Specialized agents (researcher, analyst, critic)
  • Agent routing and orchestration
  • A2A communication protocol
  • Context handoff between agents

Outcome: Handles complex multi-domain queries

Phase 4: Production

  • MCP integration for tools
  • Full observability (traces, metrics)
  • Caching layers
  • Cost optimization
  • Graceful degradation

Outcome: Production-ready system


Conclusion

Building a knowledge agent platform requires integrating multiple sophisticated components:

  1. Query Understanding classifies complexity and decomposes into sub-questions
  2. Multi-Strategy RAG retrieves intelligently with iterative refinement
  3. MCP provides standardized, dynamic tool access
  4. A2A enables agent collaboration and delegation
  5. Reasoning Engine handles multi-step thinking with verification
  6. Context Management maintains coherence across interactions
  7. Orchestration coordinates the entire flow

Start simple—a single agent with basic RAG delivers value immediately. Add components as your use cases demand. The architecture is modular so you can evolve incrementally.


Enrico Piovano, PhD

Co-founder & CTO at Goji AI. Former Applied Scientist at Amazon (Alexa & AGI), focused on Agentic AI and LLMs. PhD in Electrical Engineering from Imperial College London. Gold Medalist at the National Mathematical Olympiad.
