
Building a Knowledge Agent Platform: Multi-Agent Architecture with RAG, MCP, and Orchestration

A complete guide to building production AI agent platforms—integrating agentic RAG, MCP tools, A2A communication, advanced reasoning, and multi-agent orchestration into a unified system.


Introduction

Building a truly capable AI agent requires more than a good prompt and a few tools. Production systems like Claude Code, Cursor, and Perplexity demonstrate what's possible when you combine multiple specialized components: intelligent retrieval, standardized tool protocols, multi-agent coordination, and sophisticated reasoning.

2025: the year of enterprise agentic architecture. Gartner predicts that 40% of enterprise applications will include integrated task-specific agents by 2026, up from less than 5% today. According to Bain & Company's 2025 Technology Report, 5-10% of technology spending over the next 3-5 years will be directed toward building foundational agent capabilities.

The knowledge layer is foundational: As InfoQ's architectural analysis notes, "without a knowledge layer, the agent is just doing things based on gut feeling. With it, the agent gains boundaries—what's true, what's allowed, what's relevant, what's outdated, what's sensitive." This is why knowledge agent platforms require such careful architecture.

Major platforms emerging in 2025: Microsoft Foundry combines a large model catalog, agent runtime, retrieval layer, and control plane into an enterprise agent platform. ServiceNow's AI Platform unifies intelligence, data, and orchestration with Knowledge Graphs and AI Agent Fabric. Meanwhile, Google's Agent2Agent (A2A) protocol enables agents to discover and collaborate across platforms.

This guide presents a complete architecture for building knowledge agent platforms—systems that can research, reason, execute, and learn across complex multi-step tasks. We'll use a practical example throughout: an Enterprise Knowledge Agent that helps organizations answer complex questions by searching internal documents, querying databases, browsing the web, and synthesizing findings.

What makes this different from simpler agent architectures?

Simple Agent                      Knowledge Agent Platform
------------------------------    ------------------------------------
Single LLM with tools             Multiple specialized agents
Basic RAG retrieval               Multi-strategy agentic RAG
Hardcoded tools                   Dynamic MCP tool discovery
Isolated execution                A2A agent communication
Stateless                         Persistent memory and context
Single reasoning pass             Iterative reasoning with reflection

Prerequisites: This post assumes familiarity with basic agent concepts. For foundations, see Building Agentic AI Systems and Building Production-Ready RAG Systems.


The Enterprise Knowledge Agent: Our Reference System

Throughout this guide, we'll build an Enterprise Knowledge Agent with these capabilities:

User query: "What were the key factors in our Q3 revenue decline, and how do they compare to competitor performance?"

What the system does:

  1. Understands the query requires financial data, internal documents, and competitive intelligence
  2. Plans a multi-step research approach
  3. Retrieves from internal wikis, financial databases, CRM notes, and web sources
  4. Delegates sub-tasks to specialized agents (financial analyst, competitive researcher)
  5. Synthesizes findings into a coherent analysis
  6. Cites all sources with confidence levels

This isn't a simple RAG query—it requires orchestrating multiple data sources, agents, and reasoning steps.


Part 1: System Architecture Overview

The Complete Stack

A knowledge agent platform has six distinct layers, each with specific responsibilities:

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                         USER INTERFACE LAYER                             │
│                    (Chat, API, Scheduled Tasks)                          │
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                         GATEWAY & SAFETY LAYER                           │
│              Input Validation │ Rate Limiting │ Guardrails               │
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                        ORCHESTRATION LAYER                               │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐    │
│  │   Query     │  │   Agent     │  │  Execution  │  │   Result    │    │
│  │  Analyzer   │→ │   Router    │→ │   Engine    │→ │ Synthesizer │    │
│  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘    │
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                    ┌───────────────┼───────────────┐
                    ▼               ▼               ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                          AGENT LAYER                                     │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐    │
│  │  Research   │  │  Analyst    │  │  Executor   │  │   Critic    │    │
│  │   Agent     │  │   Agent     │  │   Agent     │  │   Agent     │    │
│  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘    │
│                         ↑    A2A Protocol    ↑                          │
│                         └────────────────────┘                          │
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                    ┌───────────────┼───────────────┐
                    ▼               ▼               ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                        CAPABILITY LAYER                                  │
│  ┌─────────────────────┐  ┌─────────────────────┐                       │
│  │   Multi-Search RAG  │  │    MCP Tool Layer   │                       │
│  │  ┌───┐ ┌───┐ ┌───┐  │  │  ┌───┐ ┌───┐ ┌────┐ │                       │
│  │  │Vec│ │Key│ │Web│  │  │  │DB │ │API│ │File│ │                       │
│  │  └───┘ └───┘ └───┘  │  │  └───┘ └───┘ └────┘ │                       │
│  └─────────────────────┘  └─────────────────────┘                       │
└─────────────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────────────────────────────────────────────────────────────┐
│                         MEMORY & STATE LAYER                             │
│     Working Memory │ Session State │ Long-term Memory │ Audit Log        │
└─────────────────────────────────────────────────────────────────────────┘

Layer Responsibilities

Each layer has clear boundaries and interfaces:

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                        LAYER RESPONSIBILITIES                            │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  INTERFACE LAYER                                                         │
│  ├── Accept user input (text, files, structured queries)                 │
│  ├── Stream responses back to user                                       │
│  ├── Handle authentication and session management                        │
│  └── Support multiple channels (chat, API, webhooks)                     │
│                                                                          │
│  GATEWAY LAYER                                                           │
│  ├── Validate input format and size limits                               │
│  ├── Apply rate limiting per user/organization                           │
│  ├── Run guardrails (relevance, safety, jailbreak detection)             │
│  └── Log all requests for audit                                          │
│                                                                          │
│  ORCHESTRATION LAYER                                                     │
│  ├── Analyze query complexity and requirements                           │
│  ├── Route to appropriate agents                                         │
│  ├── Manage parallel vs. sequential execution                            │
│  └── Synthesize final response from agent outputs                        │
│                                                                          │
│  AGENT LAYER                                                             │
│  ├── Execute domain-specific tasks                                       │
│  ├── Use tools via MCP                                                   │
│  ├── Communicate with other agents via A2A                               │
│  └── Apply reasoning strategies (CoT, reflection)                        │
│                                                                          │
│  CAPABILITY LAYER                                                        │
│  ├── Multi-strategy retrieval (vector, keyword, graph)                   │
│  ├── Tool execution via MCP servers                                      │
│  └── External API integrations                                           │
│                                                                          │
│  MEMORY LAYER                                                            │
│  ├── Working memory (current task state)                                 │
│  ├── Session memory (conversation history)                               │
│  ├── Long-term memory (user preferences, learned facts)                  │
│  └── Audit log (all actions for compliance)                              │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘

Complete Request Flow

When a user asks our Enterprise Knowledge Agent a question, here's the complete flow:

Code
User Query: "What caused Q3 revenue decline vs competitors?"
                              │
                              ▼
┌──────────────────────────────────────────────────────────┐
│ 1. GATEWAY                                                │
│    • Input validation: ✓ valid text                       │
│    • Rate limit check: ✓ under quota                      │
│    • Guardrail check: ✓ business-relevant query           │
│    • User context loaded: Enterprise tier, Finance role   │
└──────────────────────────────────────────────────────────┘
                              │
                              ▼
┌──────────────────────────────────────────────────────────┐
│ 2. QUERY ANALYSIS                                         │
│    • Complexity: HIGH (multi-source, comparative)         │
│    • Intent: Analysis + Comparison                        │
│    • Required sources: Internal finance, CRM, Web         │
│    • Decomposition: 3 sub-questions identified            │
│    • Estimated tokens: ~15,000                            │
│    • Estimated latency: 8-12 seconds                      │
└──────────────────────────────────────────────────────────┘
                              │
                              ▼
┌──────────────────────────────────────────────────────────┐
│ 3. AGENT ROUTING                                          │
│    • Primary: Analyst Agent (synthesis role)              │
│    • Delegated: Research Agent (internal data)            │
│    • Delegated: Competitor Intel Agent (market data)      │
│    • Execution: Parallel (agents are independent)         │
└──────────────────────────────────────────────────────────┘
                              │
              ┌───────────────┼───────────────┐
              ▼               ▼               ▼
┌────────────────┐ ┌────────────────┐ ┌────────────────┐
│ Research Agent │ │ Research Agent │ │ Competitor     │
│ Sub-Q1: Q3     │ │ Sub-Q2: Root   │ │ Intel Agent    │
│ performance    │ │ cause factors  │ │                │
│                │ │                │ │                │
│ Tools:         │ │ Tools:         │ │ Tools:         │
│ • SQL Query    │ │ • CRM Search   │ │ • Web Search   │
│ • Doc Search   │ │ • Wiki Search  │ │ • News API     │
└────────────────┘ └────────────────┘ └────────────────┘
        │                   │                   │
        ▼                   ▼                   ▼
┌────────────────┐ ┌────────────────┐ ┌────────────────┐
│ Findings:      │ │ Findings:      │ │ Findings:      │
│ Revenue: $4.2M │ │ Lost 3 deals   │ │ Competitor A   │
│ Down 15% QoQ   │ │ to Acme Corp   │ │ grew 8%        │
│ Margin: 42%    │ │ Delayed renew- │ │ Competitor B   │
│ [confidence:   │ │ als in EMEA    │ │ flat           │
│  HIGH]         │ │ [confidence:   │ │ [confidence:   │
│                │ │  MEDIUM]       │ │  MEDIUM]       │
└────────────────┘ └────────────────┘ └────────────────┘
        │                   │                   │
        └───────────────────┼───────────────────┘
                            ▼
┌──────────────────────────────────────────────────────────┐
│ 4. SYNTHESIS                                              │
│    Analyst Agent receives all findings:                   │
│    • Merge findings from 3 sources                        │
│    • Resolve: CRM data vs Finance DB (use Finance)        │
│    • Identify: Our 15% decline vs industry 3% growth      │
│    • Structure: Executive summary + details               │
│    • Citations: 7 sources with confidence levels          │
└──────────────────────────────────────────────────────────┘
                            │
                            ▼
┌──────────────────────────────────────────────────────────┐
│ 5. QUALITY CHECK                                          │
│    Critic Agent reviews:                                  │
│    □ Answers original question? ✓                         │
│    □ All claims sourced? ✓ (7/7 cited)                    │
│    □ Logical gaps? ⚠ (EMEA detail sparse)                 │
│    □ Confidence calibrated? ✓                             │
│    → PASS with note: "EMEA analysis limited by CRM data"  │
└──────────────────────────────────────────────────────────┘
                            │
                            ▼
┌──────────────────────────────────────────────────────────┐
│ 6. RESPONSE                                               │
│    Formatted response with:                               │
│    • Executive summary (2 paragraphs)                     │
│    • Key findings (bulleted)                              │
│    • Competitor comparison table                          │
│    • Sources and confidence levels                        │
│    • Caveat about EMEA data limitations                   │
└──────────────────────────────────────────────────────────┘

Part 2: Intelligent Query Understanding

The first challenge is understanding what the user actually needs. Simple keyword matching fails for complex queries—you need semantic understanding and task decomposition.

Why query understanding is the foundation of intelligent agents: The difference between a demo and a production system often comes down to query understanding. A demo can assume well-formed queries: "What is the capital of France?" Production systems face real queries: "so what happened with that thing Sarah mentioned last quarter about the revenue stuff?" This query requires resolving "that thing Sarah mentioned" (conversation context), "last quarter" (temporal reasoning), and "revenue stuff" (semantic understanding) before any retrieval can happen.

The cost of misunderstanding: When you misclassify a query, every downstream step is wrong. If the system thinks "Compare our pricing with competitors" is a simple lookup rather than a comparison requiring multiple data sources, it might return a single pricing document rather than the competitive analysis the user needs. Even perfect retrieval and generation can't fix a fundamental misunderstanding of intent.

Query Classification System

Every incoming query is analyzed across multiple dimensions. This isn't a single classifier—it's a multi-head analysis that extracts several orthogonal properties simultaneously:

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                         QUERY CLASSIFICATION                             │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  Input: "What caused Q3 revenue decline vs competitors?"                 │
│                                                                          │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │ DIMENSION           │ CLASSIFICATION        │ IMPACT              │  │
│  ├─────────────────────┼───────────────────────┼─────────────────────┤  │
│  │ Complexity          │ HIGH                  │ Multi-agent needed  │  │
│  │ Intent              │ ANALYSIS + COMPARE    │ Synthesis required  │  │
│  │ Temporal scope      │ Q3 2024 (specific)    │ Filter retrieval    │  │
│  │ Data domains        │ Finance, CRM, Market  │ Multiple sources    │  │
│  │ Output format       │ Report with citations │ Structured output   │  │
│  │ Confidence need     │ HIGH (business)       │ Add critic review   │  │
│  │ User role           │ Finance (permitted)   │ Full data access    │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                          │
│  Routing Decision: Multi-agent parallel with synthesis                   │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘

Understanding each classification dimension:

Complexity determines resource allocation. A HIGH complexity query needs multiple agents working in parallel, extended time budgets, and possibly multiple retrieval rounds. A LOW complexity query (simple factual lookup) can be answered in a single retrieval pass with a single agent. Over-allocating resources to simple queries wastes compute and adds latency. Under-allocating to complex queries produces incomplete answers.

Intent shapes the output format and synthesis strategy. An ANALYSIS intent produces explanatory prose with supporting evidence. A COMPARE intent produces structured comparisons (tables, bullet points). Getting this wrong means returning a wall of text when the user wanted a simple comparison table, or vice versa.

Temporal scope is critical for enterprise data. "Q3 2024" must be translated into date filters (July 1 - September 30, 2024) applied to every data source. Without explicit temporal filtering, the system might return data from the wrong period, producing confidently wrong answers.

Data domains determine which retrieval systems and agents to invoke. Finance queries go to the SQL database and financial reports. CRM queries search support tickets and customer notes. Market queries need external web search. Multi-domain queries require coordination across all of these.

User role is often overlooked but essential for access control. A finance analyst can see detailed revenue breakdowns. A general employee might only see summary statistics. The classification system must know who's asking to return appropriate results.
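
A minimal sketch of what this multi-head classification can look like as a single structured LLM call. The schema fields mirror the dimensions above; the prompt wording, the call_llm hook, and all field names are assumptions for illustration.

Python
# Sketch of the multi-head query classifier: one structured LLM call
# fills every dimension at once. Schema and prompt are illustrative.
import json
from dataclasses import dataclass
from typing import Literal


@dataclass
class QueryClassification:
    complexity: Literal["LOW", "MEDIUM", "HIGH"]
    intent: list[str]              # e.g. ["ANALYSIS", "COMPARE"]
    temporal_scope: str | None     # e.g. "2024-07-01..2024-09-30"
    data_domains: list[str]        # e.g. ["finance", "crm", "market"]
    output_format: str             # e.g. "report_with_citations"
    confidence_need: Literal["LOW", "HIGH"]


CLASSIFY_PROMPT = """Classify the query along these dimensions and return JSON
with keys: complexity, intent, temporal_scope, data_domains, output_format,
confidence_need.

Query: {query}"""


def classify(query: str, call_llm) -> QueryClassification:
    """call_llm is any function that takes a prompt and returns a JSON string."""
    raw = call_llm(CLASSIFY_PROMPT.format(query=query))
    return QueryClassification(**json.loads(raw))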

Query Decomposition

Complex queries must be broken into answerable sub-questions. The decomposer identifies dependencies between sub-questions. This is one of the most important capabilities distinguishing simple RAG from agentic systems—the ability to plan multi-step research:

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                       QUERY DECOMPOSITION                                │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  Original: "What caused Q3 revenue decline vs competitors?"              │
│                                                                          │
│                         ┌─────────────────┐                              │
│                         │   DECOMPOSER    │                              │
│                         └────────┬────────┘                              │
│                                  │                                       │
│         ┌────────────────────────┼────────────────────────┐              │
│         ▼                        ▼                        ▼              │
│  ┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐        │
│  │    SUB-Q1       │   │    SUB-Q2       │   │    SUB-Q3       │        │
│  │                 │   │                 │   │                 │        │
│  │ "What was our   │   │ "What specific  │   │ "How did key    │        │
│  │  Q3 revenue     │   │  factors drove  │   │  competitors    │        │
│  │  performance?"  │   │  the decline?"  │   │  perform in Q3?"│        │
│  │                 │   │                 │   │                 │        │
│  │ Sources:        │   │ Sources:        │   │ Sources:        │        │
│  │ • Finance DB    │   │ • CRM           │   │ • Web search    │        │
│  │ • Revenue table │   │ • Sales wiki    │   │ • News APIs     │        │
│  │                 │   │ • Support logs  │   │ • Analyst rpts  │        │
│  │                 │   │                 │   │                 │        │
│  │ Dependency:     │   │ Dependency:     │   │ Dependency:     │        │
│  │ None (start)    │   │ Needs Q1 result │   │ None (parallel) │        │
│  └─────────────────┘   └─────────────────┘   └─────────────────┘        │
│         │                        │                        │              │
│         │                        │                        │              │
│         └─────────── DEPENDENCY GRAPH ────────────────────┘              │
│                                                                          │
│  Execution Order:                                                        │
│  • Phase 1 (parallel): Sub-Q1 + Sub-Q3                                  │
│  • Phase 2 (sequential): Sub-Q2 (needs Sub-Q1 context)                  │
│  • Phase 3: Synthesis                                                    │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘

The dependency graph is crucial for efficiency: Notice that Sub-Q1 and Sub-Q3 have no dependencies—they can execute in parallel. Sub-Q2 depends on Sub-Q1's result because understanding "what specific factors drove the decline" requires first knowing what the decline looked like. Without dependency analysis, you'd either: (1) execute everything sequentially (slow), or (2) execute everything in parallel and miss critical context (incomplete).

Why sub-questions have explicit source assignments: Each sub-question specifies which data sources to query. This prevents the system from searching everywhere for everything (expensive and slow) and enables targeted retrieval. The source assignments come from learned patterns: financial performance questions → Finance DB; customer feedback → CRM; competitive intelligence → web search.

The execution plan trades off latency and accuracy: Phase 1 runs Sub-Q1 and Sub-Q3 in parallel (typically 2-3 seconds). Phase 2 runs Sub-Q2 with Sub-Q1's context (another 2-3 seconds). Phase 3 synthesizes everything (1-2 seconds). Total: ~6-8 seconds. Running everything sequentially would take 9-12 seconds; running everything in parallel would produce an incomplete Sub-Q2 answer.
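
A sketch of how this dependency-aware plan can be scheduled: repeatedly run, in parallel, every sub-question whose dependencies are already answered. The run_subq callable and the sub-question dict shape are assumptions; a real system would also enforce time and token budgets per phase.

Python
# Sketch of phase scheduling over the sub-question dependency graph.
import asyncio


async def execute_plan(sub_questions: dict[str, dict], run_subq) -> dict[str, str]:
    """sub_questions maps id -> {"deps": [ids...], ...}; run_subq answers one."""
    answers: dict[str, str] = {}
    pending = dict(sub_questions)
    while pending:
        # Everything whose dependencies are satisfied runs in parallel.
        ready = [qid for qid, q in pending.items()
                 if all(d in answers for d in q["deps"])]
        if not ready:
            raise ValueError("cyclic dependencies in decomposition")
        results = await asyncio.gather(
            *(run_subq(qid, pending[qid], answers) for qid in ready)
        )
        for qid, answer in zip(ready, results):
            answers[qid] = answer
            del pending[qid]
    return answers

With the decomposition above, Sub-Q1 and Sub-Q3 land in the first phase and Sub-Q2 in the second, matching the execution order in the diagram.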

Decomposition Strategies by Query Type

Different query types require different decomposition approaches:

Query Type      Example                            Decomposition Strategy
------------    -------------------------------    -----------------------------------------------------------------
Comparative     "A vs B vs C"                      Split into individual entity queries, then synthesize the comparison
Temporal        "Trend over Q1-Q4"                 Break into time periods, identify inflection points
Causal          "Why did X happen?"                Establish facts first (what), then investigate causes (why)
Multi-domain    "Technical and business impact"    Split by domain, then merge with a domain-expert synthesis
Conditional     "If X then what?"                  Establish a baseline, then model scenarios
Aggregation     "Summary of all projects"          Parallel retrieval, progressive summarization

Part 3: Multi-Strategy RAG Layer

Simple RAG (embed query → find similar chunks → return) fails for complex questions. Our architecture uses agentic RAG with multiple retrieval strategies, intelligent routing, and self-evaluation.

Search Strategy Router

Different queries need different retrieval approaches. The router analyzes the query and selects the optimal strategy:

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                      SEARCH STRATEGY ROUTER                              │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│                         Query Analysis                                   │
│                              │                                           │
│         ┌────────────────────┼────────────────────┐                      │
│         ▼                    ▼                    ▼                      │
│  ┌─────────────┐     ┌─────────────┐     ┌─────────────┐                │
│  │ Query Type  │     │ Data Type   │     │ Precision   │                │
│  │ Detection   │     │ Analysis    │     │ Requirement │                │
│  └─────────────┘     └─────────────┘     └─────────────┘                │
│         │                    │                    │                      │
│         └────────────────────┼────────────────────┘                      │
│                              ▼                                           │
│                    ┌─────────────────┐                                   │
│                    │ STRATEGY SELECT │                                   │
│                    └─────────────────┘                                   │
│                              │                                           │
│    ┌─────────────┬───────────┼───────────┬─────────────┐                │
│    ▼             ▼           ▼           ▼             ▼                │
│ ┌───────┐   ┌───────┐   ┌───────┐   ┌───────┐   ┌───────┐              │
│ │VECTOR │   │KEYWORD│   │HYBRID │   │GRAPH  │   │ WEB   │              │
│ │       │   │(BM25) │   │       │   │       │   │SEARCH │              │
│ │Concept│   │Exact  │   │Both   │   │Relat- │   │Real-  │              │
│ │match  │   │terms  │   │       │   │ions   │   │time   │              │
│ └───────┘   └───────┘   └───────┘   └───────┘   └───────┘              │
│                                                                          │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │ ROUTING RULES                                                      │  │
│  ├───────────────────────────────────────────────────────────────────┤  │
│  │ • Factual / exact match      → Keyword (BM25) primary             │  │
│  │ • Conceptual / semantic      → Vector primary                      │  │
│  │ • Mixed / uncertain          → Hybrid (RRF fusion)                 │  │
│  │ • Relationship queries       → Graph + Vector                      │  │
│  │ • Current events / real-time → Web search primary                  │  │
│  │ • High precision needed      → Multi-strategy + rerank             │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
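
The routing rules reduce to a small decision function, and the hybrid path needs a fusion step. Below is a hedged sketch: the feature flags are illustrative, and rrf_fuse implements standard reciprocal rank fusion (each document scores the sum of 1/(k + rank) across result lists, with k commonly set to 60).

Python
# Sketch of the routing rules plus RRF fusion for the hybrid path.
def select_strategy(features: dict) -> list[str]:
    if features.get("real_time"):
        return ["web"]                       # current events / real-time
    if features.get("relational"):
        return ["graph", "vector"]           # relationship queries
    if features.get("exact_terms") and not features.get("conceptual"):
        return ["keyword"]                   # factual / exact match
    if features.get("conceptual") and not features.get("exact_terms"):
        return ["vector"]                    # conceptual / semantic
    return ["keyword", "vector"]             # mixed/uncertain -> hybrid + RRF


def rrf_fuse(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked doc-id lists; RRF score = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)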

The Agentic RAG Loop

Unlike single-pass RAG, agentic RAG iterates until it has sufficient information. The agent evaluates results and decides whether to continue searching.

The fundamental limitation of single-pass RAG: Traditional RAG does one retrieval pass: embed the query, find similar chunks, stuff them into a prompt, generate a response. This works for simple queries where the first retrieval attempt finds relevant information. But complex queries often require multiple attempts: the first search might not use the right terminology, might miss a relevant data source, or might return partial information that reveals what else to look for.

How agentic RAG differs: The agent treats retrieval as a tool it can invoke multiple times with different strategies. After each retrieval, the agent evaluates: "Do I have enough information to answer? Are there gaps? Should I search differently?" This self-evaluation loop transforms RAG from a single function call into an iterative research process.

The cost of iteration: Each RAG loop iteration adds latency (200-500ms for retrieval, 500-2000ms for LLM evaluation). We typically limit iterations to 3-5 before forcing a response with whatever information is available. The system should also track token costs across iterations and have budget limits to prevent runaway queries.

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                         AGENTIC RAG LOOP                                 │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │ Query: "What specific factors drove Q3 decline?"                 │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                              │                                           │
│                              ▼                                           │
│                   ┌─────────────────────┐                                │
│                   │  1. INITIAL SEARCH  │                                │
│                   │  Strategy: Hybrid   │                                │
│                   │  Sources: CRM, Wiki │                                │
│                   └─────────────────────┘                                │
│                              │                                           │
│                              ▼                                           │
│                   ┌─────────────────────┐                                │
│                   │  2. RETRIEVE        │                                │
│                   │  Results: 12 docs   │                                │
│                   │  Top score: 0.82    │                                │
│                   └─────────────────────┘                                │
│                              │                                           │
│                              ▼                                           │
│                   ┌─────────────────────┐                                │
│                   │  3. SELF-EVALUATE   │◄─────────────────────┐        │
│                   │                     │                      │        │
│                   │  □ Answers query?   │                      │        │
│                   │    → Partial        │                      │        │
│                   │  □ Sufficient       │                      │        │
│                   │    detail?          │                      │        │
│                   │    → No (missing    │                      │        │
│                   │      EMEA data)     │                      │        │
│                   │  □ Sources          │                      │        │
│                   │    reliable?        │                      │        │
│                   │    → Yes            │                      │        │
│                   └─────────────────────┘                      │        │
│                              │                                 │        │
│                         [INSUFFICIENT]                         │        │
│                              │                                 │        │
│                              ▼                                 │        │
│                   ┌─────────────────────┐                      │        │
│                   │  4. REFORMULATE     │                      │        │
│                   │                     │                      │        │
│                   │  New query:         │                      │        │
│                   │  "Q3 EMEA sales     │                      │        │
│                   │   performance       │                      │        │
│                   │   decline factors"  │                      │        │
│                   │                     │                      │        │
│                   │  Add sources:       │                      │        │
│                   │  + Regional reports │                      │        │
│                   └─────────────────────┘                      │        │
│                              │                                 │        │
│                              └─────────────────────────────────┘        │
│                                                                          │
│                      [After 2nd iteration: SUFFICIENT]                   │
│                              │                                           │
│                              ▼                                           │
│                   ┌─────────────────────┐                                │
│                   │  5. RETURN RESULTS  │                                │
│                   │  18 docs, 3 sources │                                │
│                   │  Confidence: 0.85   │                                │
│                   └─────────────────────┘                                │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
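
A compact sketch of this loop, with the iteration cap and token budget described above. The retrieve, evaluate, and reformulate hooks stand in for real retrieval and LLM calls; their return shapes are assumptions for illustration.

Python
# Sketch of the agentic RAG loop with iteration and token-budget caps.
def agentic_rag(query: str, retrieve, evaluate, reformulate,
                max_iters: int = 4, token_budget: int = 20_000):
    docs, spent = [], 0
    current = query
    for _ in range(max_iters):
        results = retrieve(current)                 # one retrieval pass
        docs.extend(results.docs)
        spent += results.tokens_used
        verdict = evaluate(query, docs)             # LLM self-evaluation
        spent += verdict.tokens_used
        if verdict.sufficient or spent >= token_budget:
            break                                   # stop: enough info or budget hit
        current = reformulate(query, docs, verdict.gaps)  # e.g. add EMEA terms
    return docs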

Multi-Source Fusion

Real questions often require combining information from heterogeneous sources. The fusion engine handles deduplication, conflict resolution, and authority weighting:

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                       MULTI-SOURCE FUSION ENGINE                         │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  Query: "Q3 revenue performance"                                         │
│                                                                          │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐    │
│  │ Finance DB  │  │ CRM System  │  │ Sales Wiki  │  │ Exec Slides │    │
│  │             │  │             │  │             │  │             │    │
│  │ Revenue:    │  │ Revenue:    │  │ "Q3 was     │  │ Revenue:    │    │
│  │ $4.2M       │  │ $4.15M      │  │  challenging│  │ $4.2M       │    │
│  │ (precise)   │  │ (pipeline)  │  │  quarter"   │  │ (rounded)   │    │
│  │             │  │             │  │             │  │             │    │
│  │ Authority:  │  │ Authority:  │  │ Authority:  │  │ Authority:  │    │
│  │ CANONICAL   │  │ SECONDARY   │  │ CONTEXTUAL  │  │ SUMMARY     │    │
│  └─────────────┘  └─────────────┘  └─────────────┘  └─────────────┘    │
│         │               │               │               │              │
│         └───────────────┴───────────────┴───────────────┘              │
│                                   │                                     │
│                                   ▼                                     │
│                    ┌─────────────────────────┐                          │
│                    │     FUSION ENGINE       │                          │
│                    ├─────────────────────────┤                          │
│                    │                         │                          │
│                    │ 1. DEDUPLICATE          │                          │
│                    │    Same fact from       │                          │
│                    │    Finance + Exec       │                          │
│                    │    → Keep one, note     │                          │
│                    │      multiple sources   │                          │
│                    │                         │                          │
│                    │ 2. RESOLVE CONFLICTS    │                          │
│                    │    Finance: $4.2M       │                          │
│                    │    CRM: $4.15M          │                          │
│                    │    → Use Finance DB     │                          │
│                    │      (canonical source) │                          │
│                    │    → Note: CRM shows    │                          │
│                    │      pipeline, not      │                          │
│                    │      closed revenue     │                          │
│                    │                         │                          │
│                    │ 3. MERGE CONTEXT        │                          │
│                    │    Wiki adds context    │                          │
│                    │    about challenges     │                          │
│                    │    → Append as          │                          │
│                    │      qualitative note   │                          │
│                    │                         │                          │
│                    │ 4. CITATION CHAIN       │                          │
│                    │    Revenue claim:       │                          │
│                    │    [Finance DB, Q3      │                          │
│                    │     report, row 42]     │                          │
│                    │                         │                          │
│                    └─────────────────────────┘                          │
│                                   │                                     │
│                                   ▼                                     │
│                    ┌─────────────────────────┐                          │
│                    │    UNIFIED CONTEXT      │                          │
│                    │                         │                          │
│                    │  Revenue: $4.2M         │                          │
│                    │  Source: Finance DB     │                          │
│                    │  Confidence: HIGH       │                          │
│                    │  Corroborated: 2 srcs   │                          │
│                    │  Context: "Challenging  │                          │
│                    │  quarter" per Wiki      │                          │
│                    └─────────────────────────┘                          │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
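
The conflict-resolution step is the part teams most often get wrong, so here is a minimal sketch of authority-weighted resolution. The authority tiers come from the diagram above; the claim dict shape is an assumption.

Python
# Sketch of authority-based conflict resolution: when sources disagree on
# the same fact, prefer the most authoritative and keep the rest as notes.
AUTHORITY_RANK = {"CANONICAL": 0, "SECONDARY": 1, "CONTEXTUAL": 2, "SUMMARY": 3}


def resolve(claims: list[dict]) -> dict:
    """claims: [{"value": ..., "source": ..., "authority": ...}, ...]"""
    ranked = sorted(claims, key=lambda c: AUTHORITY_RANK[c["authority"]])
    winner = ranked[0]
    corroborating = [c for c in ranked[1:] if c["value"] == winner["value"]]
    conflicting = [c for c in ranked[1:] if c["value"] != winner["value"]]
    return {
        "value": winner["value"],
        "source": winner["source"],
        "confidence": "HIGH" if corroborating else "MEDIUM",
        "notes": [f'{c["source"]} reports {c["value"]}' for c in conflicting],
    }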

Part 4: MCP Integration Layer

Model Context Protocol (MCP) provides a standardized way to give agents access to tools, data sources, and external capabilities. Instead of hardcoding tool integrations, MCP allows dynamic discovery and connection.

MCP Architecture

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                         MCP ARCHITECTURE                                 │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │                      AGENT RUNTIME                               │    │
│  │                                                                  │    │
│  │  ┌────────────────────────────────────────────────────────┐     │    │
│  │  │                    MCP CLIENT                           │     │    │
│  │  │                                                         │     │    │
│  │  │  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  │     │    │
│  │  │  │ Connection   │  │    Tool      │  │   Resource   │  │     │    │
│  │  │  │ Manager      │  │   Router     │  │   Resolver   │  │     │    │
│  │  │  └──────────────┘  └──────────────┘  └──────────────┘  │     │    │
│  │  │                                                         │     │    │
│  │  └────────────────────────────────────────────────────────┘     │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                                    │                                     │
│                     JSON-RPC / stdio / SSE                               │
│                                    │                                     │
│       ┌────────────────────────────┼────────────────────────────┐       │
│       ▼                            ▼                            ▼       │
│  ┌──────────────────┐    ┌──────────────────┐    ┌──────────────────┐  │
│  │  MCP SERVER:     │    │  MCP SERVER:     │    │  MCP SERVER:     │  │
│  │  Filesystem      │    │  Database        │    │  Web Search      │  │
│  ├──────────────────┤    ├──────────────────┤    ├──────────────────┤  │
│  │                  │    │                  │    │                  │  │
│  │  TOOLS:          │    │  TOOLS:          │    │  TOOLS:          │  │
│  │  • read_file     │    │  • query_sql     │    │  • search_web    │  │
│  │  • write_file    │    │  • list_tables   │    │  • fetch_url     │  │
│  │  • list_dir      │    │  • get_schema    │    │  • extract_text  │  │
│  │  • search_files  │    │  • explain_query │    │  • search_news   │  │
│  │                  │    │                  │    │                  │  │
│  │  RESOURCES:      │    │  RESOURCES:      │    │  RESOURCES:      │  │
│  │  • file://docs/* │    │  • db://sales    │    │  • web://        │  │
│  │  • file://code/* │    │  • db://finance  │    │  • news://       │  │
│  │                  │    │                  │    │                  │  │
│  │  PROMPTS:        │    │  PROMPTS:        │    │  PROMPTS:        │  │
│  │  • summarize_doc │    │  • analyze_table │    │  • research_topic│  │
│  │                  │    │                  │    │                  │  │
│  └──────────────────┘    └──────────────────┘    └──────────────────┘  │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
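
As a concrete example, here is a minimal version of the database server above using the official MCP Python SDK's FastMCP helper (the mcp package; verify the exact API against the SDK version you use). The database helper is a stub standing in for a real data layer.

Python
# Sketch of the "enterprise-database" MCP server with FastMCP.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("enterprise-database")


def run_readonly(database: str, query: str, limit: int) -> list[dict]:
    raise NotImplementedError("stub: execute against a read-only replica")


@mcp.tool()
def query_sql(query: str, database: str, limit: int = 100) -> list[dict]:
    """Execute read-only SQL queries against sales, finance, or hr."""
    if not query.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements are allowed")
    return run_readonly(database, query, limit)


@mcp.tool()
def get_schema(table: str) -> dict:
    """Get table schema and relationships."""
    raise NotImplementedError("stub: introspect the database catalog")


if __name__ == "__main__":
    mcp.run()  # serves over stdio by default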

Tool Discovery and Capability Matching

Agents discover available tools at runtime and match them to task requirements:

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                    CAPABILITY DISCOVERY FLOW                             │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  STEP 1: SERVER ADVERTISEMENT                                            │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │ Server: "enterprise-database"                                      │  │
│  │ Version: "1.2.0"                                                   │  │
│  │                                                                    │  │
│  │ Capabilities:                                                      │  │
│  │   tools: true                                                      │  │
│  │   resources: true                                                  │  │
│  │   prompts: true                                                    │  │
│  │                                                                    │  │
│  │ Tools:                                                             │  │
│  │   - name: "query_sql"                                              │  │
│  │     description: "Execute read-only SQL queries"                   │  │
│  │     inputSchema:                                                   │  │
│  │       query: string (required)                                     │  │
│  │       database: enum [sales, finance, hr]                          │  │
│  │       limit: integer (default: 100)                                │  │
│  │                                                                    │  │
│  │   - name: "get_schema"                                             │  │
│  │     description: "Get table schema and relationships"              │  │
│  │     inputSchema:                                                   │  │
│  │       table: string (required)                                     │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                    │                                     │
│                                    ▼                                     │
│  STEP 2: AGENT CAPABILITY MATCHING                                       │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │ Agent task: "Find Q3 revenue breakdown by region"                  │  │
│  │                                                                    │  │
│  │ Required capabilities:                                             │  │
│  │   • Query structured data       ──► matches: query_sql             │  │
│  │   • Understand data structure   ──► matches: get_schema            │  │
│  │   • Filter by time period       ──► query_sql supports WHERE       │  │
│  │   • Group by dimension          ──► query_sql supports GROUP BY    │  │
│  │                                                                    │  │
│  │ Tool selection: query_sql on db://finance                          │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                    │                                     │
│                                    ▼                                     │
│  STEP 3: TOOL INVOCATION                                                 │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │ Call: query_sql                                                    │  │
│  │ Arguments:                                                         │  │
│  │   database: "finance"                                              │  │
│  │   query: "SELECT region, SUM(revenue) as total                     │  │
│  │           FROM quarterly_revenue                                   │  │
│  │           WHERE quarter = 'Q3' AND year = 2024                     │  │
│  │           GROUP BY region"                                         │  │
│  │   limit: 50                                                        │  │
│  │                                                                    │  │
│  │ Response:                                                          │  │
│  │   [                                                                │  │
│  │     {"region": "NA", "total": 2100000},                            │  │
│  │     {"region": "EMEA", "total": 1400000},                          │  │
│  │     {"region": "APAC", "total": 700000}                            │  │
│  │   ]                                                                │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
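
On the client side, discovery and invocation map onto a few SDK calls. A sketch using the MCP Python SDK's stdio transport (again, confirm names against your SDK version; the db_server.py script path is hypothetical):

Python
# Sketch of client-side tool discovery and invocation over stdio.
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client


async def main() -> None:
    params = StdioServerParameters(command="python", args=["db_server.py"])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()        # STEP 1: advertisement
            print([t.name for t in tools.tools])      # e.g. ["query_sql", "get_schema"]
            result = await session.call_tool(         # STEP 3: invocation
                "query_sql",
                {"database": "finance",
                 "query": "SELECT region, SUM(revenue) AS total "
                          "FROM quarterly_revenue "
                          "WHERE quarter = 'Q3' AND year = 2024 "
                          "GROUP BY region",
                 "limit": 50},
            )
            print(result.content)


asyncio.run(main())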

Permission Scoping and Security

Different agents get different tool permissions based on their role and the task:

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                     PERMISSION SCOPING MODEL                             │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                    PERMISSION MATRIX                               │  │
│  ├───────────────────────────────────────────────────────────────────┤  │
│  │                     │ Research │ Analyst │ Executor │ Critic     │  │
│  │ TOOL                │ Agent    │ Agent   │ Agent    │ Agent      │  │
│  ├─────────────────────┼──────────┼─────────┼──────────┼────────────┤  │
│  │ filesystem.read     │ ✓ scoped │ ✓ scoped│ ✓ full   │ ✓ scoped   │  │
│  │ filesystem.write    │ ✗        │ ✗       │ ✓ scoped │ ✗          │  │
│  │ database.select     │ ✓        │ ✓       │ ✓        │ ✓          │  │
│  │ database.insert     │ ✗        │ ✗       │ ✓ +audit │ ✗          │  │
│  │ database.delete     │ ✗        │ ✗       │ ✗        │ ✗          │  │
│  │ web.search          │ ✓        │ ✓       │ ✓        │ ✓          │  │
│  │ web.fetch           │ ✓        │ ✓       │ ✓        │ ✓          │  │
│  │ email.send          │ ✗        │ ✗       │ ✓ +human │ ✗          │  │
│  │ api.external        │ ✓ r/o    │ ✓ r/o   │ ✓ full   │ ✓ r/o      │  │
│  └─────────────────────┴──────────┴─────────┴──────────┴────────────┘  │
│                                                                          │
│  SCOPING RULES:                                                          │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │ • "scoped" = limited to specific directories or tables             │  │
│  │ • "+audit" = all actions logged with user attribution             │  │
│  │ • "+human" = requires human approval before execution             │  │
│  │ • "readonly" = can query but not modify                           │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
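
Enforcement is simplest as a wrapper that every tool call must pass through. A sketch, where the matrix subset, the hook names (call, audit, approve), and the mode strings are all illustrative:

Python
# Sketch of enforcing the permission matrix as a tool-call wrapper.
PERMISSIONS = {
    "research": {"filesystem.read": "scoped", "database.select": "full",
                 "web.search": "full", "api.external": "readonly"},
    "executor": {"filesystem.write": "scoped", "database.insert": "audit",
                 "email.send": "human_approval"},
}


def invoke_tool(agent_role: str, tool: str, args: dict, call, audit, approve):
    mode = PERMISSIONS.get(agent_role, {}).get(tool)
    if mode is None:
        raise PermissionError(f"{agent_role} may not use {tool}")  # ✗ cells
    if mode == "human_approval" and not approve(agent_role, tool, args):
        raise PermissionError(f"{tool} requires human approval")   # "+human"
    if mode == "audit":
        audit(agent_role, tool, args)   # "+audit": log with attribution
    return call(tool, args)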

Part 5: Agent-to-Agent (A2A) Communication

Complex tasks require multiple specialized agents working together. A2A provides the protocol for agents to discover each other, delegate tasks, and share context.

A2A Architecture and Agent Registry

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                       A2A ARCHITECTURE                                   │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                      AGENT REGISTRY                                │  │
│  │                                                                    │  │
│  │  Stores agent cards describing capabilities, inputs, outputs       │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                    │                                     │
│            ┌───────────────────────┼───────────────────────┐            │
│            ▼                       ▼                       ▼            │
│  ┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐     │
│  │   AGENT CARD    │    │   AGENT CARD    │    │   AGENT CARD    │     │
│  ├─────────────────┤    ├─────────────────┤    ├─────────────────┤     │
│  │ Name: Research  │    │ Name: Analyst   │    │ Name: Competitor│     │
│  │       Agent     │    │       Agent     │    │       Intel     │     │
│  │                 │    │                 │    │                 │     │
│  │ Description:    │    │ Description:    │    │ Description:    │     │
│  │ Retrieves and   │    │ Synthesizes     │    │ Gathers market  │     │
│  │ summarizes info │    │ data into       │    │ intelligence on │     │
│  │ from internal   │    │ insights and    │    │ competitors     │     │
│  │ sources         │    │ recommendations │    │                 │     │
│  │                 │    │                 │    │                 │     │
│  │ Skills:         │    │ Skills:         │    │ Skills:         │     │
│  │ • RAG retrieval │    │ • Synthesis     │    │ • Web research  │     │
│  │ • Summarization │    │ • Reasoning     │    │ • News analysis │     │
│  │ • Citation      │    │ • Visualization │    │ • Comparison    │     │
│  │                 │    │                 │    │                 │     │
│  │ Input Schema:   │    │ Input Schema:   │    │ Input Schema:   │     │
│  │ • query: string │    │ • data: object  │    │ • companies:    │     │
│  │ • sources: list │    │ • task: string  │    │     string[]    │     │
│  │ • depth: enum   │    │ • format: enum  │    │ • aspects: list │     │
│  │                 │    │                 │    │                 │     │
│  │ Output Schema:  │    │ Output Schema:  │    │ Output Schema:  │     │
│  │ • findings: list│    │ • analysis: str │    │ • intel: object │     │
│  │ • sources: list │    │ • confidence:   │    │ • sources: list │     │
│  │ • confidence:   │    │     float       │    │ • freshness:    │     │
│  │     float       │    │ • citations:    │    │     datetime    │     │
│  │                 │    │     list        │    │                 │     │
│  └─────────────────┘    └─────────────────┘    └─────────────────┘     │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
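
A sketch of agent cards and registry lookup, loosely modeled on the A2A agent-card idea. The dataclass fields mirror the diagram above and are not the normative A2A schema.

Python
# Sketch of an agent card plus skill-based registry lookup.
from dataclasses import dataclass


@dataclass
class AgentCard:
    name: str
    description: str
    skills: tuple[str, ...]
    input_schema: dict
    output_schema: dict


class AgentRegistry:
    def __init__(self) -> None:
        self._cards: dict[str, AgentCard] = {}

    def register(self, card: AgentCard) -> None:
        self._cards[card.name] = card

    def find_by_skill(self, skill: str) -> list[AgentCard]:
        return [c for c in self._cards.values() if skill in c.skills]


registry = AgentRegistry()
registry.register(AgentCard(
    name="Research Agent",
    description="Retrieves and summarizes info from internal sources",
    skills=("rag_retrieval", "summarization", "citation"),
    input_schema={"query": "string", "sources": "list", "depth": "enum"},
    output_schema={"findings": "list", "sources": "list", "confidence": "float"},
))
print([c.name for c in registry.find_by_skill("rag_retrieval")])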

Communication Patterns

A2A supports several communication patterns depending on task requirements:

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                    A2A COMMUNICATION PATTERNS                            │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  PATTERN 1: DELEGATION (fire and wait)                                   │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                                                                    │  │
│  │  Orchestrator                                                      │  │
│  │       │                                                            │  │
│  │       │──── TASK: "Research Q3 revenue" ────►  Research Agent      │  │
│  │       │                                              │             │  │
│  │       │                                              │ (working)   │  │
│  │       │                                              │             │  │
│  │       │◄─── RESULT: {findings, sources} ────────────┘             │  │
│  │       │                                                            │  │
│  │       ▼                                                            │  │
│  │  Continue with results                                             │  │
│  │                                                                    │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                          │
│  PATTERN 2: CONSULTATION (quick question)                                │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                                                                    │  │
│  │  Analyst Agent (mid-task)                                          │  │
│  │       │                                                            │  │
│  │       │──── CONSULT: "Is 15% decline significant?" ───►           │  │
│  │       │                                              Domain Expert │  │
│  │       │◄─── OPINION: "Yes, 3x industry avg" ────────┘             │  │
│  │       │                                                            │  │
│  │       ▼                                                            │  │
│  │  Incorporate insight, continue task                                │  │
│  │                                                                    │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                          │
│  PATTERN 3: PARALLEL FAN-OUT (concurrent)                                │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                                                                    │  │
│  │                      Orchestrator                                  │  │
│  │                           │                                        │  │
│  │         ┌─────────────────┼─────────────────┐                      │  │
│  │         │                 │                 │                      │  │
│  │         ▼                 ▼                 ▼                      │  │
│  │    ┌─────────┐      ┌─────────┐      ┌─────────┐                  │  │
│  │    │Research │      │Research │      │Compet-  │                  │  │
│  │    │Agent #1 │      │Agent #2 │      │itor     │                  │  │
│  │    │(Finance)│      │(CRM)    │      │Intel    │                  │  │
│  │    └─────────┘      └─────────┘      └─────────┘                  │  │
│  │         │                 │                 │                      │  │
│  │         └─────────────────┼─────────────────┘                      │  │
│  │                           │                                        │  │
│  │                           ▼                                        │  │
│  │                     Aggregator                                     │  │
│  │                    (merge results)                                 │  │
│  │                                                                    │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                          │
│  PATTERN 4: PIPELINE (sequential handoff)                                │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                                                                    │  │
│  │  Research ──data──► Analyst ──draft──► Critic ──feedback──►       │  │
│  │    Agent             Agent              Agent                      │  │
│  │                                           │                        │  │
│  │                                           ▼                        │  │
│  │                                      [if issues]                   │  │
│  │                                           │                        │  │
│  │                      Analyst ◄────────────┘                        │  │
│  │                       Agent                                        │  │
│  │                      (revise)                                      │  │
│  │                                                                    │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
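
The fan-out pattern is the easiest to ground in code. Below is a minimal asyncio sketch: the `ResearchAgent` class and its `run` signature are illustrative assumptions rather than a prescribed A2A interface, and aggregation is reduced to a list merge.

Code
import asyncio


class ResearchAgent:
    """Illustrative stand-in for a specialized agent (not a formal A2A client)."""

    def __init__(self, name: str, domain: str):
        self.name, self.domain = name, domain

    async def run(self, task: str) -> dict:
        await asyncio.sleep(0.1)  # stands in for retrieval + LLM calls
        return {"agent": self.name, "findings": f"{self.domain} findings for: {task}"}


async def fan_out(task: str, agents: list[ResearchAgent]) -> list[dict]:
    """PATTERN 3: dispatch one task to all agents concurrently, then merge.

    return_exceptions=True keeps one failed agent from sinking the batch;
    failures are silently dropped here, but production code should log them.
    """
    results = await asyncio.gather(*(a.run(task) for a in agents),
                                   return_exceptions=True)
    return [r for r in results if isinstance(r, dict)]


async def main():
    agents = [ResearchAgent("research_1", "finance"),
              ResearchAgent("research_2", "crm"),
              ResearchAgent("competitor_intel", "web")]
    print(await fan_out("Investigate Q3 revenue decline", agents))


asyncio.run(main())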

Context Handoff Structure

When agents hand off tasks, they share structured context to maintain coherence:

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                      CONTEXT HANDOFF STRUCTURE                           │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                       HANDOFF PAYLOAD                              │  │
│  ├───────────────────────────────────────────────────────────────────┤  │
│  │                                                                    │  │
│  │  TASK CONTEXT                                                      │  │
│  │  ├── original_query: "What caused Q3 decline vs competitors?"      │  │
│  │  ├── current_subtask: "Analyze root cause factors"                 │  │
│  │  ├── requesting_agent: "orchestrator"                              │  │
│  │  └── priority: "high"                                              │  │
│  │                                                                    │  │
│  │  ACCUMULATED STATE                                                 │  │
│  │  ├── completed_subtasks:                                           │  │
│  │  │   └── [✓] "Q3 performance: $4.2M, down 15%"                     │  │
│  │  ├── pending_subtasks:                                             │  │
│  │  │   ├── [→] "Root cause analysis" (current)                       │  │
│  │  │   └── [ ] "Competitor comparison"                               │  │
│  │  └── working_hypotheses:                                           │  │
│  │      ├── "Enterprise deal losses (CRM data suggests)"              │  │
│  │      └── "EMEA underperformance (to verify)"                       │  │
│  │                                                                    │  │
│  │  RELEVANT DATA (pre-fetched)                                       │  │
│  │  ├── finance_summary: {revenue: 4.2M, margin: 42%, qoq: -15%}      │  │
│  │  └── source_docs: [doc_id_1, doc_id_2, doc_id_3]                   │  │
│  │                                                                    │  │
│  │  CONSTRAINTS                                                       │  │
│  │  ├── time_scope: "Q3 2024"                                         │  │
│  │  ├── data_access: ["finance", "crm", "wiki"]                       │  │
│  │  ├── max_tokens: 8000                                              │  │
│  │  └── deadline: "2024-01-04T15:00:00Z"                              │  │
│  │                                                                    │  │
│  │  EXPECTED OUTPUT                                                   │  │
│  │  ├── format: "structured_findings"                                 │  │
│  │  ├── required_fields: ["factors", "evidence", "confidence"]        │  │
│  │  └── citation_required: true                                       │  │
│  │                                                                    │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
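
The payload maps naturally onto a typed structure that a receiving agent can validate before starting work. A minimal sketch with field names taken from the diagram above; this is an illustration, not a formal A2A message schema:

Code
from dataclasses import dataclass, field


@dataclass
class HandoffPayload:
    # TASK CONTEXT
    original_query: str
    current_subtask: str
    requesting_agent: str
    priority: str = "normal"
    # ACCUMULATED STATE
    completed_subtasks: list[str] = field(default_factory=list)
    pending_subtasks: list[str] = field(default_factory=list)
    working_hypotheses: list[str] = field(default_factory=list)
    # RELEVANT DATA (pre-fetched so the receiver avoids redundant retrieval)
    relevant_data: dict = field(default_factory=dict)
    # CONSTRAINTS and EXPECTED OUTPUT
    constraints: dict = field(default_factory=dict)
    expected_output: dict = field(default_factory=dict)


payload = HandoffPayload(
    original_query="What caused Q3 decline vs competitors?",
    current_subtask="Analyze root cause factors",
    requesting_agent="orchestrator",
    priority="high",
    constraints={"time_scope": "Q3 2024", "max_tokens": 8000},
    expected_output={"format": "structured_findings", "citation_required": True},
)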

Part 6: Advanced Reasoning Engine

Complex queries require more than single-pass reasoning. The reasoning engine provides multiple strategies optimized for different problem types.

Reasoning Strategy Selection

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                    REASONING STRATEGY SELECTOR                           │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │ PROBLEM CHARACTERISTICS         │ OPTIMAL STRATEGY                 │  │
│  ├─────────────────────────────────┼─────────────────────────────────┤  │
│  │                                 │                                  │  │
│  │ Simple factual lookup           │ Direct retrieval                 │  │
│  │ "What was Q3 revenue?"          │ → Single RAG pass               │  │
│  │                                 │                                  │  │
│  │ Multi-step reasoning            │ Chain-of-Thought (CoT)           │  │
│  │ "Why did X cause Y?"            │ → Step-by-step reasoning        │  │
│  │                                 │                                  │  │
│  │ Exploration / uncertainty       │ Tree-of-Thought (ToT)            │  │
│  │ "What are possible causes?"     │ → Branch and evaluate paths     │  │
│  │                                 │                                  │  │
│  │ High-stakes decision            │ Self-consistency                 │  │
│  │ "Should we proceed with X?"     │ → Multiple paths, vote          │  │
│  │                                 │                                  │  │
│  │ Complex synthesis               │ Iterative refinement             │  │
│  │ "Comprehensive analysis of..."  │ → Generate, critique, improve   │  │
│  │                                 │                                  │  │
│  │ Planning / decomposition        │ Plan-then-execute                │  │
│  │ "How do we achieve X?"          │ → Create plan, execute steps    │  │
│  │                                 │                                  │  │
│  └─────────────────────────────────┴─────────────────────────────────┘  │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
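
A rule-based selector covering the table above is a reasonable starting point. The keyword heuristics below are purely illustrative; many production systems replace them with a cheap classifier model.

Code
def select_strategy(query: str, high_stakes: bool = False) -> str:
    """Heuristic mapping from problem characteristics to reasoning strategy."""
    q = query.lower()
    if high_stakes:
        return "self_consistency"          # multiple paths, then vote
    if q.startswith(("how do we", "how can we", "how should we")):
        return "plan_then_execute"         # decompose, then run the steps
    if "possible" in q or "what are" in q:
        return "tree_of_thought"           # branch and evaluate paths
    if "comprehensive" in q or "analysis" in q:
        return "iterative_refinement"      # generate, critique, improve
    if q.startswith("why") or "cause" in q:
        return "chain_of_thought"          # step-by-step reasoning
    return "direct_retrieval"              # simple factual lookup, single RAG pass


assert select_strategy("What was Q3 revenue?") == "direct_retrieval"
assert select_strategy("Why did revenue decline?") == "chain_of_thought"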

Chain-of-Thought with Grounded Verification

For our Enterprise Knowledge Agent, we use CoT with verification against retrieved sources:

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                   GROUNDED CHAIN-OF-THOUGHT                              │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  Query: "Why did revenue decline more than competitors?"                 │
│                                                                          │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │ STEP 1: ESTABLISH FACTS                                          │    │
│  │                                                                  │    │
│  │ Thought: "First, I need to quantify our decline and compare      │    │
│  │          it to the market."                                      │    │
│  │                                                                  │    │
│  │ Action: Retrieve from Finance DB + Industry reports              │    │
│  │                                                                  │    │
│  │ Observation:                                                     │    │
│  │   • Our decline: 15% QoQ [source: finance_db, confidence: HIGH]  │    │
│  │   • Industry avg: +3% QoQ [source: analyst_report, conf: MEDIUM] │    │
│  │   • Gap: 18 percentage points worse than market                  │    │
│  │                                                                  │    │
│  │ Verification: ✓ Numbers sourced, ✓ Sources authoritative         │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                              │                                           │
│                              ▼                                           │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │ STEP 2: INVESTIGATE CAUSES                                       │    │
│  │                                                                  │    │
│  │ Thought: "18pp gap is significant. I need to identify company-   │    │
│  │          specific factors that explain this divergence."         │    │
│  │                                                                  │    │
│  │ Action: Search CRM notes, support tickets, sales wiki            │    │
│  │                                                                  │    │
│  │ Observations:                                                    │    │
│  │   • Lost 3 enterprise deals to Acme Corp [source: CRM]           │    │
│  │   • EMEA renewal delays due to procurement [source: sales_wiki]  │    │
│  │   • Product gap: missing API feature [source: support_tickets]   │    │
│  │                                                                  │    │
│  │ Verification: ✓ Multiple corroborating sources                   │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                              │                                           │
│                              ▼                                           │
│  ┌─────────────────────────────────────────────────────────────────┐    │
│  │ STEP 3: SYNTHESIZE                                               │    │
│  │                                                                  │    │
│  │ Thought: "I can now construct a causal narrative linking         │    │
│  │          the evidence to the outcome."                           │    │
│  │                                                                  │    │
│  │ Synthesis:                                                       │    │
│  │   Primary factor: Competitive losses (3 deals = ~$1.2M impact)   │    │
│  │   Secondary: EMEA delays (timing, not loss)                      │    │
│  │   Contributing: Product gap (enabled competitor win)             │    │
│  │                                                                  │    │
│  │ Confidence calibration:                                          │    │
│  │   • Competitive losses: HIGH (direct CRM evidence)               │    │
│  │   • EMEA delays: MEDIUM (wiki mentions, unquantified)            │    │
│  │   • Product gap: MEDIUM (support tickets, correlation)           │    │
│  └─────────────────────────────────────────────────────────────────┘    │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
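
What distinguishes grounded CoT from plain CoT is the bookkeeping: every observation carries a source and a confidence, and a step is only accepted once its claims verify. A compact sketch of that structure, with field names assumed to mirror the diagram:

Code
from dataclasses import dataclass, field


@dataclass
class Observation:
    claim: str
    source: str          # e.g. "finance_db", "analyst_report"
    confidence: str      # "HIGH" | "MEDIUM" | "LOW"


@dataclass
class GroundedStep:
    thought: str
    observations: list[Observation] = field(default_factory=list)

    def verify(self) -> bool:
        """A step passes only if every claim names a source."""
        return bool(self.observations) and all(o.source for o in self.observations)


step = GroundedStep(
    thought="First, quantify our decline and compare it to the market.",
    observations=[
        Observation("Our decline: 15% QoQ", "finance_db", "HIGH"),
        Observation("Industry avg: +3% QoQ", "analyst_report", "MEDIUM"),
    ],
)
assert step.verify()  # unverified steps halt the chain instead of propagating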

Reflection and Self-Critique Loop

After generating an answer, the Critic Agent evaluates quality:

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                      SELF-CRITIQUE CHECKLIST                             │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  Generated Response: [Analysis of Q3 decline factors]                    │
│                                                                          │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │ COMPLETENESS                                                       │  │
│  │ □ Answers the original question?                    [✓ YES]        │  │
│  │ □ Addresses all sub-questions?                      [✓ 3/3]        │  │
│  │ □ Includes competitor comparison?                   [✓ YES]        │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                          │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │ ACCURACY                                                           │  │
│  │ □ All claims have citations?                        [✓ 7/7]        │  │
│  │ □ Numbers verified against sources?                 [✓ YES]        │  │
│  │ □ No unsupported speculation?                       [⚠ 1 flag]     │  │
│  │   → "Product gap likely contributed" - needs evidence              │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                          │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │ LOGIC                                                              │  │
│  │ □ Reasoning chain is valid?                         [✓ YES]        │  │
│  │ □ No logical leaps?                                 [✓ YES]        │  │
│  │ □ Alternative explanations considered?              [⚠ PARTIAL]    │  │
│  │   → Could mention macroeconomic factors                            │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                          │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │ CONFIDENCE CALIBRATION                                             │  │
│  │ □ Confidence levels appropriate?                    [✓ YES]        │  │
│  │ □ Uncertainties acknowledged?                       [✓ YES]        │  │
│  │ □ Limitations stated?                               [⚠ ADD]        │  │
│  │   → Should note EMEA data is limited                               │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                          │
│  VERDICT: PASS with minor revisions                                      │
│  ACTIONS:                                                                │
│    1. Add evidence for product gap claim OR soften language             │
│    2. Add caveat about EMEA data limitations                            │
│    3. Optional: mention macro factors as alternative hypothesis         │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
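
One simple way to make the checklist operational is to have the Critic Agent emit per-check results and compute the verdict mechanically, so "pass with minor revisions" is a deterministic outcome rather than a judgment call. A sketch under that assumption:

Code
def critic_verdict(checks: dict[str, str]) -> tuple[str, list[str]]:
    """checks maps check name -> 'pass' | 'warn' | 'fail'.
    Any fail forces a revision loop; warns pass with minor revisions."""
    failed = [name for name, r in checks.items() if r == "fail"]
    warned = [name for name, r in checks.items() if r == "warn"]
    if failed:
        return "REVISE", failed
    if warned:
        return "PASS_WITH_REVISIONS", warned
    return "PASS", []


verdict, actions = critic_verdict({
    "answers_question": "pass",
    "all_claims_cited": "pass",
    "no_unsupported_speculation": "warn",  # product-gap claim needs evidence
    "limitations_stated": "warn",          # EMEA data caveat missing
})
print(verdict, actions)
# PASS_WITH_REVISIONS ['no_unsupported_speculation', 'limitations_stated']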

Part 7: Conversation and Context Management

Agents must maintain coherent context across multi-turn conversations and complex multi-step tasks. This requires a hierarchical approach to memory and intelligent context window management.

Context Hierarchy

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                       CONTEXT HIERARCHY                                  │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │ LEVEL 1: SYSTEM CONTEXT (Persistent across all sessions)           │  │
│  │                                                                    │  │
│  │ • Agent identity, role, capabilities                               │  │
│  │ • Organization knowledge (policies, structure)                     │  │
│  │ • Tool permissions and access controls                             │  │
│  │ • User profile (role, preferences, history summary)                │  │
│  │                                                                    │  │
│  │ Token budget: ~2,000 tokens (always included)                      │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                              │                                           │
│                              ▼                                           │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │ LEVEL 2: SESSION CONTEXT (Per conversation)                        │  │
│  │                                                                    │  │
│  │ • Conversation history (summarized if long)                        │  │
│  │ • Current task state and progress                                  │  │
│  │ • Retrieved documents (this session)                               │  │
│  │ • Working hypotheses and intermediate findings                     │  │
│  │ • Agent handoff history                                            │  │
│  │                                                                    │  │
│  │ Token budget: ~20,000 tokens (managed dynamically)                 │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                              │                                           │
│                              ▼                                           │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │ LEVEL 3: TURN CONTEXT (Per interaction)                            │  │
│  │                                                                    │  │
│  │ • Current user query                                               │  │
│  │ • Immediate tool results                                           │  │
│  │ • Current reasoning trace                                          │  │
│  │ • Active sub-task context                                          │  │
│  │                                                                    │  │
│  │ Token budget: ~40,000 tokens (primary working space)               │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                              │                                           │
│                              ▼                                           │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │ OUTPUT RESERVATION                                                 │  │
│  │                                                                    │  │
│  │ Reserved for model response generation                             │  │
│  │                                                                    │  │
│  │ Token budget: ~30,000 tokens (reserved)                            │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                          │
│  TOTAL: ~92,000 tokens used of 128K context window                       │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
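
In code, the hierarchy becomes a budgeted assembly step: each level is included in priority order and trimmed to its budget before the prompt is built. A minimal sketch with token counting stubbed out; a real system should use the model's tokenizer, and trimming should be summarization rather than truncation:

Code
# Per-level budgets from the hierarchy above; ~30K stays reserved for output.
BUDGETS = {"system": 2_000, "session": 20_000, "turn": 40_000}


def count_tokens(text: str) -> int:
    return len(text) // 4  # crude estimate; swap in the model tokenizer


def assemble_context(levels: dict[str, str]) -> str:
    parts = []
    for name in ("system", "session", "turn"):  # priority order
        text = levels.get(name, "")
        if count_tokens(text) > BUDGETS[name]:
            # Placeholder: production systems summarize instead of truncating.
            text = text[: BUDGETS[name] * 4]
        parts.append(text)
    return "\n\n".join(parts)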

Context Overflow Strategies

When context exceeds budget, the system applies progressive compression:

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                   CONTEXT OVERFLOW MANAGEMENT                            │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  Context Usage: 95,000 / 92,000 tokens (OVERFLOW)                        │
│                                                                          │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │ STRATEGY 1: CONVERSATION SUMMARIZATION                             │  │
│  │                                                                    │  │
│  │ Before: Full conversation history (25,000 tokens)                  │  │
│  │ After:  Summarized history (5,000 tokens)                          │  │
│  │                                                                    │  │
│  │ Summary includes:                                                  │  │
│  │ • Key topics discussed                                             │  │
│  │ • Decisions made                                                   │  │
│  │ • Important facts established                                      │  │
│  │ • Current task status                                              │  │
│  │                                                                    │  │
│  │ Savings: 20,000 tokens                                             │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                          │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │ STRATEGY 2: DOCUMENT COMPRESSION                                   │  │
│  │                                                                    │  │
│  │ Before: Full retrieved documents (40,000 tokens)                   │  │
│  │ After:  Key excerpts only (15,000 tokens)                          │  │
│  │                                                                    │  │
│  │ Keeps:                                                             │  │
│  │ • Sentences containing query keywords                              │  │
│  │ • Surrounding context (1 sentence before/after)                    │  │
│  │ • Document titles and metadata                                     │  │
│  │                                                                    │  │
│  │ Savings: 25,000 tokens                                             │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                          │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │ STRATEGY 3: WORKING MEMORY OFFLOAD                                 │  │
│  │                                                                    │  │
│  │ Move to retrievable storage:                                       │  │
│  │ • Completed sub-task details (keep summaries)                      │  │
│  │ • Alternative hypotheses (keep top 2)                              │  │
│  │ • Verbose tool outputs (keep structured extracts)                  │  │
│  │                                                                    │  │
│  │ Can be retrieved if needed for follow-up questions                 │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                          │
│  New Context Usage: 50,000 / 92,000 tokens (OK)                          │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
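
The important property is that the strategies run in order of increasing information loss, and the loop stops as soon as the context fits. A sketch of that control flow; the lambda compressors stand in for real summarization, excerpting, and offload logic:

Code
def estimate_tokens(context: dict[str, str]) -> int:
    return sum(len(text) // 4 for text in context.values())  # crude estimate


def fit_context(context: dict[str, str], budget: int, strategies) -> dict[str, str]:
    """strategies: ordered (section, compress_fn) pairs, least lossy first.
    Stop compressing the moment the context fits the budget."""
    for section, compress in strategies:
        if estimate_tokens(context) <= budget:
            break
        context = {**context, section: compress(context[section])}
    return context


strategies = [
    ("history", lambda t: t[: 5_000 * 4]),     # stand-in for LLM summarization
    ("documents", lambda t: t[: 15_000 * 4]),  # stand-in for excerpt extraction
    ("working_memory", lambda t: ""),          # stand-in for offload to storage
]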

Part 8: Production Patterns

Observability Architecture

Comprehensive observability is essential for debugging multi-agent systems:

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                      OBSERVABILITY STACK                                 │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │ DISTRIBUTED TRACES                                                 │  │
│  │                                                                    │  │
│  │ request_id: abc-123                                                │  │
│  │ user_id: user_456                                                  │  │
│  │ total_duration: 8.2s                                               │  │
│  │ total_tokens: 14,200                                               │  │
│  │                                                                    │  │
│  │ ├── gateway (45ms)                                                 │  │
│  │ │   ├── input_validation (5ms)                                     │  │
│  │ │   └── guardrail_check (40ms)                                     │  │
│  │ │                                                                  │  │
│  │ ├── orchestrator (120ms)                                           │  │
│  │ │   ├── query_analysis (80ms) [tokens: 500]                        │  │
│  │ │   └── agent_routing (40ms)                                       │  │
│  │ │                                                                  │  │
│  │ ├── research_agent (3.2s) [parallel]                               │  │
│  │ │   ├── rag_retrieval (800ms)                                      │  │
│  │ │   │   ├── vector_search (200ms) [results: 15]                    │  │
│  │ │   │   ├── keyword_search (150ms) [results: 8]                    │  │
│  │ │   │   └── reranking (100ms) [final: 10]                          │  │
│  │ │   └── llm_reasoning (2.2s) [tokens: 4,200]                       │  │
│  │ │                                                                  │  │
│  │ ├── competitor_agent (3.5s) [parallel]                             │  │
│  │ │   ├── web_search (1.8s) [results: 12]                            │  │
│  │ │   └── llm_analysis (1.5s) [tokens: 3,100]                        │  │
│  │ │                                                                  │  │
│  │ ├── synthesis (1.8s)                                               │  │
│  │ │   └── llm_synthesis (1.8s) [tokens: 5,400]                       │  │
│  │ │                                                                  │  │
│  │ └── critic_review (0.6s)                                           │  │
│  │     └── llm_critique (0.6s) [tokens: 1,000]                        │  │
│  │                                                                    │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                          │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │ KEY METRICS                                                        │  │
│  │                                                                    │  │
│  │ Latency:                        Quality:                           │  │
│  │ • p50: 5.2s                     • task_success_rate: 94%           │  │
│  │ • p95: 12.1s                    • rag_precision@5: 0.78            │  │
│  │ • p99: 18.3s                    • citation_coverage: 97%           │  │
│  │                                                                    │  │
│  │ Cost:                           Reliability:                       │  │
│  │ • tokens_per_request: 14,200    • error_rate: 2.1%                 │  │
│  │ • cost_per_request: $0.18       • retry_rate: 5.3%                 │  │
│  │ • cache_hit_rate: 23%           • timeout_rate: 0.8%               │  │
│  │                                                                    │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
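
Traces like this can be produced with a few lines of nested-span bookkeeping; in production you would more likely emit them through OpenTelemetry, but the shape of the data is the same. A minimal homegrown sketch:

Code
import time
from contextlib import contextmanager

trace: list[dict] = []
_stack: list[str] = []


@contextmanager
def span(name: str, **attrs):
    """Record a timed span, nested under whichever span is currently open."""
    _stack.append(name)
    start = time.perf_counter()
    try:
        yield
    finally:
        trace.append({
            "span": "/".join(_stack),
            "duration_ms": round((time.perf_counter() - start) * 1000, 1),
            **attrs,  # e.g. tokens=4200, results=15
        })
        _stack.pop()


with span("orchestrator"):
    with span("query_analysis", tokens=500):
        time.sleep(0.01)  # stands in for the LLM call
    with span("agent_routing"):
        time.sleep(0.01)

for record in trace:
    print(record)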

Cost Optimization

In a multi-agent pipeline, token costs compound across every agent, retrieval pass, and critique loop, so cost control has to be designed in rather than bolted on:

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                      COST OPTIMIZATION STRATEGIES                        │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  1. MODEL ROUTING BY TASK COMPLEXITY                                     │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                                                                    │  │
│  │  Task Type              │ Model        │ Cost/1K tokens            │  │
│  │  ───────────────────────┼──────────────┼─────────────────────────  │  │
│  │  Query classification   │ Haiku        │ $0.00025                  │  │
│  │  Simple retrieval       │ Haiku        │ $0.00025                  │  │
│  │  Guardrail checks       │ Haiku        │ $0.00025                  │  │
│  │  Synthesis              │ Sonnet       │ $0.003                    │  │
│  │  Complex reasoning      │ Opus         │ $0.015                    │  │
│  │  Critic review          │ Sonnet       │ $0.003                    │  │
│  │                                                                    │  │
│  │  Blended average: ~$0.004/1K tokens (vs $0.015 all-Opus)           │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                          │
│  2. MULTI-LEVEL CACHING                                                  │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                                                                    │  │
│  │  Cache Layer          │ Hit Rate │ Latency Saved │ Cost Saved      │  │
│  │  ─────────────────────┼──────────┼───────────────┼───────────────  │  │
│  │  Exact query cache    │ 8%       │ 100%          │ 100%            │  │
│  │  Semantic query cache │ 15%      │ 80%           │ 80%             │  │
│  │  RAG result cache     │ 25%      │ 40%           │ 30%             │  │
│  │  Tool result cache    │ 35%      │ 20%           │ 15%             │  │
│  │                                                                    │  │
│  │  Combined savings: ~30% cost reduction                             │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                          │
│  3. EARLY TERMINATION                                                    │
│  ┌───────────────────────────────────────────────────────────────────┐  │
│  │                                                                    │  │
│  │  • Simple queries: Skip multi-agent orchestration                  │  │
│  │  • High-confidence answers: Skip critic review                     │  │
│  │  • Cached results valid: Skip full pipeline                        │  │
│  │  • User abandons: Stop in-progress agents                          │  │
│  │                                                                    │  │
│  └───────────────────────────────────────────────────────────────────┘  │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
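
Strategy 1 often reduces to a lookup table with a safe default. A sketch using the tiers from the table above; the model names are Anthropic's tiers, and the prices in the table are illustrative, so check current rates before relying on them:

Code
# Cheapest tier known to handle each task type (from the routing table above).
MODEL_FOR_TASK = {
    "query_classification": "haiku",
    "simple_retrieval": "haiku",
    "guardrail_check": "haiku",
    "synthesis": "sonnet",
    "critic_review": "sonnet",
    "complex_reasoning": "opus",
}


def route_model(task_type: str) -> str:
    # Default to the mid-tier model for unknown tasks rather than the priciest.
    return MODEL_FOR_TASK.get(task_type, "sonnet")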

Graceful Degradation

When part of the pipeline fails, the system should step down through progressively simpler fallbacks rather than return a bare error:

Code
┌─────────────────────────────────────────────────────────────────────────┐
│                      GRACEFUL DEGRADATION CHAIN                          │
├─────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  Primary Path: Full multi-agent pipeline                                 │
│                        │                                                 │
│                        ▼                                                 │
│              ┌──────────────────┐                                        │
│              │    SUCCESS?      │───Yes───► Return full response         │
│              └──────────────────┘                                        │
│                        │ No                                              │
│                        ▼                                                 │
│  Fallback 1: Retry with exponential backoff                              │
│              ┌──────────────────┐                                        │
│              │Retry (3 attempts)│───Success───► Return response          │
│              └──────────────────┘                                        │
│                        │ Fail                                            │
│                        ▼                                                 │
│  Fallback 2: Simplified single-agent                                     │
│              ┌──────────────────┐                                        │
│              │ Single agent with│───Success───► Return (note: simplified)│
│              │ basic RAG        │                                        │
│              └──────────────────┘                                        │
│                        │ Fail                                            │
│                        ▼                                                 │
│  Fallback 3: Return partial results                                      │
│              ┌──────────────────┐                                        │
│              │ Any completed    │───Has data──► Return partial           │
│              │ sub-tasks?       │               (note: incomplete)       │
│              └──────────────────┘                                        │
│                        │ No data                                         │
│                        ▼                                                 │
│  Fallback 4: Graceful error message                                      │
│              ┌──────────────────────────────────────────────────────┐   │
│              │ "I wasn't able to complete this analysis.            │   │
│              │  What I tried:                                       │   │
│              │    • Searched internal finance database              │   │
│              │    • Queried CRM for deal information                │   │
│              │  What failed:                                        │   │
│              │    • Web search timed out                            │   │
│              │  Suggested next steps:                               │   │
│              │    • Try again in a few minutes                      │   │
│              │    • Ask a more specific question about internal data│   │
│              │    • Contact support if issue persists"              │   │
│              └──────────────────────────────────────────────────────┘   │
│                                                                          │
└─────────────────────────────────────────────────────────────────────────┘
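
The chain maps onto an ordered-attempts loop: try each fallback in turn, return the first success, and keep any partial results as a last resort before the error message. A sketch with the pipeline functions left as placeholders:

Code
import time


def full_pipeline(query: str) -> str: ...     # placeholder: multi-agent run
def single_agent_rag(query: str) -> str: ...  # placeholder: Fallback 2


def with_retries(fn, query: str, attempts: int = 3, base_delay: float = 1.0):
    """Fallback 1: retry with exponential backoff."""
    for i in range(attempts):
        try:
            return fn(query)
        except Exception:
            if i == attempts - 1:
                raise
            time.sleep(base_delay * 2 ** i)


def run_with_degradation(query: str, partial: dict | None = None) -> dict:
    attempts = [
        ("full", lambda: full_pipeline(query)),
        ("retried", lambda: with_retries(full_pipeline, query)),
        ("simplified", lambda: single_agent_rag(query)),
    ]
    for mode, attempt in attempts:
        try:
            return {"mode": mode, "answer": attempt()}
        except Exception:
            continue  # fall through to the next, simpler path
    if partial:
        return {"mode": "partial", "answer": partial, "note": "incomplete"}
    return {"mode": "error",
            "answer": "I wasn't able to complete this analysis; "
                      "please retry or narrow the question."}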

Part 9: Implementation Roadmap

Building this architecture incrementally:

Phase 1: Foundation

  • Single agent with basic RAG (one vector store)
  • 2-3 hardcoded tools
  • Simple conversation history
  • Basic logging

Outcome: Working prototype for simple, single-domain queries

Phase 2: Intelligence

  • Query analysis and complexity classification
  • Multi-strategy RAG (vector + keyword + hybrid)
  • Agentic RAG loop (iterative retrieval)
  • Chain-of-thought reasoning

Outcome: Handles complex single-domain queries

Phase 3: Multi-Agent

  • Specialized agents (researcher, analyst, critic)
  • Agent routing and orchestration
  • A2A communication protocol
  • Context handoff between agents

Outcome: Handles complex multi-domain queries

Phase 4: Production

  • MCP integration for tools
  • Full observability (traces, metrics)
  • Caching layers
  • Cost optimization
  • Graceful degradation

Outcome: Production-ready system


Conclusion

Building a knowledge agent platform requires integrating multiple sophisticated components:

  1. Query Understanding classifies complexity and decomposes into sub-questions
  2. Multi-Strategy RAG retrieves intelligently with iterative refinement
  3. MCP provides standardized, dynamic tool access
  4. A2A enables agent collaboration and delegation
  5. Reasoning Engine handles multi-step thinking with verification
  6. Context Management maintains coherence across interactions
  7. Orchestration coordinates the entire flow

Start simple—a single agent with basic RAG delivers value immediately. Add components as your use cases demand. The architecture is modular so you can evolve incrementally.


Enrico Piovano, PhD

Co-founder & CTO at Goji AI. Former Applied Scientist at Amazon (Alexa & AGI), focused on Agentic AI and LLMs. PhD in Electrical Engineering from Imperial College London. Gold Medalist at the National Mathematical Olympiad.
