Workflows vs Agents: A Practical Decision Framework
Not every AI system needs autonomous agents. Learn when to use deterministic workflows, when to deploy agents, and how to choose the right architecture for your use case—with decision frameworks, trade-off analysis, and real-world examples.
The Most Common Mistake in AI Architecture
Teams building AI systems frequently make the same mistake: reaching for agents when a workflow would suffice.
The allure is understandable. Agents feel futuristic. They handle ambiguity. They adapt. But that flexibility comes with costs—higher latency, increased token usage, unpredictable behavior, and harder debugging.
The truth is simpler than the hype suggests:
Most production AI systems should be workflows. Agents are for when you genuinely don't know the steps in advance.
This guide provides a practical framework for choosing between workflows and agents—and shows you how to combine them effectively.
This post builds on concepts from Anthropic's excellent Building Effective Agents guide, extending them with cost analysis, migration strategies, and framework-specific guidance.
Defining the Terms
Before diving into trade-offs, let's establish clear definitions.
What is a Workflow?
A workflow is a predefined sequence of steps where the control flow is determined by your code, not the LLM.
┌─────────────────────────────────────────────────────────────────────────┐
│ WORKFLOW │
│ │
│ Your code controls what happens next. The LLM executes steps. │
│ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Step 1 │ ──→ │ Step 2 │ ──→ │ Step 3 │ ──→ │ Output │ │
│ │ (LLM) │ │ (Code) │ │ (LLM) │ │ │ │
│ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │
│ │
│ Examples: │
│ • Extract data → Validate → Transform → Store │
│ • Classify intent → Route → Generate response │
│ • Summarize → Translate → Format │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Key characteristics:
- Steps are known at design time
- Branching is explicit (if/else in your code)
- Predictable execution path
- Deterministic number of LLM calls
- Easy to test, debug, and monitor
What is an Agent?
An agent is a system where the LLM decides what to do next. It operates in a loop, choosing actions until it determines the task is complete.
┌─────────────────────────────────────────────────────────────────────────┐
│ AGENT │
│ │
│ The LLM controls what happens next. Your code provides tools. │
│ │
│ ┌──────────────┐ │
│ │ │ │
│ ▼ │ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Task │ ──→ │ LLM │ ──→ │ Tool │ │
│ │ Input │ │ Decides │ │ Execute │ │
│ └─────────┘ └─────────┘ └─────────┘ │
│ │ │ │
│ │ ┌─────────┘ │
│ ▼ │ │
│ ┌─────────┐ │
│ │ Done? │ ──→ No ──→ (loop back) │
│ └─────────┘ │
│ │ │
│ Yes │
│ ▼ │
│ ┌─────────┐ │
│ │ Output │ │
│ └─────────┘ │
│ │
│ Examples: │
│ • Research a topic (search → read → search more → synthesize) │
│ • Debug code (read → hypothesize → test → fix → verify) │
│ • Plan a trip (unknown number of searches, bookings, comparisons) │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Key characteristics:
- Steps are determined at runtime by the LLM
- Dynamic execution path
- Variable number of LLM calls (bounded by max iterations)
- Can handle novel situations
- Harder to test, debug, and predict costs
The Spectrum: It's Not Binary
The real decision isn't a clean binary between "workflow" and "agent." There's a spectrum of patterns with increasing autonomy:
┌─────────────────────────────────────────────────────────────────────────┐
│ THE AUTONOMY SPECTRUM │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ Less Autonomy More Autonomy│
│ More Control Less Control │
│ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Single │ │ Chain │ │ Router │ │ State │ │ Agent │ │
│ │ Call │ │ │ │ │ │ Machine │ │ │ │
│ └─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │
│ │ │ │ │ │ │
│ ▼ ▼ ▼ ▼ ▼ │
│ One prompt Sequential LLM picks LLM triggers LLM decides │
│ one output LLM calls which path transitions everything │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ Predictable ◄─────────────────────────────────────────► Flexible │
│ Cheap ◄─────────────────────────────────────────► Expensive │
│ Fast ◄─────────────────────────────────────────► Slow │
│ Testable ◄─────────────────────────────────────────► Unpredictable │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Let's examine each pattern:
1. Single LLM Call
The simplest pattern. One prompt, one response.
┌─────────────────────────────────────────────────────────────────────────┐
│ SINGLE CALL │
│ │
│ Input ──────────────→ LLM ──────────────→ Output │
│ │
│ Use when: │
│ • Task is self-contained │
│ • No external data needed │
│ • Classification, simple generation, formatting │
│ │
│ Examples: │
│ • Sentiment analysis │
│ • Text summarization │
│ • Code explanation │
│ • Translation │
│ │
└─────────────────────────────────────────────────────────────────────────┘
2. Chain (Sequential Workflow)
Multiple LLM calls in a fixed sequence. Each step's output feeds the next.
┌─────────────────────────────────────────────────────────────────────────┐
│ CHAIN │
│ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Extract │ ──→ │Validate │ ──→ │Summarize│ ──→ │ Format │ │
│ │ (LLM) │ │ (Code) │ │ (LLM) │ │ (LLM) │ │
│ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │
│ │
│ Use when: │
│ • Steps are known and fixed │
│ • Each step has a clear input/output │
│ • Quality improves with decomposition │
│ │
│ Examples: │
│ • Document processing pipeline │
│ • Content generation with review │
│ • Multi-stage analysis │
│ │
└─────────────────────────────────────────────────────────────────────────┘
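The chain above fits in a few lines of Python. `call_llm` is a stand-in for whatever model client you use (it's stubbed here so the example is self-contained); the shape of the pipeline is the point:

```python
def call_llm(prompt: str) -> str:
    """Placeholder: replace with a real model client call."""
    return f"<llm output for: {prompt[:30]}...>"

def extract(document: str) -> str:
    return call_llm(f"Extract the key fields from:\n{document}")

def validate(extracted: str) -> str:
    # Deterministic code step: no LLM involved.
    if not extracted.strip():
        raise ValueError("extraction produced no output")
    return extracted

def summarize(validated: str) -> str:
    return call_llm(f"Summarize:\n{validated}")

def format_output(summary: str) -> str:
    return call_llm(f"Format as a bullet list:\n{summary}")

def run_chain(document: str) -> str:
    # Fixed sequence: exactly three LLM calls, every time.
    return format_output(summarize(validate(extract(document))))
```

Note the fixed call count: three LLM calls per document, every run, which is exactly what makes cost and latency predictable.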
3. Router (Conditional Workflow)
LLM classifies the input, then deterministic code routes to the appropriate handler.
┌─────────────────────────────────────────────────────────────────────────┐
│ ROUTER │
│ │
│ ┌─────────────┐ │
│ │ Classify │ │
│ │ Intent │ │
│ │ (LLM) │ │
│ └─────────────┘ │
│ │ │
│ ┌──────────────┼──────────────┐ │
│ ▼ ▼ ▼ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ │
│ │ Billing │ │ Technical│ │ Sales │ │
│ │ Handler │ │ Handler │ │ Handler │ │
│ └──────────┘ └──────────┘ └──────────┘ │
│ │
│ Use when: │
│ • Different input types need different handling │
│ • You can enumerate the categories │
│ • Each path is relatively simple │
│ │
│ Examples: │
│ • Customer support routing │
│ • Multi-domain Q&A │
│ • Intent-based chatbots │
│ │
└─────────────────────────────────────────────────────────────────────────┘
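A router reduces to one classification plus a dict dispatch. The sketch below stubs `classify_intent` with keyword matching so it runs standalone; in a real system that function would be a single LLM call constrained to return one of the enumerated labels:

```python
def classify_intent(query: str) -> str:
    """Stub classifier: in production, one LLM call with a fixed label set."""
    q = query.lower()
    if "invoice" in q or "charge" in q:
        return "billing"
    if "error" in q or "crash" in q:
        return "technical"
    return "sales"

def handle_billing(query: str) -> str:
    return f"[billing] {query}"

def handle_technical(query: str) -> str:
    return f"[technical] {query}"

def handle_sales(query: str) -> str:
    return f"[sales] {query}"

HANDLERS = {
    "billing": handle_billing,
    "technical": handle_technical,
    "sales": handle_sales,
}

def route(query: str) -> str:
    # The LLM picks the label; your code owns the control flow.
    return HANDLERS[classify_intent(query)](query)
```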
4. State Machine (Complex Workflow)
LLM can trigger transitions between states, but the states and valid transitions are predefined.
┌─────────────────────────────────────────────────────────────────────────┐
│ STATE MACHINE │
│ │
│ ┌─────────────────┐ │
│ │ START │ │
│ └────────┬────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────┐ │
│ ┌──────────│ GATHERING │◄─────────┐ │
│ │ │ INFORMATION │ │ │
│ │ └────────┬────────┘ │ │
│ │ │ │ │
│ │ need more info │ have enough │ clarification │
│ │ ▼ │ │
│ │ ┌─────────────────┐ │ │
│ └─────────►│ PROCESSING │──────────┘ │
│ └────────┬────────┘ │
│ │ │
│ ┌─────────────┼─────────────┐ │
│ ▼ │ ▼ │
│ ┌─────────────┐ │ ┌─────────────┐ │
│ │ SUCCESS │ │ │ FAILURE │ │
│ └─────────────┘ │ └─────────────┘ │
│ ▼ │
│ ┌─────────────────┐ │
│ │ HUMAN_ESCALATION│ │
│ └─────────────────┘ │
│ │
│ Use when: │
│ • Process has well-defined stages │
│ • Transitions depend on LLM judgment │
│ • You need auditability of state changes │
│ │
│ Examples: │
│ • Order processing bots │
│ • Onboarding flows │
│ • Approval workflows │
│ │
└─────────────────────────────────────────────────────────────────────────┘
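The key property of this pattern is that your code owns the transition table. A minimal sketch, with state names taken from the diagram above; the LLM's proposed transition arrives as a plain string and is checked before it takes effect:

```python
# Legal transitions are fixed in code. The LLM only proposes which
# of the allowed moves to take; anything else is rejected.
VALID_TRANSITIONS = {
    "START": {"GATHERING"},
    "GATHERING": {"PROCESSING"},
    "PROCESSING": {"GATHERING", "SUCCESS", "FAILURE"},
    "FAILURE": {"HUMAN_ESCALATION"},
}

def step(state: str, proposed: str) -> str:
    """Apply an LLM-proposed transition, enforcing the machine's rules."""
    if proposed not in VALID_TRANSITIONS.get(state, set()):
        raise ValueError(f"illegal transition {state} -> {proposed}")
    return proposed
```

Every accepted transition can be logged, which is where the auditability comes from.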
5. Agent (Full Autonomy)
LLM decides what to do, executes tools, observes results, and repeats until done.
┌─────────────────────────────────────────────────────────────────────────┐
│ AGENT │
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ AGENT LOOP │ │
│ │ │ │
│ │ ┌──────────────┐ │ │
│ │ │ OBSERVE │◄───────────────┐ │ │
│ │ │ (get context)│ │ │ │
│ │ └──────┬───────┘ │ │ │
│ │ │ │ │ │
│ │ ▼ │ │ │
│ │ ┌──────────────┐ │ │ │
│ │ │ THINK │ │ │ │
│ │ │ (reason) │ │ │ │
│ │ └──────┬───────┘ │ │ │
│ │ │ │ │ │
│ │ ▼ │ │ │
│ │ ┌──────────────┐ ┌──────┴───────┐ │ │
│ │ │ ACT │────────►│ UPDATE │ │ │
│ │ │ (use tools) │ │ (memory) │ │ │
│ │ └──────┬───────┘ └──────────────┘ │ │
│ │ │ │ │
│ │ ▼ │ │
│ │ ┌──────────────┐ │ │
│ │ │ DONE? │────► Yes ────► Output │ │
│ │ └──────────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ Available Tools: │
│ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ │
│ │ Search │ │ Read │ │ Write │ │Execute │ │ API │ │
│ └────────┘ └────────┘ └────────┘ └────────┘ └────────┘ │
│ │
│ Use when: │
│ • You don't know the steps in advance │
│ • Task requires exploration and adaptation │
│ • Multiple tools may be needed in varying order │
│ │
│ Examples: │
│ • Research tasks │
│ • Debugging and coding │
│ • Open-ended problem solving │
│ │
└─────────────────────────────────────────────────────────────────────────┘
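Stripped to its skeleton, the loop above is about a dozen lines. `fake_model` stands in for the LLM's decide step and `TOOLS` holds whatever callables you expose; both are illustrative stubs:

```python
def fake_model(history):
    """Stub for the LLM's decide step: one search, then finish."""
    if not any(h.startswith("tool:") for h in history):
        return {"tool": "search", "input": "key factors"}
    return {"final": "synthesized answer"}

TOOLS = {"search": lambda query: f"results for {query}"}

def run_agent(task: str, max_iterations: int = 10) -> str:
    history = [f"task: {task}"]
    for _ in range(max_iterations):
        decision = fake_model(history)
        if "final" in decision:            # the LLM says it's done
            return decision["final"]
        tool = TOOLS[decision["tool"]]     # the LLM picked the tool
        history.append(f"tool: {tool(decision['input'])}")
    return "budget exhausted: returning partial progress"
```

Note the `max_iterations` bound and the graceful fallback when it is hit; both matter in production.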
The Decision Framework
Here's a practical framework for choosing the right pattern:
The Core Question
Ask yourself:
"Do I know the steps needed to complete this task?"
┌─────────────────────────────────────────────────────────────────────────┐
│ DECISION TREE │
│ │
│ Do you know the steps? │
│ │ │
│ ┌────────────┴────────────┐ │
│ ▼ ▼ │
│ YES NO │
│ │ │ │
│ ▼ ▼ │
│ ┌─────────────────┐ ┌─────────────────┐ │
│ │ WORKFLOW │ │ AGENT │ │
│ └────────┬────────┘ └─────────────────┘ │
│ │ │
│ ▼ │
│ Are there multiple paths? │
│ │ │
│ ┌────────┴────────┐ │
│ ▼ ▼ │
│ YES NO │
│ │ │ │
│ ▼ ▼ │
│ ROUTER CHAIN │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Detailed Decision Matrix
| Factor | Prefer Workflow | Prefer Agent |
|---|---|---|
| Steps known? | Yes, I can enumerate them | No, depends on input/context |
| Predictability needed? | High (compliance, SLAs) | Low (best effort OK) |
| Cost sensitivity | High (pay per token matters) | Low (quality over cost) |
| Latency requirements | Strict (< 2 seconds) | Flexible (30+ seconds OK) |
| Failure tolerance | Low (must succeed) | High (can retry/escalate) |
| Debugging needs | High (need to trace issues) | Low (output matters most) |
| Task complexity | Decomposable into steps | Requires exploration |
| Domain | Narrow and well-defined | Broad or open-ended |
The Workflow Suitability Score
Score your use case (1-5 for each factor, higher = more suitable for workflows):
┌─────────────────────────────────────────────────────────────────────────┐
│ WORKFLOW SUITABILITY SCORECARD │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ Factor Score (1-5) │
│ ───────────────────────────────────────────────────────── │
│ │
│ Predictability of steps [ ] │
│ (5 = always same steps, 1 = varies wildly) │
│ │
│ Latency sensitivity [ ] │
│ (5 = must be fast, 1 = can wait minutes) │
│ │
│ Cost sensitivity [ ] │
│ (5 = every token counts, 1 = cost no object) │
│ │
│ Compliance/audit requirements [ ] │
│ (5 = heavily regulated, 1 = internal tool) │
│ │
│ Debuggability needs [ ] │
│ (5 = must trace every decision, 1 = just works) │
│ │
│ ───────────────────────────────────────────────────────── │
│ │
│ TOTAL: _____ / 25 │
│ │
│ 20-25: Strong workflow candidate │
│ 12-19: Consider hybrid (workflow with agent escape hatches) │
│ 5-11: Agent may be appropriate │
│ │
└─────────────────────────────────────────────────────────────────────────┘
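If you want the scorecard as code, it's a sum plus thresholds. The factor names below are shorthand for the five rows above:

```python
FACTORS = ("predictability", "latency", "cost", "compliance", "debuggability")

def workflow_suitability(scores: dict) -> str:
    """Map five 1-5 factor scores to a recommendation (thresholds from the scorecard)."""
    assert len(scores) == 5 and all(1 <= s <= 5 for s in scores.values())
    total = sum(scores.values())
    if total >= 20:
        return "workflow"
    if total >= 12:
        return "hybrid (workflow with agent escape hatches)"
    return "agent"
```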
When to Use Workflows
Workflows shine in scenarios where predictability matters more than flexibility.
Ideal Workflow Use Cases
┌─────────────────────────────────────────────────────────────────────────┐
│ WORKFLOW SWEET SPOTS │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ DOCUMENT PROCESSING │ │
│ │ │ │
│ │ Extract → Validate → Classify → Transform → Store │ │
│ │ │ │
│ │ Why workflow: │ │
│ │ • Same steps every time │ │
│ │ • Must process thousands of documents │ │
│ │ • Errors need to be traceable │ │
│ │ • Cost per document matters │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ CUSTOMER SUPPORT ROUTING │ │
│ │ │ │
│ │ Classify Intent → Route to Handler → Generate Response │ │
│ │ │ │
│ │ Why workflow: │ │
│ │ • Finite number of intents (billing, technical, sales) │ │
│ │ • Each handler is specialized and tested │ │
│ │ • SLAs require predictable response times │ │
│ │ • Need to track which handler resolved which issues │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ CONTENT GENERATION PIPELINE │ │
│ │ │ │
│ │ Research → Outline → Draft → Review → Edit → Publish │ │
│ │ │ │
│ │ Why workflow: │ │
│ │ • Quality improves with dedicated steps │ │
│ │ • Human review can be inserted at known points │ │
│ │ • Each step can use specialized prompts/models │ │
│ │ • Progress is measurable │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ DATA TRANSFORMATION │ │
│ │ │ │
│ │ Parse → Clean → Enrich → Validate → Output │ │
│ │ │ │
│ │ Why workflow: │ │
│ │ • Transformations are deterministic │ │
│ │ • Schema validation catches errors early │ │
│ │ • Batch processing at scale │ │
│ │ • Reproducible results │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Workflow Benefits
┌─────────────────────────────────────────────────────────────────────────┐
│ WHY WORKFLOWS WIN │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ PREDICTABLE COSTS │
│ ──────────────── │
│ • Fixed number of LLM calls per execution │
│ • Can calculate exact cost per request │
│ • No runaway token consumption │
│ │
│ Agent: 3-50 LLM calls (unpredictable) │
│ Workflow: 4 LLM calls (always) │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ CONSISTENT LATENCY │
│ ───────────────── │
│ • Steps run in known order │
│ • Can parallelize independent steps │
│ • Meet SLAs reliably │
│ │
│ Agent: 2-60 seconds (variable) │
│ Workflow: 3-4 seconds (consistent) │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ DEBUGGABILITY │
│ ──────────── │
│ • Each step has clear input/output │
│ • Failures localized to specific step │
│ • Can replay individual steps │
│ • Audit trail is straightforward │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ TESTABILITY │
│ ────────── │
│ • Unit test each step independently │
│ • Mock LLM responses for deterministic tests │
│ • Integration test the pipeline │
│ • Regression testing is meaningful │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ OPTIMIZABILITY │
│ ───────────── │
│ • Profile each step independently │
│ • Use smaller/faster models for simple steps │
│ • Cache intermediate results │
│ • Parallelize where possible │
│ │
└─────────────────────────────────────────────────────────────────────────┘
When to Use Agents
Agents excel when the path to a solution isn't known in advance.
Ideal Agent Use Cases
┌─────────────────────────────────────────────────────────────────────────┐
│ AGENT SWEET SPOTS │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ RESEARCH TASKS │ │
│ │ │ │
│ │ "What are the key factors affecting renewable energy adoption │ │
│ │ in Southeast Asia?" │ │
│ │ │ │
│ │ Why agent: │ │
│ │ • Don't know what sources will be relevant │ │
│ │ • May need to follow unexpected leads │ │
│ │ • Depth of research depends on what's found │ │
│ │ • Quality requires iterative refinement │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ CODING & DEBUGGING │ │
│ │ │ │
│ │ "Fix the authentication bug in our login flow" │ │
│ │ │ │
│ │ Why agent: │ │
│ │ • Need to explore codebase to find relevant files │ │
│ │ • Debugging requires hypothesis-test cycles │ │
│ │ • Solution may require changes in multiple files │ │
│ │ • Must verify fix actually works │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ COMPLEX CUSTOMER SUPPORT │ │
│ │ │ │
│ │ "My order was charged twice and I need a refund but the │ │
│ │ original payment method expired" │ │
│ │ │ │
│ │ Why agent: │ │
│ │ • Multiple systems to query (orders, payments, customer) │ │
│ │ • Resolution path depends on what's found │ │
│ │ • May need clarification from customer │ │
│ │ • Edge cases require judgment │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ DATA ANALYSIS & EXPLORATION │ │
│ │ │ │
│ │ "Analyze our sales data and find interesting patterns" │ │
│ │ │ │
│ │ Why agent: │ │
│ │ • "Interesting" is subjective and discovered │ │
│ │ • May need to try multiple analysis approaches │ │
│ │ • Follow-up analysis depends on initial findings │ │
│ │ • Visualization choices depend on data shape │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
The Agent Advantage: Handling the Unknown
┌─────────────────────────────────────────────────────────────────────────┐
│ AGENT FLEXIBILITY │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ Workflow for "fix bug": Agent for "fix bug": │
│ │
│ ┌──────────────────────┐ ┌──────────────────────┐ │
│ │ 1. Read error log │ │ 1. Read error log │ │
│ │ 2. Find file │ │ 2. Search codebase │ │
│ │ 3. Apply fix pattern │ │ (found 3 related │ │
│ │ 4. Test │ │ files) │ │
│ │ 5. Done │ │ 3. Read main file │ │
│ └──────────────────────┘ │ 4. Hypothesis: auth │ │
│ │ token expired │ │
│ What if bug is in │ 5. Check token logic │ │
│ unexpected location? │ 6. Found real issue: │ │
│ What if error log │ race condition │ │
│ is misleading? │ 7. Read related file │ │
│ What if fix requires │ 8. Design fix │ │
│ architectural change? │ 9. Apply fix │ │
│ │ 10. Test - fails │ │
│ ❌ Workflow fails │ 11. Investigate more │ │
│ │ 12. Fix edge case │ │
│ │ 13. Test - passes │ │
│ │ 14. Done │ │
│ └──────────────────────┘ │
│ │
│ ✅ Agent succeeds │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Agent Costs to Consider
Agents aren't free. Here's what you're trading for flexibility:
┌─────────────────────────────────────────────────────────────────────────┐
│ AGENT TRADE-OFFS │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ TOKEN COST │
│ ────────── │
│ • Each loop iteration includes full context │
│ • Tool results add to context each step │
│ • 10 iterations × 4K context = 40K+ tokens per request │
│ │
│ Typical cost comparison: │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ Pattern │ Tokens/Request │ Cost (GPT-4) │ Cost (Claude) │ │
│   ├───────────────┼────────────────┼──────────────┼─────────────────┤   │
│ │ Single call │ 2,000 │ $0.06 │ $0.04 │ │
│ │ 3-step chain │ 6,000 │ $0.18 │ $0.12 │ │
│ │ Agent (avg) │ 35,000 │ $1.05 │ $0.70 │ │
│ │ Agent (max) │ 100,000 │ $3.00 │ $2.00 │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ LATENCY │
│ ─────── │
│ • Each LLM call adds 1-5 seconds │
│ • Tool execution adds variable time │
│ • Cannot parallelize dependent steps │
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ Pattern │ Min Latency │ Typical │ Max Latency │ │
│   ├───────────────┼─────────────┼─────────┼──────────────────────────┤   │
│ │ Single call │ 0.5s │ 1.5s │ 3s │ │
│ │ 3-step chain │ 1.5s │ 4.5s │ 10s │ │
│ │ Agent │ 3s │ 20s │ 120s+ │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ PREDICTABILITY │
│ ────────────── │
│ • Same input may produce different outputs │
│ • Execution path varies by run │
│ • Harder to guarantee SLAs │
│ • Testing requires statistical approaches │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ DEBUGGING │
│ ───────── │
│ • Long traces to analyze │
│ • Non-deterministic reproduction │
│ • "Why did it do that?" often unclear │
│ • Requires sophisticated observability │
│ │
└─────────────────────────────────────────────────────────────────────────┘
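The token-cost row deserves emphasis, because agent context cost compounds: every iteration re-sends the full, growing history. A rough model with illustrative numbers:

```python
def agent_tokens(base_context: int = 2000, tool_result: int = 1500,
                 iterations: int = 10) -> int:
    """Total input tokens sent across an agent run (simplified model)."""
    total = 0
    context = base_context
    for _ in range(iterations):
        total += context          # full context re-sent on every call
        context += tool_result    # history grows with each tool result
    return total
```

With a 2K base context and 1.5K of tool output per step, ten iterations already costs 87,500 input tokens, which is why the agent rows dwarf the chain rows in the table above.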
Hybrid Patterns: The Best of Both Worlds
In practice, the best systems combine workflows and agents strategically.
Pattern 1: Workflow with Agent Escape Hatch
Start with a workflow, but allow escalation to an agent for edge cases.
┌─────────────────────────────────────────────────────────────────────────┐
│ WORKFLOW WITH AGENT ESCAPE HATCH │
│ │
│ ┌─────────────────┐ │
│ │ User Input │ │
│ └────────┬────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ WORKFLOW LAYER │ │
│ │ │ │
│ │ ┌─────────┐ ┌─────────┐ ┌─────────┐ │ │
│ │ │Classify │ ──→ │ Route │ ──→ │ Handle │ │ │
│ │ └─────────┘ └────┬────┘ └────┬────┘ │ │
│ │ │ │ │ │
│ │ ▼ ▼ │ │
│ │ Can handle? Confident? │ │
│ │ │ │ │ │
│ │ ┌────────┴────┐ ┌─────┴─────┐ │ │
│ │ ▼ ▼ ▼ ▼ │ │
│ │ Yes No Yes No │ │
│ │ │ │ │ │ │ │
│ │ ▼ │ ▼ │ │ │
│ │ ┌────────┐ │ Output │ │ │
│ │ │Response│ │ │ │ │
│ │ └────────┘ │ │ │ │
│ │ │ │ │ │
│ └────────────────────────────┼───────────────┼───────────────────────┘ │
│ │ │ │
│ └───────┬───────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ AGENT LAYER │ │
│ │ │ │
│ │ ┌─────────────────────────────┐ │ │
│ │ │ Full Agent Loop │ │ │
│ │ │ (tools, reasoning, │ │ │
│ │ │ multi-step resolution) │ │ │
│ │ └─────────────────────────────┘ │ │
│ │ │ │
│ └────────────────────────────────────────────────────────────────────┘ │
│ │
│ Benefits: │
│ • 90% of requests handled by fast, cheap workflow │
│ • 10% of complex cases get full agent treatment │
│ • Clear escalation path │
│ • Cost optimization without sacrificing capability │
│ │
└─────────────────────────────────────────────────────────────────────────┘
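The escape hatch itself is just a confidence check between the two layers. Everything below is stubbed for illustration (`workflow_handle`'s scope rule and the 0.8 threshold are invented values, not recommendations):

```python
CONFIDENCE_THRESHOLD = 0.8  # tune against real traffic

def workflow_handle(query: str):
    """Stub workflow: returns (answer, confidence); None means out of scope."""
    if "refund" in query.lower():
        return None, 0.0  # pretend refunds are outside the workflow's scope
    return f"workflow answer to: {query}", 0.95

def agent_handle(query: str) -> str:
    """Stub for the full agent loop."""
    return f"agent answer to: {query}"

def handle(query: str) -> str:
    answer, confidence = workflow_handle(query)
    if answer is not None and confidence >= CONFIDENCE_THRESHOLD:
        return answer
    return agent_handle(query)  # escalate the hard minority of cases
```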
Pattern 2: Agent as Workflow Orchestrator
Use an agent to coordinate multiple specialized workflows.
┌─────────────────────────────────────────────────────────────────────────┐
│ AGENT AS WORKFLOW ORCHESTRATOR │
│ │
│ ┌─────────────────┐ │
│ │ Complex Task │ │
│ │ │ │
│ │ "Analyze Q3 │ │
│ │ performance │ │
│ │ and recommend │ │
│ │ improvements" │ │
│ └────────┬────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────────────────────┐ │
│ │ ORCHESTRATOR AGENT │ │
│ │ │ │
│ │ The agent decides WHICH workflows to run and in WHAT ORDER │ │
│ │ │ │
│ │ Current plan: │ │
│ │ 1. Run financial analysis workflow ✓ │ │
│ │ 2. Run competitor analysis workflow ✓ │ │
│ │ 3. Run recommendation workflow (in progress) │ │
│ │ │ │
│ └─────────────────────┬───────────────────────────────────────────────┘ │
│ │ │
│ ┌────────────┼────────────┐ │
│ ▼ ▼ ▼ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Financial │ │ Competitor │ │Recommendation│ │
│ │ Analysis │ │ Analysis │ │ Generation │ │
│ │ WORKFLOW │ │ WORKFLOW │ │ WORKFLOW │ │
│ │ │ │ │ │ │ │
│ │ Extract → │ │ Search → │ │ Synthesize → │ │
│ │ Calculate → │ │ Compare → │ │ Prioritize → │ │
│ │ Summarize │ │ Summarize │ │ Format │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │
│ Benefits: │
│ • Agent flexibility for high-level planning │
│ • Workflow efficiency for individual tasks │
│ • Each workflow is tested and optimized independently │
│ • Can add new workflows without changing agent │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Pattern 3: Workflow with Embedded Agent Steps
A workflow where specific steps are handled by agents.
┌─────────────────────────────────────────────────────────────────────────┐
│ WORKFLOW WITH EMBEDDED AGENT STEPS │
│ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Parse │ ──→ │Research │ ──→ │Validate │ ──→ │ Format │ │
│ │ Input │ │ (AGENT) │ │ Data │ │ Output │ │
│ │ │ │ │ │ │ │ │ │
│ │ [Code] │ │ [Agent] │ │ [Code] │ │ [LLM] │ │
│ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────┐ │
│ │ Agent Loop │ │
│ │ │ │
│ │ • Search web │ │
│ │ • Read sources │ │
│ │ • Verify facts │ │
│ │ • Compile notes │ │
│ │ │ │
│ │ (bounded: max │ │
│ │ 5 iterations) │ │
│ └─────────────────┘ │
│ │
│ Benefits: │
│ • Known workflow structure │
│ • Agent used only where needed (research step) │
│ • Bounded agent (max iterations) │
│ • Rest of pipeline is deterministic │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Pattern 4: Tiered Complexity
Route requests to increasingly capable (and expensive) handlers.
┌─────────────────────────────────────────────────────────────────────────┐
│ TIERED COMPLEXITY │
│ │
│ ┌─────────────────┐ │
│ │ User Query │ │
│ └────────┬────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────┐ │
│ │ Classify │ │
│ │ Complexity │ │
│ └────────┬────────┘ │
│ │ │
│ ┌────────────────┼────────────────┐ │
│ ▼ ▼ ▼ │
│ │
│ ┌────────────────┐ ┌────────────────┐ ┌────────────────┐ │
│ │ TIER 1 │ │ TIER 2 │ │ TIER 3 │ │
│ │ Simple │ │ Standard │ │ Complex │ │
│ │ │ │ │ │ │ │
│ │ Single LLM │ │ 3-step │ │ Full Agent │ │
│ │ call │ │ workflow │ │ with tools │ │
│ │ │ │ │ │ │ │
│ │ ~$0.01 │ │ ~$0.05 │ │ ~$0.50 │ │
│ │ ~1 second │ │ ~4 seconds │ │ ~30 seconds │ │
│ │ │ │ │ │ │ │
│ │ "What's your │ │ "Help me │ │ "Debug this │ │
│ │ return │ │ write an │ │ production │ │
│ │ policy?" │ │ email to..." │ │ issue..." │ │
│ └────────────────┘ └────────────────┘ └────────────────┘ │
│ │ │ │ │
│ └────────────────────┼────────────────────┘ │
│ ▼ │
│ ┌─────────────────┐ │
│ │ Response │ │
│ └─────────────────┘ │
│ │
│ Traffic distribution (typical): │
│ • Tier 1: 70% of requests (cheap and fast) │
│ • Tier 2: 25% of requests (moderate cost) │
│ • Tier 3: 5% of requests (expensive but necessary) │
│ │
│ Overall cost: Much lower than treating everything as Tier 3 │
│ │
└─────────────────────────────────────────────────────────────────────────┘
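The blended-cost claim is worth checking against the numbers in the diagram:

```python
# Traffic mix and per-request cost from the tier diagram above.
tiers = {
    "tier1": {"share": 0.70, "cost": 0.01},
    "tier2": {"share": 0.25, "cost": 0.05},
    "tier3": {"share": 0.05, "cost": 0.50},
}

# Expected cost per request = sum of (share x cost) across tiers.
blended = sum(t["share"] * t["cost"] for t in tiers.values())
```

That works out to about $0.045 per request, versus $0.50 if every query went to the Tier 3 agent: roughly an 11x saving at this traffic mix.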
Common Anti-Patterns
Anti-Pattern 1: Over-Agentification
Using an agent when a simple workflow would suffice.
┌─────────────────────────────────────────────────────────────────────────┐
│ OVER-AGENTIFICATION │
│ │
│ ❌ BAD: Agent for email classification │
│ │
│ Agent loop: │
│ 1. Read email │
│ 2. Think about category │
│ 3. Maybe search for similar emails? │
│ 4. Think more │
│ 5. Decide category │
│ 6. Double-check decision │
│ 7. Done │
│ │
│ Cost: ~15,000 tokens, ~8 seconds │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ ✅ GOOD: Single LLM call with structured output │
│ │
│ Prompt: "Classify this email into: billing, support, sales, spam" │
│ Output: { "category": "support", "confidence": 0.94 } │
│ │
│ Cost: ~500 tokens, ~1 second │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ Signs you're over-agentifying: │
│ • Agent always takes the same steps │
│ • Agent rarely uses more than 1-2 tools │
│ • Task has a clear, predictable structure │
│ • You could write down the steps in advance │
│ │
└─────────────────────────────────────────────────────────────────────────┘
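The "good" version really is one call. Here's a sketch with the LLM stubbed to return the JSON from the example above; in production you'd use your provider's structured-output or tool-calling mode rather than parsing free-form JSON:

```python
import json

CATEGORIES = ("billing", "support", "sales", "spam")

def fake_llm_json(prompt: str) -> str:
    """Stub: a real call would send `prompt` to a model in JSON mode."""
    return json.dumps({"category": "support", "confidence": 0.94})

def classify_email(body: str) -> dict:
    prompt = (
        f"Classify this email into one of {CATEGORIES}. "
        f'Reply as JSON: {{"category": ..., "confidence": ...}}\n\n{body}'
    )
    result = json.loads(fake_llm_json(prompt))
    # Validate before trusting: reject labels outside the fixed set.
    assert result["category"] in CATEGORIES
    return result
```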
Anti-Pattern 2: Under-Tooled Workflows
Workflows that should use tools but rely entirely on LLM knowledge.
┌─────────────────────────────────────────────────────────────────────────┐
│ UNDER-TOOLED WORKFLOWS │
│ │
│ ❌ BAD: Generate report from LLM knowledge only │
│ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Ask LLM │ ──→ │ Ask LLM │ ──→ │ Ask LLM │ │
│ │ stats │ │ trends │ │ format │ │
│ └─────────┘ └─────────┘ └─────────┘ │
│ │
│ Problem: LLM knowledge is stale, may hallucinate │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ ✅ GOOD: Workflow with tool calls for fresh data │
│ │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Query │ ──→ │ Query │ ──→ │ LLM │ ──→ │ Format │ │
│ │ DB │ │ API │ │ Analyze │ │ Output │ │
│ └─────────┘ └─────────┘ └─────────┘ └─────────┘ │
│ [Tool] [Tool] [LLM] [Code] │
│ │
│ Solution: Ground LLM in real data via tools │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ Signs you're under-tooling: │
│ • Workflow produces outdated information │
│ • Hallucinations in factual claims │
│ • No connection to real data sources │
│ • Could be improved with retrieval or APIs │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Anti-Pattern 3: Unbounded Agents
Agents without proper limits that can spiral out of control.
┌─────────────────────────────────────────────────────────────────────────┐
│ UNBOUNDED AGENTS │
│ │
│ ❌ BAD: Agent with no limits │
│ │
│ • No max iterations → Can loop forever │
│ • No token budget → Can consume unlimited tokens │
│ • No timeout → Can run for hours │
│ • No tool restrictions → Can call expensive APIs repeatedly │
│ │
│ Real incident: Agent spent $200 in API calls trying to │
│ "thoroughly research" a simple question │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ ✅ GOOD: Agent with proper guardrails │
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ BOUNDED AGENT │ │
│ │ │ │
│ │ Limits: │ │
│ │ ├── max_iterations: 15 │ │
│ │ ├── max_tokens: 50,000 │ │
│ │ ├── timeout: 120 seconds │ │
│ │ ├── max_tool_calls: 20 │ │
│ │ └── budget: $1.00 per request │ │
│ │ │ │
│ │ On limit reached: │ │
│ │ ├── Gracefully summarize progress │ │
│ │ ├── Return partial results │ │
│ │ └── Log for monitoring │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ Always bound: iterations, tokens, time, cost │
│ │
└─────────────────────────────────────────────────────────────────────────┘
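A budget object makes these limits concrete: the agent loop checks it once per iteration. The values mirror the box above and are illustrative, not recommendations:

```python
import time
from dataclasses import dataclass, field

@dataclass
class Budget:
    """Hard limits for one agent run; check `exhausted()` every iteration."""
    max_iterations: int = 15
    max_tokens: int = 50_000
    timeout_s: float = 120.0
    max_cost_usd: float = 1.00
    iterations: int = 0
    tokens: int = 0
    cost_usd: float = 0.0
    started: float = field(default_factory=time.monotonic)

    def charge(self, tokens: int, cost_usd: float) -> None:
        """Record one iteration's consumption."""
        self.iterations += 1
        self.tokens += tokens
        self.cost_usd += cost_usd

    def exhausted(self) -> bool:
        return (
            self.iterations >= self.max_iterations
            or self.tokens >= self.max_tokens
            or self.cost_usd >= self.max_cost_usd
            or time.monotonic() - self.started >= self.timeout_s
        )
```

When `exhausted()` returns true, summarize progress and return partial results instead of raising, as the box above suggests.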
Anti-Pattern 4: Premature Agent Optimization
Optimizing agent performance before validating the use case.
┌─────────────────────────────────────────────────────────────────────────┐
│ PREMATURE AGENT OPTIMIZATION │
│ │
│ ❌ BAD: Building sophisticated agent infrastructure before │
│ proving you need agents at all │
│ │
│ Week 1: "Let's build a multi-agent system!" │
│ Week 4: Complex agent framework with 10 agent types │
│ Week 8: Realize 90% of use cases are simple classifications │
│ Week 12: Rewrite everything as workflows │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ ✅ GOOD: Iterative complexity │
│ │
│ 1. Start with simplest solution (single LLM call) │
│ │ │
│ ▼ │
│ 2. Identify failure cases │
│ │ │
│ ▼ │
│ 3. Add complexity only where needed │
│ │ │
│ ├──→ Most cases: Stay simple │
│ │ │
│ └──→ Edge cases: Add workflow steps or agent │
│ │
│ Principle: Earn complexity through proven need │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Real-World Architecture Examples
Example 1: Customer Support System
┌─────────────────────────────────────────────────────────────────────────┐
│ CUSTOMER SUPPORT ARCHITECTURE │
│ │
│ ┌─────────────────┐ │
│ │ Customer Query │ │
│ └────────┬────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐│
│ │ ROUTER (Workflow) ││
│ │ ││
│ │ Intent Classification → Complexity Assessment → Route ││
│ │ ││
│ └───────────────────────────┬─────────────────────────────────────────┘│
│ │ │
│ ┌────────────────────┼────────────────────┐ │
│ ▼ ▼ ▼ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ FAQ │ │ SIMPLE │ │ COMPLEX │ │
│ │ (Workflow) │ │ (Workflow) │ │ (Agent) │ │
│ │ │ │ │ │ │ │
│ │ RAG lookup │ │ Template + │ │ Full agent │ │
│ │ + format │ │ DB lookup │ │ with tools │ │
│ │ │ │ │ │ │ │
│ │ "What's │ │ "Check my │ │ "My order │ │
│ │ your │ │ order │ │ shipped to │ │
│ │ return │ │ status" │ │ wrong │ │
│ │ policy?" │ │ │ │ address │ │
│ │ │ │ │ │ and I need │ │
│ │ 60% of │ │ 30% of │ │ it │ │
│ │ queries │ │ queries │ │ redirected"│ │
│ │ │ │ │ │ │ │
│ │ ~1 second │ │ ~3 seconds │ │ 10% of │ │
│ │ ~$0.01 │ │ ~$0.05 │ │ queries │ │
│ └─────────────┘ └─────────────┘ │ │ │
│ │ ~30 seconds │ │
│ │ ~$0.50 │ │
│ └─────────────┘ │
│ │
│  Blended cost: ~$0.07 average (vs $0.50 if all were agents)             │
│ │
└─────────────────────────────────────────────────────────────────────────┘
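The router above can be a cheap classification step followed by a plain dispatch table. A sketch, with `classify_intent` standing in for a small-model LLM call and the three handlers stubbed (the keyword rules and handler names are illustrative only):

```python
def classify_intent(query: str) -> str:
    """Stand-in for an LLM classification call (a small, cheap model)."""
    q = query.lower()
    if "policy" in q or "how do i" in q:
        return "faq"
    if "status" in q or "check my" in q:
        return "simple"
    return "complex"

def handle_faq(query: str) -> str:       # RAG lookup + format   (~$0.01, ~1s)
    return f"[faq] answer for: {query}"

def handle_simple(query: str) -> str:    # template + DB lookup  (~$0.05, ~3s)
    return f"[simple] answer for: {query}"

def handle_complex(query: str) -> str:   # full agent with tools (~$0.50, ~30s)
    return f"[agent] escalated: {query}"

HANDLERS = {"faq": handle_faq, "simple": handle_simple, "complex": handle_complex}

def support_router(query: str) -> str:
    # Your code owns the routing decision; the expensive agent path
    # only runs for the minority of queries that need it.
    return HANDLERS[classify_intent(query)](query)
```

Because the router itself is a workflow, the 90% of cheap queries never touch the agent's cost and latency profile.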
Example 2: Document Processing Pipeline
┌─────────────────────────────────────────────────────────────────────────┐
│ DOCUMENT PROCESSING PIPELINE │
│ │
│ This is almost entirely workflows—agents are overkill here. │
│ │
│ ┌─────────────────┐ │
│ │ Document │ │
│ │ Upload │ │
│ └────────┬────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐│
│ │ INGESTION WORKFLOW ││
│ │ ││
│ │ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ││
│ │ │ Parse │ → │ Clean │ → │ Chunk │ → │ Embed │ → │ Store │ ││
│ │ │ [Code] │ │ [Code] │ │ [Code] │ │ [API] │ │ [DB] │ ││
│ │ └────────┘ └────────┘ └────────┘ └────────┘ └────────┘ ││
│ │ ││
│ └─────────────────────────────────────────────────────────────────────┘│
│ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐│
│ │ EXTRACTION WORKFLOW ││
│ │ ││
│ │ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ││
│ │ │Identify│ → │Extract │ → │Validate│ → │ Store │ ││
│ │ │ Fields │ │ Values │ │ Schema │ │ Data │ ││
│ │ │ [LLM] │ │ [LLM] │ │ [Code] │ │ [DB] │ ││
│ │ └────────┘ └────────┘ └────────┘ └────────┘ ││
│ │ ││
│ └─────────────────────────────────────────────────────────────────────┘│
│ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐│
│ │ QUERY WORKFLOW ││
│ │ ││
│ │ ┌────────┐ ┌────────┐ ┌────────┐ ┌────────┐ ││
│ │ │Rewrite │ → │Retrieve│ → │ Rank │ → │Generate│ ││
│ │ │ Query │ │ Docs │ │Results │ │ Answer │ ││
│ │ │ [LLM] │ │ [DB] │ │ [LLM] │ │ [LLM] │ ││
│ │ └────────┘ └────────┘ └────────┘ └────────┘ ││
│ │ ││
│ └─────────────────────────────────────────────────────────────────────┘│
│ │
│ Note: All workflows, no agents. Predictable, testable, fast. │
│ │
└─────────────────────────────────────────────────────────────────────────┘
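The ingestion workflow is plain code executing in a fixed order. A sketch with the parser, embedder, and store stubbed out (real systems would call a PDF parser, an embeddings API, and a vector DB here):

```python
def parse(doc: bytes) -> str:
    """Stand-in for a real parser (PDF, HTML, ...)."""
    return doc.decode("utf-8")

def clean(text: str) -> str:
    return " ".join(text.split())  # collapse whitespace

def chunk(text: str, size: int = 50) -> list:
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(chunks: list) -> list:
    """Stand-in for an embedding API call; one fake vector per chunk."""
    return [[float(len(c))] for c in chunks]

def store(chunks: list, vectors: list, db: list) -> list:
    for c, v in zip(chunks, vectors):
        db.append({"text": c, "vector": v})
    return db

def ingest(doc: bytes, db: list) -> list:
    # Your code controls the sequence; no LLM decides what happens next.
    text = clean(parse(doc))
    chunks = chunk(text)
    return store(chunks, embed(chunks), db)

db = ingest(b"a" * 120, [])
```

Every step is independently testable, which is exactly why this pipeline doesn't need an agent.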
Example 3: Coding Assistant
┌─────────────────────────────────────────────────────────────────────────┐
│ CODING ASSISTANT ARCHITECTURE │
│ │
│ This legitimately needs agents—coding is exploratory. │
│ │
│ ┌─────────────────┐ │
│ │ User Request │ │
│ │ │ │
│ │ "Fix the bug │ │
│ │ in user auth" │ │
│ └────────┬────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐│
│ │ TRIAGE (Workflow) ││
│ │ ││
│ │ Classify request → Check permissions → Route ││
│ │ ││
│ │ Simple explanation? → Direct LLM response ││
│ │ Code generation? → Coding agent ││
│ │ Bug fix? → Debug agent ││
│ │ ││
│ └───────────────────────────┬─────────────────────────────────────────┘│
│ │ │
│ ▼ │
│ ┌─────────────────────────────────────────────────────────────────────┐│
│ │ DEBUG AGENT ││
│ │ ││
│ │ ┌─────────────────────────────────────────────────────────────┐ ││
│ │ │ AGENT LOOP │ ││
│ │ │ │ ││
│ │ │ Tools available: │ ││
│ │ │ • read_file • search_code • run_tests │ ││
│ │ │ • edit_file • search_docs • run_linter │ ││
│ │ │ • list_files • git_history • execute_code │ ││
│ │ │ │ ││
│ │ │ Agent trace: │ ││
│ │ │ 1. search_code("auth", "login") → 3 files found │ ││
│ │ │ 2. read_file("auth/login.py") → found suspicious code │ ││
│ │ │ 3. read_file("auth/tokens.py") → found related bug │ ││
│ │ │ 4. git_history("auth/") → recent change introduced bug │ ││
│ │ │ 5. edit_file("auth/login.py", fix) → applied fix │ ││
│ │ │ 6. run_tests("auth/") → 2 tests fail │ ││
│ │ │ 7. read_file("tests/test_auth.py") → understand failures │ ││
│ │ │ 8. edit_file("auth/login.py", better_fix) → refined fix │ ││
│ │ │ 9. run_tests("auth/") → all pass │ ││
│ │ │ 10. Done │ ││
│ │ │ │ ││
│ │ └─────────────────────────────────────────────────────────────┘ ││
│ │ ││
│ │ This REQUIRES an agent—steps depend on what's found. ││
│ │ ││
│ └─────────────────────────────────────────────────────────────────────┘│
│ │
└─────────────────────────────────────────────────────────────────────────┘
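The agent loop above boils down to: let the model pick the next tool, execute it, feed the observation back, and repeat until done or a bound is hit. A sketch with the model's decision stubbed by a scripted function (the tool names and `llm_decide` contract are hypothetical):

```python
def run_agent(task, tools, llm_decide, max_iterations=15):
    """Minimal agent loop. `llm_decide(task, history)` stands in for the
    model and returns ("done", answer) or (tool_name, args)."""
    history = []
    for _ in range(max_iterations):
        action, payload = llm_decide(task, history)
        if action == "done":
            return {"status": "complete", "answer": payload, "trace": history}
        observation = tools[action](*payload)       # execute the chosen tool
        history.append((action, payload, observation))
    return {"status": "partial", "trace": history}  # hit the iteration bound

# Demo: a scripted "model" that mimics a short debug trace.
tools = {
    "search_code": lambda *terms: ["auth/login.py"],
    "read_file":   lambda path: f"<contents of {path}>",
}
script = iter([
    ("search_code", ("auth", "login")),
    ("read_file", ("auth/login.py",)),
    ("done", "bug found in auth/login.py"),
])
result = run_agent("fix auth bug", tools, lambda task, hist: next(script))
```

The point of the structure: the tool sequence lives in `llm_decide`'s outputs, not in your code, which is precisely what makes this an agent rather than a workflow.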
Cost Economics: The Numbers That Matter
Understanding the real costs helps you make informed decisions. Here's what each pattern actually costs at scale.
Cost Per Request Comparison
┌─────────────────────────────────────────────────────────────────────────┐
│ COST PER REQUEST (GPT-4o Pricing) │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ Pattern Tokens/Req Cost/Req 1M Requests/Month │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ Single LLM Call 1,500 $0.008 $8,000 │
│ (classification) │
│ │
│ Single LLM Call 3,000 $0.015 $15,000 │
│ (generation) │
│ │
│ 3-Step Chain 8,000 $0.040 $40,000 │
│ (sequential) │
│ │
│ Router + Handler 5,000 $0.025 $25,000 │
│ (conditional) │
│ │
│ State Machine 12,000 $0.060 $60,000 │
│ (5 transitions avg) │
│ │
│ Agent (simple) 25,000 $0.125 $125,000 │
│ (5 iterations avg) │
│ │
│ Agent (complex) 60,000 $0.300 $300,000 │
│ (12 iterations avg) │
│ │
│ Multi-Agent 120,000 $0.600 $600,000 │
│ (3 agents coordinating) │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ Note: Using GPT-4o at $2.50/1M input, $10/1M output (Dec 2024) │
│ Claude Sonnet is similar. GPT-4o-mini is ~20x cheaper. │
│ │
└─────────────────────────────────────────────────────────────────────────┘
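The table's figures can be reproduced with a back-of-envelope function. A sketch assuming GPT-4o's Dec 2024 prices and roughly one output token per two input tokens, which yields the ~$5/1M blended rate the table implies (the split is an assumption, not a measurement):

```python
def monthly_cost(tokens_per_request, requests=1_000_000,
                 input_price=2.50, output_price=10.00, output_fraction=1 / 3):
    """Back-of-envelope monthly spend. Prices are $ per 1M tokens (GPT-4o,
    Dec 2024); output_fraction is an ASSUMED share of output tokens."""
    blended = input_price * (1 - output_fraction) + output_price * output_fraction
    return tokens_per_request / 1_000_000 * blended * requests

# e.g. a simple agent averaging 25k tokens/request at 1M requests/month:
agent_monthly = monthly_cost(25_000)
```

Swapping in GPT-4o-mini's prices shows why model tiering (below) matters so much at volume.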
The Hidden Costs
┌─────────────────────────────────────────────────────────────────────────┐
│ HIDDEN COST FACTORS │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ CONTEXT ACCUMULATION (Agents) │
│ ───────────────────────────── │
│ Each iteration includes ALL previous context: │
│ │
│ Iteration 1: System + Task = 2,000 tokens │
│ Iteration 2: + Response + Tool Result = 4,500 tokens │
│ Iteration 3: + Response + Tool Result = 7,000 tokens │
│ Iteration 4: + Response + Tool Result = 9,500 tokens │
│ ... │
│ Iteration 10: + All accumulated = 25,000 tokens │
│ │
│    Total tokens for 10 iterations: ~130,000 (not 25,000!)               │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ RETRY AND ERROR COSTS │
│ ───────────────────── │
│ │
│ Workflows: Failed step retries only that step │
│ Agents: Failed iteration still consumed full context tokens │
│ │
│ If 10% of agent runs need 2 extra iterations due to errors: │
│ Additional cost = 10% × 2 iterations × accumulated context │
│ = ~15-20% cost increase │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ TOOL EXECUTION COSTS │
│ ──────────────────── │
│ │
│ • External API calls (search, databases) add per-call costs │
│ • Embedding calls for RAG add ~$0.0001 per query │
│ • Compute for code execution, image processing │
│ • Agents typically make 3-5x more tool calls than workflows │
│ │
└─────────────────────────────────────────────────────────────────────────┘
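The accumulation above compounds because every iteration re-bills the whole history: total cost is the sum of the per-iteration context sizes, not the final size. Using the figures in the box (2,000-token base, ~2,500 tokens added per turn):

```python
BASE = 2_000     # system prompt + task
GROWTH = 2_500   # response + tool result appended each iteration

# Context size sent on each of 10 iterations (iteration 1 .. 10):
context_at = [BASE + GROWTH * i for i in range(10)]
total_billed = sum(context_at)

# Final context is ~24.5k tokens, but the run as a whole bills ~132k:
# the agent pays for most of its history many times over.
```

This quadratic-ish growth is why context pruning (below) pays off disproportionately on long agent runs.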
Cost Optimization Strategies
┌─────────────────────────────────────────────────────────────────────────┐
│ COST OPTIMIZATION TACTICS │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ 1. MODEL TIERING │
│ Use cheap models for simple steps, expensive for complex │
│ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Classify │ ──→ │ Process │ ──→ │ Generate │ │
│ │ GPT-4o-mini│ │ GPT-4o-mini│ │ GPT-4o │ │
│ │ $0.001 │ │ $0.002 │ │ $0.010 │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │
│ vs. GPT-4o for all steps: $0.013 vs $0.040 (3x cheaper) │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ 2. CACHING │
│ Cache identical/similar requests │
│ │
│ • Exact match cache: 100% token savings on hits │
│ • Semantic cache: Reuse similar query results │
│ • Tool result cache: Don't re-search same queries │
│ │
│ Typical cache hit rates: │
│ • Customer support: 30-50% │
│ • Search queries: 20-40% │
│ • Code assistance: 10-20% │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ 3. CONTEXT PRUNING │
│ Summarize or drop old context in agents │
│ │
│ Instead of keeping all tool results: │
│ • Summarize after every 3 iterations │
│ • Keep only last N tool results in full │
│ • Extract key facts, discard raw output │
│ │
│ Savings: 40-60% on long agent runs │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ 4. EARLY TERMINATION │
│ Detect when agent is done or stuck │
│ │
│ • Confidence scoring after each step │
│ • Loop detection (same action twice = stuck) │
│ • "Good enough" thresholds │
│ │
│ Average iterations: 12 → 7 with early termination │
│ Savings: ~40% on agent costs │
│ │
└─────────────────────────────────────────────────────────────────────────┘
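The exact-match cache (tactic 2 above) is just a dictionary keyed on a hash of everything that affects the output. A sketch, with `call_llm` as a stand-in for the real API call:

```python
import hashlib

_cache = {}
llm_calls = 0  # counter to show the second identical request skips the API

def call_llm(prompt: str, model: str) -> str:
    """Stand-in for the real (expensive) LLM API call."""
    global llm_calls
    llm_calls += 1
    return f"response to: {prompt}"

def cached_llm(prompt: str, model: str = "gpt-4o") -> str:
    # Key on everything that affects the output: model + full prompt.
    key = hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()
    if key not in _cache:          # miss: pay for the call once
        _cache[key] = call_llm(prompt, model)
    return _cache[key]             # hit: 100% token savings

first = cached_llm("What's your return policy?")
second = cached_llm("What's your return policy?")
```

At the 30-50% hit rates typical of customer support, this one dictionary can cut LLM spend by a third.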
Latency Analysis: What Users Actually Experience
Cost matters for your budget. Latency matters for your users.
Latency Breakdown by Pattern
┌─────────────────────────────────────────────────────────────────────────┐
│ LATENCY BY PATTERN │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ Pattern p50 p95 p99 Max │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ Single LLM Call 800ms 1.5s 3s 10s │
│ │
│ 3-Step Chain 2.4s 4.5s 8s 20s │
│ (sequential) │
│ │
│ 3-Step Chain 1.2s 2.5s 5s 12s │
│ (parallel where possible) │
│ │
│ Router + Handler 1.5s 3s 6s 15s │
│ │
│ Agent (5 iter) 8s 15s 30s 60s │
│ │
│ Agent (12 iter) 20s 40s 90s 180s │
│ │
│ Multi-Agent 30s 60s 120s 300s │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ Note: Assumes GPT-4o. Claude similar. Local models 2-5x faster. │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Where Time Goes
┌─────────────────────────────────────────────────────────────────────────┐
│ LATENCY BREAKDOWN │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ SINGLE LLM CALL (800ms typical) │
│ ─────────────────────────────── │
│ │
│ ████████████████████████████████████████░░░░░░░░░░░░░░░░░░░░ │
│ │ LLM Generation (650ms) │ Network (150ms) │ │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ 3-STEP WORKFLOW (2.4s typical) │
│ ────────────────────────────── │
│ │
│ ████████░░██████████░░████████████████░░░░ │
│ │ Step 1 ││ Step 2 ││ Step 3 │Network│ │
│ │ 600ms ││ 700ms ││ 900ms │ 200ms │ │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ AGENT - 5 ITERATIONS (8s typical) │
│ ───────────────────────────────── │
│ │
│ ████░████░████░████░████████░░░░░░░░░░░░░░░░░░░░░░░░ │
│ │Iter│Iter│Iter│Iter│ Iter │ Tool Execution │ │
│ │ 1 │ 2 │ 3 │ 4 │ 5 │ (2.5s) │ │
│ │0.8s│1.0s│1.2s│1.3s│ 1.5s │ │ │
│ │
│ Note: Each iteration is slower because context grows │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ BREAKDOWN OF AGENT ITERATION TIME: │
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ Component │ Time │ % of Total │ │
│ ├─────────────────────────┼────────────┼──────────────────────────│ │
│ │ LLM thinking │ 800ms │ 50% │ │
│ │ Tool execution │ 500ms │ 31% │ │
│ │ Network overhead │ 200ms │ 13% │ │
│ │ Parsing/processing │ 100ms │ 6% │ │
│ └─────────────────────────┴────────────┴──────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Latency Optimization Strategies
┌─────────────────────────────────────────────────────────────────────────┐
│ LATENCY OPTIMIZATION │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ 1. STREAMING │
│ Show progress as it happens │
│ │
│ Without streaming: User waits 8s, sees complete response │
│ With streaming: User sees first token at 200ms, │
│ content streams over 8s │
│ │
│ Perceived latency: 8000ms → 200ms (40x improvement) │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ 2. PARALLEL TOOL CALLS │
│ │
│ Sequential: Tool A (500ms) → Tool B (500ms) → Tool C (500ms) │
│ Total: 1500ms │
│ │
│ Parallel: Tool A ─┐ │
│ Tool B ─┼── All complete in 500ms │
│ Tool C ─┘ │
│ Total: 500ms (3x faster) │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ 3. SPECULATIVE EXECUTION │
│ │
│ While waiting for user confirmation: │
│ • Pre-compute likely next steps │
│ • Pre-fetch probable tool results │
│ • Warm up embeddings/retrievals │
│ │
│ If prediction correct: Near-instant response │
│ If prediction wrong: No worse than baseline │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ 4. MODEL SELECTION FOR SPEED │
│ │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Model │ Latency (p50) │ Quality Trade-off │ │
│ ├─────────────────────┼─────────────────┼─────────────────────│ │
│ │ GPT-4o │ 800ms │ Best quality │ │
│ │ GPT-4o-mini │ 400ms │ Good for routing │ │
│ │ Claude 3.5 Haiku │ 300ms │ Fast + capable │ │
│ │ Groq (Llama 70B) │ 150ms │ Very fast │ │
│ │ Local (Llama 8B) │ 100ms │ Simple tasks only │ │
│ └─────────────────────┴─────────────────┴─────────────────────┘ │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ 5. GRACEFUL DEGRADATION │
│ │
│ If latency exceeds threshold: │
│ • Return partial results with "still working..." │
│ • Simplify the approach (fewer iterations) │
│ • Fall back to cached similar response │
│ • Offer to notify when complete │
│ │
└─────────────────────────────────────────────────────────────────────────┘
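Parallel tool calls (item 2 above) in code: independent calls run under `asyncio.gather`, so wall-clock time is the slowest call rather than the sum. The tools here are stubbed with sleeps standing in for network latency:

```python
import asyncio
import time

async def tool(name: str, delay: float) -> str:
    await asyncio.sleep(delay)   # stands in for network / tool latency
    return f"{name}: done"

async def run_parallel():
    # Sequentially these would take 0.3s; gathered, ~0.1s.
    return await asyncio.gather(
        tool("search", 0.1),
        tool("db_lookup", 0.1),
        tool("docs", 0.1),
    )

start = time.perf_counter()
results = asyncio.run(run_parallel())
elapsed = time.perf_counter() - start
```

The caveat: this only works for tool calls with no data dependency between them; a call that needs another call's output stays sequential.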
Monitoring: How to Know If You Chose Wrong
You've deployed your system. How do you know if you made the right architecture choice?
Key Metrics to Track
┌─────────────────────────────────────────────────────────────────────────┐
│ MONITORING DASHBOARD │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ EFFICIENCY METRICS │
│ ────────────────── │
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ Metric │ Healthy │ Investigate │ │
│ ├────────────────────────────┼─────────────┼──────────────────────│ │
│ │ Avg iterations (agent) │ 3-7 │ >10 consistently │ │
│ │ Max iterations hit rate │ <5% │ >15% │ │
│ │ Empty tool calls │ <2% │ >10% │ │
│ │ Repeated actions │ <5% │ >15% │ │
│ │ Cost per success │ Stable │ Trending up 20%+ │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ QUALITY METRICS │
│ ─────────────── │
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ Metric │ Healthy │ Investigate │ │
│ ├────────────────────────────┼─────────────┼──────────────────────│ │
│ │ Task success rate │ >85% │ <70% │ │
│ │ User satisfaction │ >4.0/5 │ <3.5/5 │ │
│ │ Escalation rate │ <10% │ >25% │ │
│ │ Retry rate │ <15% │ >30% │ │
│ │ Hallucination rate │ <5% │ >15% │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ OPERATIONAL METRICS │
│ ─────────────────── │
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ Metric │ Healthy │ Investigate │ │
│ ├────────────────────────────┼─────────────┼──────────────────────│ │
│ │ p95 latency │ <SLA │ >SLA │ │
│ │ Error rate │ <2% │ >5% │ │
│ │ Timeout rate │ <1% │ >5% │ │
│ │ Cost variance │ ±20% │ >50% spikes │ │
│ │ Tool failure rate │ <5% │ >15% │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Signs You Chose Wrong
┌─────────────────────────────────────────────────────────────────────────┐
│ ARCHITECTURE MISMATCH SIGNALS │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ YOU BUILT A WORKFLOW BUT SHOULD HAVE BUILT AN AGENT IF: │
│ ───────────────────────────────────────────────────────── │
│ │
│ • High rate of "I don't have enough information" responses │
│ • Users frequently need to rephrase/retry queries │
│ • Edge cases keep requiring new workflow branches │
│ • Success rate varies wildly by input type │
│ • You're adding if/else branches weekly │
│ │
│ Pattern in logs: │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ Query: "Find pricing for competitor X and compare to us" │ │
│ │ Result: FAILED - No handler for comparative analysis │ │
│ │ │ │
│ │ Query: "Why did the deployment fail yesterday?" │ │
│ │ Result: FAILED - Requires multi-step investigation │ │
│ └────────────────────────────────────────────────────────────────┘ │
│ │
│ → Consider: Adding agent escape hatch for complex queries │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ YOU BUILT AN AGENT BUT SHOULD HAVE BUILT A WORKFLOW IF: │
│ ───────────────────────────────────────────────────────── │
│ │
│ • 90%+ of runs follow the same tool sequence │
│ • Agent rarely uses more than 2-3 tools │
│ • Iteration count is consistently low (2-3) │
│ • Most "thinking" is repetitive/unnecessary │
│ • Cost per query is 10x higher than it needs to be │
│ │
│ Pattern in logs: │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ Run 1: search → format → respond (3 iterations) │ │
│ │ Run 2: search → format → respond (3 iterations) │ │
│ │ Run 3: search → format → respond (3 iterations) │ │
│ │ Run 4: search → format → respond (3 iterations) │ │
│ │ ... │ │
│ │ (Same pattern in 94% of runs) │ │
│ └────────────────────────────────────────────────────────────────┘ │
│ │
│ → Consider: Converting to 3-step workflow (3x cheaper, 2x faster) │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ YOU'RE OVER-ENGINEERING IF: │
│ ────────────────────────── │
│ │
│ • Built multi-agent but tasks don't require handoffs │
│ • Most "routing" goes to one handler (>80%) │
│ • Complexity adds latency without improving success rate │
│ • Team spends more time debugging orchestration than improving core │
│ │
│ → Consider: Simplifying to single agent or workflow │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Continuous Optimization Loop
┌─────────────────────────────────────────────────────────────────────────┐
│ OPTIMIZATION FEEDBACK LOOP │
│ │
│ ┌─────────────┐ │
│ │ Deploy │ │
│ └──────┬──────┘ │
│ │ │
│ ▼ │
│ ┌─────────────┐ │
│ ┌───────────────│ Monitor │───────────────┐ │
│ │ └──────┬──────┘ │ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │Cost spikes? │ │Quality drop?│ │Latency SLA? │ │
│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Analyze │ │ Analyze │ │ Analyze │ │
│ │ token usage │ │ failure │ │ bottlenecks │ │
│ │ patterns │ │ patterns │ │ │ │
│ └──────┬──────┘ └──────┬──────┘ └──────┬──────┘ │
│ │ │ │ │
│ └──────────────────────┼──────────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────┐ │
│ │ Decision │ │
│ └──────┬──────┘ │
│ │ │
│ ┌──────────────────────┼──────────────────────┐ │
│ ▼ ▼ ▼ │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Simplify │ │ Enhance │ │ Optimize │ │
│ │ (workflow) │ │ (agent) │ │ (hybrid) │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │
│ Weekly review cadence recommended for active systems │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Migration Strategies
Sometimes you need to change approaches. Here's how to migrate safely.
Workflow → Agent Migration
When your workflow can't handle the complexity anymore:
┌─────────────────────────────────────────────────────────────────────────┐
│ WORKFLOW → AGENT MIGRATION │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ PHASE 1: IDENTIFY CANDIDATES │
│ ──────────────────────────── │
│ │
│ Look for: │
│ • Steps that frequently fail or need retries │
│ • Branches that keep multiplying │
│ • User complaints about "it doesn't understand me" │
│ • High variance in what the step needs to do │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ PHASE 2: HYBRID FIRST │
│ ───────────────────── │
│ │
│ Don't replace the whole workflow. Add agent escape hatches: │
│ │
│ BEFORE: │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Step 1 │ ──→ │ Step 2 │ ──→ │ Step 3 │ │
│ └─────────┘ └─────────┘ └─────────┘ │
│ │ │
│ ▼ (fails 20% of time) │
│ ERROR │
│ │
│ AFTER: │
│ ┌─────────┐ ┌─────────┐ ┌─────────┐ │
│ │ Step 1 │ ──→ │ Step 2 │ ──→ │ Step 3 │ │
│ └─────────┘ └────┬────┘ └─────────┘ │
│ │ │
│ ▼ (if confidence < 0.8) │
│ ┌─────────┐ │
│ │ Agent │ (handles edge cases) │
│ │ Fallback│ │
│ └─────────┘ │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ PHASE 3: GRADUAL PROMOTION │
│ ────────────────────────── │
│ │
│ Week 1: 5% traffic to agent path (shadow mode) │
│ Week 2: Compare metrics, fix issues │
│ Week 3: 20% traffic to agent path │
│ Week 4: 50% traffic │
│ Week 5: 100% traffic to agent │
│ │
│ Rollback triggers: │
│ • Success rate drops >10% │
│ • p95 latency exceeds 2x baseline │
│ • Cost exceeds 3x baseline │
│ │
└─────────────────────────────────────────────────────────────────────────┘
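Phase 2's escape hatch is a confidence check after the fragile step. A sketch in which the step, its confidence score, and the agent fallback are all hypothetical stubs:

```python
CONFIDENCE_THRESHOLD = 0.8

def step_2(data: str):
    """Stand-in workflow step: returns (result, confidence)."""
    if "edge case" in data:
        return None, 0.3          # low confidence on unusual inputs
    return f"handled: {data}", 0.95

def agent_fallback(data: str) -> str:
    """Stand-in for the slower, costlier agent path."""
    return f"agent handled: {data}"

def run_step_2(data: str) -> str:
    result, confidence = step_2(data)
    if confidence < CONFIDENCE_THRESHOLD:
        return agent_fallback(data)   # escape hatch for the hard ~20%
    return result
```

The workflow keeps its cost and latency profile for the common case; only low-confidence inputs pay the agent tax.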
Agent → Workflow Migration
When you've over-engineered and need to simplify:
┌─────────────────────────────────────────────────────────────────────────┐
│ AGENT → WORKFLOW MIGRATION │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ PHASE 1: ANALYZE TRACES │
│ ─────────────────────── │
│ │
│ Collect 1000+ agent traces and analyze: │
│ │
│ • What tool sequences are most common? │
│ • What % of runs follow predictable patterns? │
│ • Where does the agent actually need to "think"? │
│ │
│ Example analysis: │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ Pattern │ Frequency │ │
│ ├───────────────────────────────────────┼────────────────────────│ │
│ │ search → read → respond │ 45% │ │
│ │ search → search → read → respond │ 25% │ │
│ │ read → respond │ 15% │ │
│ │ (complex/varied) │ 15% │ │
│ └────────────────────────────────────────────────────────────────┘ │
│ │
│ 85% follows 3 patterns → Strong workflow candidate │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ PHASE 2: BUILD PARALLEL WORKFLOW │
│ ──────────────────────────────── │
│ │
│ Create workflow that handles the common patterns: │
│ │
│ ┌─────────────────────────────────────────────────────────────────┐ │
│ │ │ │
│ │ ┌──────────────┐ │ │
│ │ │ Classify │ │ │
│ │ │ Query │ │ │
│ │ └───────┬──────┘ │ │
│ │ │ │ │
│ │ ┌─────────────────┼─────────────────┐ │ │
│ │ ▼ ▼ ▼ │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │
│ │ │ Pattern │ │ Pattern │ │ Send to │ │ │
│ │ │ A │ │ B │ │ Agent │ │ │
│ │ │ (45%) │ │ (40%) │ │ (15%) │ │ │
│ │ └──────────┘ └──────────┘ └──────────┘ │ │
│ │ │ │
│ └─────────────────────────────────────────────────────────────────┘ │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ PHASE 3: SHADOW TESTING │
│ ─────────────────────── │
│ │
│ Run both in parallel, compare outputs: │
│ │
│ Request ──┬──→ Agent (current) ───────→ Return to user │
│ │ │
│ └──→ Workflow (shadow) ──→ Log & compare │
│ │
│ Measure: │
│ • Output equivalence rate │
│ • Cost difference │
│ • Latency difference │
│ • Cases where workflow fails but agent succeeds │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ PHASE 4: GRADUAL CUTOVER │
│ ──────────────────────── │
│ │
│ Same gradual rollout as before (5% → 20% → 50% → 100%) │
│ Keep agent as fallback for the 15% complex cases │
│ │
└─────────────────────────────────────────────────────────────────────────┘
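Phase 1's trace analysis is a frequency count over tool-call sequences. A sketch using a hypothetical sample shaped like the table above; real traces would come from your logs:

```python
from collections import Counter

# Hypothetical sample of 100 agent traces (each a list of tool calls):
traces = (
    [["search", "read", "respond"]] * 45
    + [["search", "search", "read", "respond"]] * 25
    + [["read", "respond"]] * 15
    + [["search", "edit", "test", "edit", "respond"]] * 15
)

patterns = Counter(tuple(t) for t in traces)
top3 = patterns.most_common(3)
top3_share = sum(count for _, count in top3) / len(traces)

# If a few patterns dominate (here 85%), they are workflow candidates;
# route only the long tail to the agent.
```

The same analysis run weekly also tells you when a new pattern has become common enough to promote into the workflow.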
Framework Selection Guide
Different frameworks have different strengths for workflows vs agents.
┌─────────────────────────────────────────────────────────────────────────┐
│ FRAMEWORK COMPARISON │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌───────────────────────────────────────────────────────────────────┐ │
│ │ Framework │ Workflows │ Agents │ Best For │ │
│ ├─────────────────┼───────────┼────────┼────────────────────────────│ │
│ │ │ │ │ │ │
│ │ LangChain │ ★★★ │ ★★☆ │ Quick prototypes, many │ │
│ │ (LCEL) │ │ │ integrations │ │
│ │ │ │ │ │ │
│ │ LangGraph │ ★★★ │ ★★★ │ Complex state machines, │ │
│ │ │ │ │ cycles, human-in-loop │ │
│ │ │ │ │ │ │
│ │ LlamaIndex │ ★★☆ │ ★★☆ │ RAG-heavy applications, │ │
│ │ │ │ │ document processing │ │
│ │ │ │ │ │ │
│ │ CrewAI │ ★☆☆ │ ★★★ │ Multi-agent systems, │ │
│ │ │ │ │ role-based agents │ │
│ │ │ │ │ │ │
│ │ AutoGen │ ★☆☆ │ ★★★ │ Conversational agents, │ │
│ │ │ │ │ agent collaboration │ │
│ │ │ │ │ │ │
│ │ Semantic │ ★★★ │ ★★☆ │ Enterprise, .NET/C#, │ │
│ │ Kernel │ │ │ Microsoft ecosystem │ │
│ │ │ │ │ │ │
│ │ Haystack │ ★★★ │ ★☆☆ │ Production RAG, │ │
│ │ │ │ │ pipeline-first │ │
│ │ │ │ │ │ │
│ │ Custom │ ★★★ │ ★★★ │ Full control, minimal │ │
│ │ (no framework) │ │ │ dependencies │ │
│ │ │ │ │ │ │
│ └───────────────────────────────────────────────────────────────────┘ │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ DECISION GUIDE: │
│ │
│ "I need a simple 3-step workflow" │
│ → LangChain LCEL or custom code │
│ │
│ "I need workflows with complex branching and state" │
│ → LangGraph │
│ │
│ "I need a ReAct agent with tools" │
│ → LangGraph, CrewAI, or custom │
│ │
│ "I need multiple agents working together" │
│ → CrewAI, AutoGen, or LangGraph │
│ │
│ "I need production RAG pipelines" │
│ → LlamaIndex, Haystack │
│ │
│ "I need full control and minimal magic" │
│ → Custom implementation │
│ │
│ "I'm in a Microsoft/enterprise environment" │
│ → Semantic Kernel │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Framework-Specific Patterns
┌─────────────────────────────────────────────────────────────────────────┐
│ PATTERN IMPLEMENTATION BY FRAMEWORK │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ SIMPLE CHAIN │
│ ──────────── │
│ │
│ LangChain: prompt | llm | parser │
│ LangGraph: StateGraph with linear edges │
│ Custom: for step in steps: result = step(result) │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ ROUTER │
│ ────── │
│ │
│ LangChain: RunnableBranch with conditions │
│ LangGraph: Conditional edges based on state │
│ Custom: if/elif with classification step │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ REACT AGENT │
│ ─────────── │
│ │
│ LangChain: create_react_agent() │
│ LangGraph: prebuilt.create_react_agent() or custom graph │
│ CrewAI: Agent with tools │
│ Custom: While loop with tool execution │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ HUMAN-IN-THE-LOOP │
│ ───────────────── │
│ │
│ LangChain: Callbacks (limited) │
│ LangGraph: interrupt() + Command pattern (native support) │
│ Custom: State persistence + resume logic │
│ │
│ ───────────────────────────────────────────────────────────────────── │
│ │
│ MULTI-AGENT │
│ ─────────── │
│ │
│ LangChain: Manual orchestration │
│ LangGraph: Subgraphs + supervisor pattern │
│ CrewAI: Crew with multiple Agents (native) │
│ AutoGen: GroupChat, native multi-agent │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Implementation Checklist
When designing your system, walk through this checklist:
┌─────────────────────────────────────────────────────────────────────────┐
│ IMPLEMENTATION CHECKLIST │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ STEP 1: Understand Your Task │
│ ───────────────────────────── │
│ □ Can I enumerate the steps needed? │
│ □ Do the steps change based on intermediate results? │
│ □ Is this task exploratory or procedural? │
│ □ What's the worst-case number of steps? │
│ │
│ STEP 2: Assess Your Constraints │
│ ────────────────────────────── │
│ □ What's my latency budget? │
│ □ What's my cost budget per request? │
│ □ Do I need deterministic behavior? │
│ □ What are my compliance/audit requirements? │
│ │
│ STEP 3: Start Simple │
│ ─────────────────── │
│ □ Try single LLM call first │
│ □ Identify where it fails │
│ □ Add complexity only for failure cases │
│ □ Document why each step exists │
│ │
│ STEP 4: Add Guardrails │
│ ───────────────────── │
│ □ Set max iterations for any loops │
│ □ Set token/cost budgets │
│ □ Add timeouts │
│ □ Plan graceful degradation │
│ │
│ STEP 5: Design for Observability │
│ ──────────────────────────────── │
│ □ Log all LLM calls with inputs/outputs │
│ □ Track tokens, latency, costs │
│ □ Trace execution path │
│ □ Alert on anomalies (high iteration counts, etc.) │
│ │
│ STEP 6: Plan for Evolution │
│ ────────────────────────── │
│ □ Make it easy to promote workflow steps to agents │
│ □ Make it easy to demote agent tasks to workflows │
│ □ Track which requests actually need agent flexibility │
│ □ Regularly review and simplify │
│ │
└─────────────────────────────────────────────────────────────────────────┘
Summary: The Right Tool for the Job
┌─────────────────────────────────────────────────────────────────────────┐
│ KEY TAKEAWAYS │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ 1. DEFAULT TO WORKFLOWS │
│ Most production AI systems should be workflows. │
│ They're cheaper, faster, and more predictable. │
│ │
│ 2. USE AGENTS FOR THE UNKNOWN │
│ Agents shine when you genuinely don't know the │
│ steps in advance. Research, debugging, exploration. │
│ │
│ 3. HYBRID IS USUALLY BEST │
│ Workflow with agent escape hatches, or agent │
│ orchestrating workflows. Get the best of both. │
│ │
│ 4. EARN COMPLEXITY │
│ Start simple. Add complexity only when you have │
│ evidence that simpler approaches fail. │
│ │
│ 5. ALWAYS BOUND AGENTS │
│ Max iterations, token budgets, timeouts, cost limits. │
│ Unbounded agents will eventually surprise you. │
│ │
│ 6. MEASURE AND ITERATE │
│ Track what percentage of requests actually need │
│ agent flexibility. Optimize the common path. │
│ │
└─────────────────────────────────────────────────────────────────────────┘
The goal isn't to use the most sophisticated architecture—it's to use the simplest architecture that solves your problem reliably.
References & Further Reading
Foundational Resources
- Building Effective Agents - Anthropic's guide that popularized the workflow vs agent distinction. Essential reading.
- What is Agentic AI? - LangChain's overview of agentic patterns and when to use them.
- ReAct: Synergizing Reasoning and Acting in Language Models - The original paper introducing the ReAct pattern.
Framework Documentation
- LangGraph Conceptual Guide - Understanding state machines and agent loops in LangGraph.
- LlamaIndex Workflows - LlamaIndex's approach to DAG-based workflows.
- CrewAI Documentation - Multi-agent orchestration patterns.
- AutoGen - Microsoft's multi-agent conversation framework.
Research & Deep Dives
- The Landscape of Emerging AI Agent Architectures - Academic survey of agent design patterns.
- Cognitive Architectures for Language Agents - CoALA framework for understanding agent designs.
- Reflexion: Language Agents with Verbal Reinforcement Learning - Self-correction patterns for agents.
- Tree of Thoughts - Deliberate problem-solving with LLMs.
Practical Guides
- OpenAI Function Calling Guide - Tool use fundamentals.
- Anthropic Tool Use Guide - Claude's approach to tool calling.
- Prompt Caching - Reducing costs in multi-turn agent interactions.
Related Articles
Building Agentic AI Systems: A Complete Implementation Guide
A comprehensive guide to building AI agents—tool use, ReAct pattern, planning, memory, context management, MCP integration, and multi-agent orchestration. With full prompt examples and production patterns.
LLM Frameworks: LangChain, LlamaIndex, LangGraph, and Beyond
A comprehensive comparison of LLM application frameworks—LangChain, LlamaIndex, LangGraph, Haystack, and alternatives. When to use each, how to combine them, and practical implementation patterns.
Structured Outputs and Tool Use: Patterns for Reliable AI Applications
Master structured output generation and tool use patterns—JSON mode, schema enforcement, Instructor library, function calling best practices, error handling, and production patterns for reliable AI applications.
Building Customer Support Agents: A Production Architecture Guide
A comprehensive guide to building multi-agent customer support systems—triage routing, specialized agents, context handoffs, guardrails, and production patterns with full implementation examples.
Agent Evaluation and Testing: From Development to Production
A comprehensive guide to evaluating AI agents—task success metrics, trajectory analysis, tool use correctness, sandboxing, and building robust testing pipelines for production agent systems.
AI Agent Economics: Unit Costs, ROI Frameworks, and Cost Optimization
A comprehensive framework for calculating AI agent costs, understanding reasoning token economics, optimizing spend with model cascading, and building ROI models for agentic systems.
Human-in-the-Loop UX: Designing Control Surfaces for AI Agents
Design patterns for human oversight of AI agents—pause mechanisms, approval workflows, progressive autonomy, and the UX of agency. How to build systems where humans stay in control.
Agentic RAG: When Retrieval Meets Autonomous Reasoning
How to build RAG systems that don't just retrieve—they reason, plan, and iteratively refine their searches to solve complex information needs.
LLM Memory Systems: From MemGPT to Long-Term Agent Memory
Understanding memory architectures for LLM agents—MemGPT's hierarchical memory, Letta's agent framework, and patterns for building agents that learn and remember across conversations.
The Rise of Agentic AI: Understanding MCP and A2A Protocols
An exploration of the emerging protocols enabling AI agents to communicate and collaborate, including Model Context Protocol (MCP) and Agent-to-Agent (A2A) communication.