
LLM Frameworks: LangChain, LlamaIndex, LangGraph, and Beyond

A comprehensive comparison of LLM application frameworks—LangChain, LlamaIndex, LangGraph, Haystack, and alternatives. When to use each, how to combine them, and practical implementation patterns.


The Framework Landscape in 2025

Building LLM applications from scratch is complex. Frameworks abstract common patterns—RAG, agents, chains—letting you focus on your application logic rather than reinventing infrastructure.

This guide provides a comprehensive comparison of the major frameworks, when to use each, and how to combine them effectively for production systems.

Framework Overview

Framework       | Primary Focus            | Best For                                | Company
LangChain       | General LLM applications | Complex agents, multi-step workflows    | LangChain, Inc
LlamaIndex      | Data & RAG               | Document Q&A, knowledge bases           | LlamaIndex, Inc
LangGraph       | Multi-agent systems      | Stateful workflows, agent orchestration | LangChain, Inc
Haystack        | Search & NLP             | Production search, NLP pipelines        | deepset
Semantic Kernel | Enterprise AI            | Azure integration, .NET support         | Microsoft
DSPy            | Prompt optimization      | Research, automated tuning              | Stanford NLP
CrewAI          | Multi-agent teams        | Role-based agents                       | CrewAI
AutoGen         | Agent conversations      | Multi-agent dialogue                    | Microsoft

LangChain

Overview

LangChain is the most comprehensive LLM application framework, providing building blocks for chains, agents, RAG, and complex workflows.

From research: "LangChain is the most flexible foundation for complex LLM apps and agentic workflows, with excellent production tooling."

Core Architecture

Code
LangChain Ecosystem
├── langchain-core     # Base abstractions
├── langchain          # Main framework
├── langchain-community # Third-party integrations
├── langchain-openai   # OpenAI-specific
├── langchain-anthropic # Anthropic-specific
└── langgraph          # Multi-agent workflows

Core Concepts

Chains (LCEL - LangChain Expression Language)

LCEL (LangChain Expression Language) is LangChain's declarative way to compose operations. The key innovation is the pipe operator (|), which chains components together: data flows from left to right through each component. Think of it like Unix pipes—prompt | model | parser means "take input, format it with the prompt, send to the model, parse the output."

LCEL provides three major benefits: (1) automatic batching and parallel execution, (2) built-in streaming support, and (3) consistent interface across all components. Every LCEL chain supports .invoke() (single input), .batch() (multiple inputs), and .stream() (streaming output) without any extra code.

Python
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser, JsonOutputParser
from langchain_core.runnables import RunnablePassthrough

# Simple chain
prompt = ChatPromptTemplate.from_template(
    "Summarize the following text in {style} style:\n\n{text}"
)
model = ChatOpenAI(model="gpt-4o")
parser = StrOutputParser()

chain = prompt | model | parser

result = chain.invoke({
    "style": "concise bullet points",
    "text": long_document
})

# Chain with parallel operations
from langchain_core.runnables import RunnableParallel

analysis_chain = RunnableParallel({
    "summary": prompt | model | StrOutputParser(),
    "key_points": key_points_prompt | model | JsonOutputParser(),
    "sentiment": sentiment_prompt | model | StrOutputParser(),
})

results = analysis_chain.invoke({"text": document})
# Returns: {"summary": "...", "key_points": [...], "sentiment": "..."}

Understanding this code:

  • ChatPromptTemplate: Creates a reusable template with variables ({style}, {text}). LangChain handles the formatting—you pass a dict, it fills in the placeholders.

  • Pipe composition: prompt | model | parser creates a chain. When you call .invoke(), data flows through: dict → formatted prompt → LLM → parsed string. Each component transforms the output of the previous one.

  • RunnableParallel: Runs multiple chains simultaneously with the same input. This is efficient—one network call per model invocation, but all three analyses happen concurrently. Results come back as a dict with keys matching your chain names.

  • Different parsers: StrOutputParser() extracts raw text, JsonOutputParser() parses JSON into a dict. Match parser to your expected output format.
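
Because every LCEL chain exposes the same Runnable interface, the chain defined above can also be batched or streamed with no extra code. A minimal sketch (doc_a and doc_b are placeholders for your own documents):

Python
# Batch: process several inputs concurrently in a single call
summaries = chain.batch([
    {"style": "concise bullet points", "text": doc_a},
    {"style": "one paragraph", "text": doc_b},
])

# Stream: tokens are yielded as the model generates them
for chunk in chain.stream({"style": "concise bullet points", "text": doc_a}):
    print(chunk, end="", flush=True)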

Structured Output

LLMs naturally output free-form text, but applications need structured data. LangChain's .with_structured_output() forces the model to return data matching a Pydantic schema. Under the hood, it uses the model's function calling or JSON mode to guarantee valid output—no parsing errors, no missing fields.

This is one of LangChain's most valuable features. Instead of hoping the model outputs valid JSON and writing error handling for when it doesn't, you define a schema and let LangChain ensure compliance.

Python
from pydantic import BaseModel, Field
from typing import List

class ExtractedInfo(BaseModel):
    """Information extracted from document."""
    title: str = Field(description="Document title")
    entities: List[str] = Field(description="Named entities mentioned")
    summary: str = Field(description="Brief summary")
    confidence: float = Field(description="Confidence score 0-1")

model = ChatOpenAI(model="gpt-4o")
structured_model = model.with_structured_output(ExtractedInfo)

result = structured_model.invoke("Extract info from: " + document)
# Returns: ExtractedInfo(title="...", entities=[...], ...)

Key points:

  • Pydantic schema: The Field(description=...) is crucial—the model reads these descriptions to understand what each field should contain. Good descriptions improve extraction accuracy.

  • with_structured_output(): Wraps the model with structured output handling. The returned structured_model can be used anywhere a regular model would be, but always returns your schema type.

  • Type safety: The result is a proper ExtractedInfo instance, not a dict. You get IDE autocomplete, type checking, and validation automatically.
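
Because structured_model is still a Runnable, it composes with prompts like any other LCEL component. A small sketch, assuming the structured_model defined above and the ChatPromptTemplate import from earlier (annual_report_text is a placeholder for your own input):

Python
extract_prompt = ChatPromptTemplate.from_template(
    "Extract the requested information from this document:\n\n{document}"
)
extraction_chain = extract_prompt | structured_model

info = extraction_chain.invoke({"document": annual_report_text})
print(info.title, info.confidence)  # typed ExtractedInfo fields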

Agents with Tools

Agents are LLMs that can take actions. Instead of just generating text, an agent can search the web, query databases, or call APIs. The key pattern is ReAct (Reasoning + Acting): the model thinks about what to do, takes an action, observes the result, and repeats until it has an answer.

LangChain agents require three components: (1) tools the agent can use, (2) a prompt that teaches the model how to use tools, and (3) an executor that manages the think-act-observe loop.

Python
from langchain.agents import create_react_agent, AgentExecutor
from langchain_community.tools import DuckDuckGoSearchRun
from langchain.tools import Tool
from langchain_core.prompts import PromptTemplate

# Define tools
search = DuckDuckGoSearchRun()

def calculate(expression: str) -> str:
    """Evaluate a mathematical expression."""
    try:
        return str(eval(expression))
    except Exception as e:
        return f"Error: {e}"

tools = [
    Tool(
        name="web_search",
        func=search.run,
        description="Search the web for current information"
    ),
    Tool(
        name="calculator",
        func=calculate,
        description="Evaluate mathematical expressions"
    ),
]

# Create agent (standard ReAct prompt: the Thought/Action/Action Input/Observation
# format is what the executor parses, so the prompt must spell it out)
prompt = PromptTemplate.from_template("""
Answer the following questions as best you can. You have access to the following tools:

{tools}

Use the following format:

Question: the input question you must answer
Thought: you should always think about what to do
Action: the action to take, should be one of [{tool_names}]
Action Input: the input to the action
Observation: the result of the action
... (this Thought/Action/Action Input/Observation can repeat N times)
Thought: I now know the final answer
Final Answer: the final answer to the original input question

Begin!

Question: {input}
Thought: {agent_scratchpad}
""")

agent = create_react_agent(
    llm=ChatOpenAI(model="gpt-4o"),
    tools=tools,
    prompt=prompt
)

executor = AgentExecutor(
    agent=agent,
    tools=tools,
    verbose=True,
    max_iterations=5,
    handle_parsing_errors=True
)

result = executor.invoke({"input": "What is 15% of the current Bitcoin price?"})

Understanding the agent architecture:

  • Tool definition: Each Tool needs a name, function, and description. The description is what the model reads to decide when to use the tool—be specific about what it does and when it's appropriate. Vague descriptions lead to wrong tool choices.

  • Prompt template: The prompt teaches the model the ReAct format. {tools} lists tool descriptions, {tool_names} provides valid names, and {agent_scratchpad} is where the model's thinking history goes. This format enables multi-step reasoning.

  • create_react_agent: Creates an agent that follows ReAct. The agent parses model output to extract tool calls, but doesn't execute them—that's the executor's job.

  • AgentExecutor: The runtime loop that:

    1. Passes the question to the agent
    2. Parses the response to find tool calls
    3. Executes the tool and gets results
    4. Feeds results back to the agent
    5. Repeats until the agent says "Final Answer"
  • max_iterations=5: Prevents infinite loops. If the agent can't solve the problem in 5 steps, it gives up. Tune based on task complexity.

  • handle_parsing_errors=True: If the model outputs malformed tool calls, the executor recovers gracefully instead of crashing.

RAG with LangChain

RAG (Retrieval-Augmented Generation) grounds LLM responses in your data. Instead of relying on the model's training knowledge, you retrieve relevant documents and include them in the prompt. This reduces hallucinations and lets you work with private or recent information.

LangChain's RAG pipeline has four stages: (1) load documents, (2) split into chunks, (3) embed and store in a vector database, (4) retrieve and generate answers. Each stage is customizable—you can swap vector stores, change chunking strategies, or modify retrieval logic.

Python
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_core.runnables import RunnablePassthrough
from langchain_core.prompts import ChatPromptTemplate

# Load and process documents
loader = PyPDFLoader("document.pdf")
documents = loader.load()

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    separators=["\n\n", "\n", " ", ""]
)
splits = text_splitter.split_documents(documents)

# Create vector store
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma.from_documents(
    documents=splits,
    embedding=embeddings,
    persist_directory="./chroma_db"
)

# Create retriever
retriever = vectorstore.as_retriever(
    search_type="similarity",
    search_kwargs={"k": 5}
)

# RAG chain with custom prompt
rag_prompt = ChatPromptTemplate.from_template("""
Answer the question based on the following context. If the context doesn't
contain enough information, say so.

Context: {context}

Question: {question}

Answer:""")

def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | rag_prompt
    | ChatOpenAI(model="gpt-4o")
    | StrOutputParser()
)

answer = rag_chain.invoke("What are the key findings?")

RAG pipeline explained:

  • Document loading: PyPDFLoader extracts text from PDFs. LangChain has loaders for 100+ formats—use the one that matches your data. Loaders return Document objects with text content and metadata.

  • Text splitting: Documents are too long to embed whole. RecursiveCharacterTextSplitter breaks them into chunks, trying to preserve semantic boundaries. chunk_overlap=200 means adjacent chunks share 200 characters—this helps with questions that span chunk boundaries.

  • Embedding: Each chunk is converted to a vector using the embedding model. text-embedding-3-small is OpenAI's efficient option. Vectors are stored in Chroma (a local vector database). persist_directory saves to disk so you don't re-embed on restart.

  • Retriever: Wraps the vector store with a search interface. search_kwargs={"k": 5} returns the 5 most relevant chunks for each query. The retriever can be swapped for hybrid search, re-ranking, or other strategies.

  • LCEL RAG chain: The modern approach. retriever | format_docs fetches and formats context, RunnablePassthrough() passes the question unchanged. Both feed into the prompt, then the model generates an answer.

The LangChain Ecosystem

LangSmith (Observability)

Python
# Set environment variables
import os
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your-api-key"
os.environ["LANGCHAIN_PROJECT"] = "my-project"

# All chain/agent executions are now traced
# View at smith.langchain.com

LangSmith features:

  • Trace visualization
  • Latency analysis
  • Token usage tracking
  • Dataset management
  • Evaluation runs
  • Prompt versioning
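
Tracing is not limited to chains—plain Python functions can be traced as well, so custom steps show up in the same trace tree. A minimal sketch using the langsmith SDK's @traceable decorator (assumes the same environment variables are set; the function is illustrative):

Python
from langsmith import traceable

@traceable(name="preprocess_query")
def preprocess_query(query: str) -> str:
    # Appears as its own span in the LangSmith trace
    return query.strip().lower()

cleaned = preprocess_query("  What is RAG?  ")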

LangServe (Deployment)

Python
from fastapi import FastAPI
from langserve import add_routes

app = FastAPI()

# Add chain as REST endpoint
add_routes(
    app,
    rag_chain,
    path="/rag",
)

# Run with: uvicorn main:app --reload
# API available at: http://localhost:8000/rag/invoke
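
Once deployed, the chain can be called from any Python client through the same Runnable interface. A sketch using langserve's RemoteRunnable (the URL assumes the local server started above):

Python
from langserve import RemoteRunnable

remote_chain = RemoteRunnable("http://localhost:8000/rag")

# Same .invoke() / .batch() / .stream() interface as a local chain
answer = remote_chain.invoke("What are the key findings?")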

LangChain Strengths

From IBM: "LangChain shines in projects that demand complex reasoning chains and multi-step AI workflows. If your goal is to create autonomous agents capable of making decisions, interacting with multiple APIs, and managing dynamic conversational flows, LangChain provides the flexibility needed."

Key strengths:

  • Flexibility: Swap components easily
  • Integrations: 700+ integrations
  • Observability: LangSmith for production
  • Community: Large ecosystem, many examples
  • Agent support: Best-in-class agent tooling

LangChain Weaknesses

  • Complexity: Learning curve can be steep
  • Abstraction overhead: Sometimes too many layers
  • Breaking changes: Rapid iteration means API changes
  • Memory usage: Can be heavy for simple use cases

Best For

  • Complex agent workflows
  • Multi-step reasoning chains
  • Applications needing many integrations
  • Teams wanting comprehensive tooling
  • Production systems with observability needs

LlamaIndex

Overview

LlamaIndex specializes in connecting LLMs to your data with optimized RAG pipelines and data ingestion.

From research: "LlamaIndex is the fastest route to high-quality, production-grade RAG on your data."

Core Architecture

Code
LlamaIndex Ecosystem
├── llama-index-core        # Core abstractions
├── llama-index-llms-*      # LLM integrations
├── llama-index-embeddings-* # Embedding models
├── llama-index-readers-*   # Data connectors
├── llama-index-vector-stores-* # Vector DBs
└── llama-cloud             # Managed services

Core Concepts

Data Connectors (Readers)

LlamaIndex excels at data ingestion. Readers connect to data sources and produce Document objects that can be indexed. The library includes readers for files, databases, APIs, cloud storage, and more. Each reader handles the complexity of its data source—parsing PDFs, handling authentication, pagination—so you focus on what to do with the data.

The pattern is consistent across all readers: create a reader with configuration, call .load_data(), get documents. This uniformity makes it easy to support multiple data sources in a single application.

Python
from llama_index.core import SimpleDirectoryReader
from llama_index.readers.web import SimpleWebPageReader
from llama_index.readers.database import DatabaseReader
from llama_index.readers.notion import NotionPageReader

# Local files (PDF, DOCX, TXT, etc.)
documents = SimpleDirectoryReader(
    input_dir="./data",
    recursive=True,
    required_exts=[".pdf", ".docx", ".txt"]
).load_data()

# Web pages
web_docs = SimpleWebPageReader(
    html_to_text=True
).load_data([
    "https://docs.example.com/guide",
    "https://docs.example.com/api"
])

# Database
db_reader = DatabaseReader(
    uri="postgresql://user:pass@localhost/db"
)
db_docs = db_reader.load_data(
    query="SELECT title, content FROM articles WHERE published = true"
)

# Notion
notion_docs = NotionPageReader(
    integration_token="secret_xxx"
).load_data(page_ids=["page-id-1", "page-id-2"])

Reader options explained:

  • SimpleDirectoryReader: The workhorse for local files. recursive=True processes subdirectories, required_exts filters by file type. It auto-detects parsers for PDF, DOCX, TXT, and more.

  • SimpleWebPageReader: Fetches and parses web pages. html_to_text=True strips HTML tags for cleaner text. For dynamic sites, consider using a headless browser reader instead.

  • DatabaseReader: Executes SQL and converts rows to documents. Each row becomes a document—customize the query to get the right granularity.

  • NotionPageReader: Connects to Notion's API. Requires an integration token from your Notion workspace. Can fetch entire databases, not just individual pages.

Indexes and Query Engines

An index is how LlamaIndex organizes your documents for efficient retrieval. The most common is VectorStoreIndex, which embeds documents and enables semantic search. Once you have an index, you create a query engine to search it and generate answers.

LlamaIndex separates retrieval (finding relevant documents) from synthesis (generating answers). This separation lets you customize each stage independently—use one retrieval strategy for speed, another for accuracy.

Python
from llama_index.core import VectorStoreIndex, Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding

# Configure global settings
Settings.llm = OpenAI(model="gpt-4o", temperature=0.1)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")

# Create index
index = VectorStoreIndex.from_documents(
    documents,
    show_progress=True
)

# Simple query
query_engine = index.as_query_engine()
response = query_engine.query("What is the main topic?")
print(response.response)
print(response.source_nodes)  # Retrieved chunks

# Advanced query engine
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.postprocessor import SimilarityPostprocessor

retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=10,
)

query_engine = RetrieverQueryEngine(
    retriever=retriever,
    node_postprocessors=[
        SimilarityPostprocessor(similarity_cutoff=0.7)
    ]
)

Index and query engine concepts:

  • Settings: Global configuration for LLM and embedding model. All indexes and query engines use these defaults unless overridden. Set temperature=0.1 for factual queries where consistency matters.

  • VectorStoreIndex: Creates an index from documents. show_progress=True displays a progress bar during embedding—useful for large document sets.

  • Simple query engine: .as_query_engine() creates a default query engine. It retrieves relevant chunks and synthesizes an answer. response.source_nodes shows which chunks were used—essential for debugging and citation.

  • Advanced query engine: For more control, build components separately. VectorIndexRetriever handles search, SimilarityPostprocessor filters low-quality matches (below 0.7 similarity). Stack multiple postprocessors for re-ranking, deduplication, or metadata filtering.

Advanced RAG Patterns

Basic RAG retrieves chunks based on semantic similarity, but this misses cases where exact keyword matches matter or where the relevant information spans multiple chunks. LlamaIndex provides advanced patterns that significantly improve retrieval quality:

  • Hybrid search combines semantic and keyword matching—catch both conceptually similar and exact term matches
  • Sentence window retrieval returns the sentences around matches for better context
  • Auto-merging retrieval uses hierarchical chunks, merging small chunks into larger ones when multiple small chunks from the same section are retrieved
Python
# Hybrid search (semantic + keyword)
from llama_index.core.retrievers import QueryFusionRetriever

retriever = QueryFusionRetriever(
    [
        index.as_retriever(similarity_top_k=5),
        # BM25 retriever for keyword matching
        bm25_retriever,
    ],
    similarity_top_k=10,
    num_queries=4,  # Generate query variations
    mode="reciprocal_rerank",
)

# Sentence window retrieval
from llama_index.core.node_parser import SentenceWindowNodeParser
from llama_index.core.postprocessor import MetadataReplacementPostProcessor

node_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)

# Auto-merging retrieval (hierarchical chunks)
from llama_index.core.node_parser import HierarchicalNodeParser
from llama_index.core.retrievers import AutoMergingRetriever

node_parser = HierarchicalNodeParser.from_defaults(
    chunk_sizes=[2048, 512, 128]
)
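
The sentence-window parser only pays off once its nodes are indexed and the stored window is swapped back in at query time. A minimal sketch of that wiring, reusing the configuration shown above (documents, VectorStoreIndex, and the postprocessor import come from the earlier examples):

Python
# Parse documents into single-sentence nodes that carry their surrounding window
window_parser = SentenceWindowNodeParser.from_defaults(
    window_size=3,
    window_metadata_key="window",
    original_text_metadata_key="original_text",
)
nodes = window_parser.get_nodes_from_documents(documents)
index = VectorStoreIndex(nodes)

# At query time, replace each retrieved sentence with its wider window
query_engine = index.as_query_engine(
    similarity_top_k=5,
    node_postprocessors=[
        MetadataReplacementPostProcessor(target_metadata_key="window")
    ],
)
response = query_engine.query("What are the key findings?")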

Agents in LlamaIndex

Python
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import QueryEngineTool, FunctionTool

# Create tools from query engines
query_tool = QueryEngineTool.from_defaults(
    query_engine=index.as_query_engine(),
    name="knowledge_base",
    description="Search the company knowledge base"
)

# Custom function tool
def send_email(to: str, subject: str, body: str) -> str:
    """Send an email to a recipient."""
    # Implementation
    return f"Email sent to {to}"

email_tool = FunctionTool.from_defaults(fn=send_email)

# Create agent
agent = ReActAgent.from_tools(
    tools=[query_tool, email_tool],
    llm=OpenAI(model="gpt-4o"),
    verbose=True,
    max_iterations=10
)

response = agent.chat("Find info about our refund policy and email it to customer@example.com")

The LlamaIndex Ecosystem

LlamaCloud (Managed Services)

Python
from llama_index.indices.managed.llama_cloud import LlamaCloudIndex

# Use managed parsing and indexing
index = LlamaCloudIndex.from_documents(
    documents,
    name="my-index",
    project_name="my-project"
)

# Query through cloud
response = index.as_query_engine().query("What are the key points?")

LlamaCloud features:

  • LlamaParse: Advanced document parsing
  • Managed vector storage
  • Automatic chunking optimization
  • API-based access

LlamaParse (Document Parsing)

Python
from llama_parse import LlamaParse

parser = LlamaParse(
    api_key="llx-xxx",
    result_type="markdown",
    num_workers=4,
    verbose=True,
    language="en"
)

# Parse complex documents (tables, images, etc.)
documents = parser.load_data("complex_report.pdf")

LlamaIndex Strengths

From research: "Since LlamaIndex was designed with RAG-heavy workflows in mind, it has a best-in-class data ingestion toolset. The framework helps engineering teams clean and structure messy data before it hits the retriever."

Key strengths:

  • RAG quality: Best-in-class retrieval patterns
  • Data connectors: 160+ data sources
  • Document parsing: LlamaParse handles complex docs
  • Learning curve: Gentler than LangChain for RAG
  • Optimized patterns: Sentence window, auto-merging built-in

LlamaIndex Weaknesses

  • Agent support: Less mature than LangChain
  • General workflows: Less flexible for non-RAG use cases
  • Ecosystem: Smaller than LangChain
  • Observability: LlamaTrace less feature-rich than LangSmith

Best For

  • RAG-first applications
  • Document Q&A systems
  • Knowledge bases
  • Complex document types (tables, images)
  • Quick prototyping to production RAG

LangGraph

Overview

LangGraph enables complex multi-agent systems with explicit state management and cyclic workflows.

From research: "LangGraph is a stateful framework for building multi-agent systems as graphs, created by the LangChain team. Engineers model workflows using nodes (tools, functions, LLMs, subgraphs) and edges (loops, conditional routes)."

Core Concepts

LangGraph models workflows as directed graphs where nodes are functions (often calling LLMs) and edges define the flow between them. The key innovation is state: a typed dictionary that flows through the graph, accumulating results from each node. Unlike simple chains, LangGraph supports cycles—a reviewer can send work back to a writer, creating iterative refinement loops.

Understanding state is crucial: every node receives the current state and returns updates. The graph merges these updates (using Annotated operators) and passes the new state to the next node.

State and Graph Definition

Let's build a research-write-review workflow that iterates until the reviewer approves or max iterations are reached:

Python
from langgraph.graph import StateGraph, END
from langchain_openai import ChatOpenAI
from typing import TypedDict, Annotated, Literal
import operator

# LLM used by every node below
llm = ChatOpenAI(model="gpt-4o")

# Define state schema
class AgentState(TypedDict):
    messages: Annotated[list, operator.add]
    current_step: str
    research_data: str
    draft: str
    feedback: str
    iteration: int

# Define node functions
def researcher(state: AgentState) -> AgentState:
    """Research node - gathers information."""
    # Call LLM to research
    research = llm.invoke(f"Research: {state['messages'][-1]}")
    return {
        "messages": [f"Research complete: {research}"],
        "research_data": research,
        "current_step": "writer"
    }

def writer(state: AgentState) -> AgentState:
    """Writer node - creates draft."""
    draft = llm.invoke(
        f"Write based on: {state['research_data']}\n"
        f"Previous feedback: {state.get('feedback', 'None')}"
    )
    return {
        "messages": [f"Draft created"],
        "draft": draft,
        "current_step": "reviewer"
    }

def reviewer(state: AgentState) -> AgentState:
    """Reviewer node - evaluates draft."""
    review = llm.invoke(f"Review this draft: {state['draft']}")
    return {
        "messages": [f"Review: {review}"],
        "feedback": review,
        "iteration": state.get("iteration", 0) + 1
    }

# Routing function
def should_continue(state: AgentState) -> Literal["writer", "end"]:
    """Decide whether to continue iterating."""
    if state["iteration"] >= 3:
        return "end"
    if "approved" in state["feedback"].lower():
        return "end"
    return "writer"

# Build graph
workflow = StateGraph(AgentState)

# Add nodes
workflow.add_node("researcher", researcher)
workflow.add_node("writer", writer)
workflow.add_node("reviewer", reviewer)

# Add edges
workflow.set_entry_point("researcher")
workflow.add_edge("researcher", "writer")
workflow.add_edge("writer", "reviewer")
workflow.add_conditional_edges(
    "reviewer",
    should_continue,
    {
        "writer": "writer",
        "end": END
    }
)

# Compile
app = workflow.compile()

# Run
result = app.invoke({
    "messages": ["Write a blog post about LLM frameworks"],
    "iteration": 0
})

Understanding the LangGraph workflow:

  • State schema: AgentState defines what data flows through the graph. Annotated[list, operator.add] means messages accumulates—new messages append to existing ones rather than replacing them. Other fields replace previous values.

  • Node functions: Each node receives the current state and returns updates. The researcher node adds research data; the writer node uses that data to create a draft. Notice nodes return partial updates—you only specify fields that change.

  • Routing function: should_continue decides where to go next based on state. It checks iteration count and feedback content. The return value must match an edge name defined in add_conditional_edges.

  • Graph construction: Build the graph in three steps:

    1. Add nodes (functions)
    2. Set entry point
    3. Add edges (connections between nodes)
  • Conditional edges: The reviewer can route to "writer" (for revision) or "end" (if approved). This creates the cycle that enables iterative refinement.

  • Compile and run: .compile() validates the graph and creates a runnable. .invoke() starts execution from the entry point, flowing through nodes until reaching END.
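
For debugging, the compiled graph can also be executed step by step. A small sketch, assuming the app compiled above—each yielded item maps a node name to the updates that node produced:

Python
for step in app.stream(
    {"messages": ["Write a blog post about LLM frameworks"], "iteration": 0},
    stream_mode="updates",
):
    for node_name, updates in step.items():
        print(f"--- {node_name} ---")
        print(updates)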

Checkpointing and Memory

Python
from langgraph.checkpoint.sqlite import SqliteSaver

# Add persistence
memory = SqliteSaver.from_conn_string(":memory:")
app = workflow.compile(checkpointer=memory)

# Run with thread_id for persistence
config = {"configurable": {"thread_id": "user-123"}}

result1 = app.invoke({"messages": ["Start task"]}, config)
# Later...
result2 = app.invoke({"messages": ["Continue"]}, config)  # Has previous state

Human-in-the-Loop

Python
from langgraph.graph import StateGraph
from langgraph.prebuilt import ToolNode, tools_condition

# Define an interrupt point
workflow.add_node("human_approval", lambda x: x)  # Pass-through

def needs_approval(state: AgentState) -> Literal["approve", "reject", "auto"]:
    """Check if human approval needed."""
    if state.get("high_risk", False):
        return "approve"  # Will interrupt
    return "auto"

workflow.add_conditional_edges(
    "action_node",
    needs_approval,
    {
        "approve": "human_approval",
        "auto": "execute",
        "reject": END
    }
)

# Compile with interrupt
app = workflow.compile(
    checkpointer=memory,
    interrupt_before=["human_approval"]
)

# Run until interrupt
result = app.invoke(initial_state, config)

# Human reviews and continues
app.invoke(None, config)  # Resume from interrupt
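
While execution is paused at the interrupt, the checkpointer lets you inspect—and, if needed, edit—the pending state before resuming. A sketch, assuming the compiled app and config above (the "feedback" key is illustrative):

Python
# Inspect the paused run
snapshot = app.get_state(config)
print(snapshot.values)  # current state values
print(snapshot.next)    # the node(s) that will execute on resume

# Optionally adjust state before approving
app.update_state(config, {"feedback": "Approved by reviewer"})

# Resume execution from the interrupt
app.invoke(None, config)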

LangGraph Strengths

  • State management: Explicit, type-safe state
  • Cyclic workflows: Natural support for loops
  • Persistence: Built-in checkpointing
  • Human-in-the-loop: First-class interrupt support
  • Debugging: Graph visualization, step-through execution

LangGraph Weaknesses

  • Complexity: Steeper learning curve
  • Overhead: More setup for simple flows
  • Documentation: Less mature than LangChain core

Best For

  • Multi-agent orchestration
  • Complex workflows with loops
  • State machines with LLM nodes
  • Human-in-the-loop systems
  • Long-running agent tasks

Other Frameworks

Haystack (deepset)

Production-focused NLP framework:

Python
from haystack import Pipeline
from haystack.components.retrievers import InMemoryBM25Retriever
from haystack.components.generators import OpenAIGenerator
from haystack.components.builders import PromptBuilder

# Build pipeline
pipeline = Pipeline()
pipeline.add_component("retriever", InMemoryBM25Retriever(document_store))
pipeline.add_component("prompt", PromptBuilder(template=template))
pipeline.add_component("llm", OpenAIGenerator(model="gpt-4o"))

pipeline.connect("retriever", "prompt.documents")
pipeline.connect("prompt", "llm")

result = pipeline.run({"retriever": {"query": "What is RAG?"}})

Best for: Production search systems, traditional NLP integration

Semantic Kernel (Microsoft)

Enterprise-focused SDK with Azure integration:

Python
import semantic_kernel as sk
from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion

kernel = sk.Kernel()
kernel.add_service(AzureChatCompletion(
    deployment_name="gpt-4",
    endpoint="https://xxx.openai.azure.com/"
))

# Create function
@kernel.function(name="summarize")
async def summarize(text: str) -> str:
    """Summarize the given text."""
    return await kernel.invoke_prompt(
        f"Summarize: {text}"
    )

Best for: Azure-heavy environments, .NET shops, enterprise governance

DSPy (Stanford)

Programmatic prompt optimization:

Python
import dspy

class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate = dspy.ChainOfThought("context, question -> answer")

    def forward(self, question):
        context = self.retrieve(question).passages
        return self.generate(context=context, question=question)

# Compile with optimizer
from dspy.teleprompt import BootstrapFewShot

optimizer = BootstrapFewShot(metric=your_metric)
compiled_rag = optimizer.compile(RAG(), trainset=examples)

Best for: Research, automated prompt tuning, when you have evaluation data

CrewAI

Role-based multi-agent framework:

Python
from crewai import Agent, Task, Crew

researcher = Agent(
    role="Senior Research Analyst",
    goal="Research and analyze market trends",
    backstory="Expert analyst with 20 years experience",
    tools=[search_tool, scrape_tool]
)

writer = Agent(
    role="Content Writer",
    goal="Write compelling content",
    backstory="Award-winning journalist"
)

research_task = Task(
    description="Research AI market trends",
    agent=researcher
)

write_task = Task(
    description="Write report based on research",
    agent=writer
)

crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, write_task]
)

result = crew.kickoff()

Best for: Role-based agent teams that need something simpler than LangGraph

AutoGen (Microsoft)

Multi-agent conversations:

Python
from autogen import AssistantAgent, UserProxyAgent

assistant = AssistantAgent(
    name="assistant",
    llm_config={"model": "gpt-4"}
)

user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="TERMINATE",
    code_execution_config={"work_dir": "coding"}
)

user_proxy.initiate_chat(
    assistant,
    message="Write a Python function to calculate fibonacci numbers"
)

Best for: Conversational agents, code generation with execution

Framework Comparison

Feature Matrix

Feature          | LangChain | LlamaIndex | LangGraph     | Haystack
RAG Quality      | Good      | Excellent  | N/A           | Good
Agent Support    | Excellent | Good       | Excellent     | Basic
Integrations     | 700+      | 160+       | Via LangChain | 100+
Learning Curve   | Medium    | Low        | High          | Medium
Production Ready | Yes       | Yes        | Yes           | Yes
Observability    | LangSmith | LlamaTrace | LangSmith     | deepset Cloud
State Management | Basic     | Basic      | Excellent     | Basic
Multi-Agent      | Good      | Basic      | Excellent     | Basic

Performance Comparison

From research: "LlamaIndex generally offers better performance for retrieval tasks due to optimized indexing strategies and query engines."

Aspect             | LangChain | LlamaIndex
RAG latency        | ~500ms    | ~350ms
Memory usage       | Higher    | Lower
Index creation     | Moderate  | Optimized
Query optimization | Manual    | Built-in

Pricing

LangChain:

  • Core library: Free (MIT)
  • LangSmith: Free tier + paid plans ($39+/mo)

LlamaIndex:

  • Core library: Free (MIT)
  • LlamaCloud: Free tier (10k credits) + paid ($50-500+/mo)

Others:

  • Most core libraries are MIT/Apache licensed
  • Managed services vary by provider

Combining Frameworks

LlamaIndex RAG + LangChain Agents

The most common combination:

Python
# LlamaIndex for high-quality RAG
from llama_index.core import VectorStoreIndex
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()

# LangChain for agent orchestration
from langchain.tools import Tool
from langchain.agents import create_react_agent, AgentExecutor

knowledge_tool = Tool(
    name="knowledge_base",
    func=lambda q: str(query_engine.query(q)),
    description="Search internal knowledge base for information"
)

agent = create_react_agent(llm, [knowledge_tool, other_tools], prompt)
executor = AgentExecutor(agent=agent, tools=[knowledge_tool, other_tools])

result = executor.invoke({"input": "user question"})

LangGraph + LlamaIndex

For complex multi-agent RAG:

Python
from langgraph.graph import StateGraph, END
from typing import TypedDict

class RAGState(TypedDict):
    query: str
    context: list
    response: str

# LlamaIndex retriever wrapped as a graph node
def retrieval_node(state: RAGState) -> dict:
    results = llama_index_retriever.retrieve(state["query"])
    return {"context": results}  # return only the fields that change

def generation_node(state: RAGState) -> dict:
    response = llm.invoke(
        f"Context: {state['context']}\nQuestion: {state['query']}"
    )
    return {"response": response}

# Build graph
workflow = StateGraph(RAGState)
workflow.add_node("retrieve", retrieval_node)
workflow.add_node("generate", generation_node)
workflow.set_entry_point("retrieve")
workflow.add_edge("retrieve", "generate")
workflow.add_edge("generate", END)

app = workflow.compile()

Decision Framework

When to Choose LlamaIndex

From research: "LlamaIndex generally has a gentler learning curve. Its high-level API and focus on data connection and querying make it easier to get started."

Choose LlamaIndex when:

  • Building RAG as the primary feature
  • Working with complex documents (tables, images)
  • Need best-in-class retrieval quality
  • Want quick time-to-production
  • Team is new to LLM frameworks

When to Choose LangChain

From research: "LangChain offers significantly more flexibility and control. Its modular architecture allows you to swap out different LLMs, customize prompt templates, and chain together multiple tools and agents."

Choose LangChain when:

  • Building complex agent systems
  • Need maximum integration flexibility
  • Want comprehensive observability (LangSmith)
  • Building multi-step reasoning chains
  • Team has LLM development experience

When to Choose LangGraph

Choose LangGraph when:

  • Building multi-agent systems
  • Need explicit state management
  • Require cyclic workflows (loops, retries)
  • Building human-in-the-loop systems
  • Long-running agent tasks

Decision Tree

Code
What's your primary use case?

├── Document Q&A / RAG
│   ├── Simple → LlamaIndex
│   └── With agents → LlamaIndex + LangChain
│
├── Autonomous agents
│   ├── Simple tools → LangChain
│   ├── Multi-agent → LangGraph
│   └── Role-based teams → CrewAI
│
├── Enterprise / Azure
│   └── Semantic Kernel
│
├── Research / Optimization
│   └── DSPy
│
└── Production search
    └── Haystack

Implementation Best Practices

Start Simple

Python
# Don't start with this:
complex_multi_agent_langgraph_system()

# Start with this:
simple_rag_chain = retriever | prompt | llm | parser

Add Complexity Incrementally

  1. Start with basic RAG or chain
  2. Add error handling—retries and fallbacks (see the sketch below)
  3. Add observability
  4. Add caching (also in the sketch below)
  5. Add agents if needed
  6. Add multi-agent only when proven necessary
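
For steps 2 and 4, LangChain's Runnable interface provides retries, fallbacks, and response caching without restructuring the chain. A minimal sketch—the fallback model and in-memory cache are illustrative choices, and prompt/parser are assumed from the earlier examples:

Python
from langchain_core.globals import set_llm_cache
from langchain_core.caches import InMemoryCache
from langchain_openai import ChatOpenAI

# Error handling: retry transient failures, then fall back to a cheaper model
primary = ChatOpenAI(model="gpt-4o").with_retry(stop_after_attempt=3)
model = primary.with_fallbacks([ChatOpenAI(model="gpt-4o-mini")])

# Caching: identical prompts are answered from memory instead of the API
set_llm_cache(InMemoryCache())

chain = prompt | model | parser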

Monitor Everything

Python
# LangSmith tracing
os.environ["LANGCHAIN_TRACING_V2"] = "true"

# Custom callbacks
from langchain.callbacks import BaseCallbackHandler

class MetricsCallback(BaseCallbackHandler):
    def on_llm_end(self, response, **kwargs):
        # llm_output contents vary by provider; OpenAI models report token_usage.
        # Latency is best measured separately (e.g., by timing on_llm_start/on_llm_end).
        llm_output = response.llm_output or {}
        log_metrics({"tokens": llm_output.get("token_usage", {})})

Use Type Safety

Python
from pydantic import BaseModel
from typing import List, Optional

class QueryResult(BaseModel):
    answer: str
    sources: List[str]
    confidence: float
    follow_up_questions: Optional[List[str]] = None

# Use with structured output
model.with_structured_output(QueryResult)

Conclusion

The framework choice depends on your primary use case:

  1. RAG-first: Start with LlamaIndex
  2. Agent-first: Start with LangChain
  3. Multi-agent: Use LangGraph
  4. Enterprise/Azure: Use Semantic Kernel
  5. Research: Consider DSPy

Most teams benefit from combining frameworks:

  • LlamaIndex for data ingestion and retrieval
  • LangChain for orchestration and agents
  • LangGraph for complex stateful workflows

Start simple, add complexity when needed, and always instrument for observability.


Enrico Piovano, PhD

Co-founder & CTO at Goji AI. Former Applied Scientist at Amazon (Alexa & AGI), focused on Agentic AI and LLMs. PhD in Electrical Engineering from Imperial College London. Gold Medalist at the National Mathematical Olympiad.
