LLM Frameworks: LangChain, LlamaIndex, LangGraph, and Beyond
A comprehensive comparison of LLM application frameworks—LangChain, LlamaIndex, LangGraph, Haystack, and alternatives. When to use each, how to combine them, and practical implementation patterns.
The Framework Landscape in 2025
Building LLM applications from scratch is complex. Frameworks abstract common patterns—RAG, agents, chains—letting you focus on your application logic rather than reinventing infrastructure.
This guide provides a comprehensive comparison of the major frameworks, when to use each, and how to combine them effectively for production systems.
Framework Overview
| Framework | Primary Focus | Best For | Company |
|---|---|---|---|
| LangChain | General LLM applications | Complex agents, multi-step workflows | LangChain, Inc |
| LlamaIndex | Data & RAG | Document Q&A, knowledge bases | LlamaIndex, Inc |
| LangGraph | Multi-agent systems | Stateful workflows, agent orchestration | LangChain, Inc |
| Haystack | Search & NLP | Production search, NLP pipelines | deepset |
| Semantic Kernel | Enterprise AI | Azure integration, .NET support | Microsoft |
| DSPy | Prompt optimization | Research, automated tuning | Stanford NLP |
| CrewAI | Multi-agent teams | Role-based agents | CrewAI |
| AutoGen | Agent conversations | Multi-agent dialogue | Microsoft |
LangChain
Overview
LangChain is the most comprehensive LLM application framework, providing building blocks for chains, agents, RAG, and complex workflows.
From research: "LangChain is the most flexible foundation for complex LLM apps and agentic workflows, with excellent production tooling."
Core Architecture
LangChain Ecosystem
├── langchain-core # Base abstractions
├── langchain # Main framework
├── langchain-community # Third-party integrations
├── langchain-openai # OpenAI-specific
├── langchain-anthropic # Anthropic-specific
└── langgraph # Multi-agent workflows
Core Concepts
Chains (LCEL - LangChain Expression Language)
LCEL (LangChain Expression Language) is LangChain's declarative way to compose operations. The key innovation is the pipe operator (|) which chains components together: data flows from left to right through each component. Think of it like Unix pipes—prompt | model | parser means "take input, format it with the prompt, send to the model, parse the output."
LCEL provides three major benefits: (1) automatic batching and parallel execution, (2) built-in streaming support, and (3) consistent interface across all components. Every LCEL chain supports .invoke() (single input), .batch() (multiple inputs), and .stream() (streaming output) without any extra code.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser, JsonOutputParser
from langchain_core.runnables import RunnablePassthrough
# Simple chain
prompt = ChatPromptTemplate.from_template(
"Summarize the following text in {style} style:\n\n{text}"
)
model = ChatOpenAI(model="gpt-4o")
parser = StrOutputParser()
chain = prompt | model | parser
result = chain.invoke({
"style": "concise bullet points",
"text": long_document
})
# Chain with parallel operations
from langchain_core.runnables import RunnableParallel
analysis_chain = RunnableParallel({
"summary": prompt | model | StrOutputParser(),
"key_points": key_points_prompt | model | JsonOutputParser(),
"sentiment": sentiment_prompt | model | StrOutputParser(),
})
results = analysis_chain.invoke({"text": document})
# Returns: {"summary": "...", "key_points": [...], "sentiment": "..."}
Understanding this code:
- ChatPromptTemplate: Creates a reusable template with variables ({style}, {text}). LangChain handles the formatting—you pass a dict, it fills in the placeholders.
- Pipe composition: prompt | model | parser creates a chain. When you call .invoke(), data flows through: dict → formatted prompt → LLM → parsed string. Each component transforms the output of the previous one.
- RunnableParallel: Runs multiple chains simultaneously with the same input. This is efficient—one network call per model invocation, but all three analyses happen concurrently. Results come back as a dict with keys matching your chain names.
- Different parsers: StrOutputParser() extracts raw text, JsonOutputParser() parses JSON into a dict. Match the parser to your expected output format.
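The same chain also supports batching and streaming with no extra code. A quick sketch (documents_to_summarize is a hypothetical list of strings; long_document is the input used above):
# Batch over several inputs (runs them in parallel under the hood)
summaries = chain.batch([
    {"style": "concise bullet points", "text": doc}
    for doc in documents_to_summarize  # hypothetical list of strings
])

# Stream tokens as they are generated
for chunk in chain.stream({"style": "one short paragraph", "text": long_document}):
    print(chunk, end="", flush=True)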
Structured Output
LLMs naturally output free-form text, but applications need structured data. LangChain's .with_structured_output() forces the model to return data matching a Pydantic schema. Under the hood, it uses the model's function calling or JSON mode to guarantee valid output—no parsing errors, no missing fields.
This is one of LangChain's most valuable features. Instead of hoping the model outputs valid JSON and writing error handling for when it doesn't, you define a schema and let LangChain ensure compliance.
from pydantic import BaseModel, Field  # langchain_core.pydantic_v1 is deprecated in recent versions
from typing import List
class ExtractedInfo(BaseModel):
"""Information extracted from document."""
title: str = Field(description="Document title")
entities: List[str] = Field(description="Named entities mentioned")
summary: str = Field(description="Brief summary")
confidence: float = Field(description="Confidence score 0-1")
model = ChatOpenAI(model="gpt-4o")
structured_model = model.with_structured_output(ExtractedInfo)
result = structured_model.invoke("Extract info from: " + document)
# Returns: ExtractedInfo(title="...", entities=[...], ...)
Key points:
- Pydantic schema: The Field(description=...) is crucial—the model reads these descriptions to understand what each field should contain. Good descriptions improve extraction accuracy.
- with_structured_output(): Wraps the model with structured output handling. The returned structured_model can be used anywhere a regular model would be, but always returns your schema type.
- Type safety: The result is a proper ExtractedInfo instance, not a dict. You get IDE autocomplete, type checking, and validation automatically.
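The structured model is itself a runnable, so it composes with LCEL like anything else. A minimal sketch (extraction_prompt and extraction_chain are illustrative names):
from langchain_core.prompts import ChatPromptTemplate

extraction_prompt = ChatPromptTemplate.from_template(
    "Extract the key information from this document:\n\n{document}"
)
extraction_chain = extraction_prompt | structured_model

info = extraction_chain.invoke({"document": document})
print(info.title, info.confidence)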
Agents with Tools
Agents are LLMs that can take actions. Instead of just generating text, an agent can search the web, query databases, or call APIs. The key pattern is ReAct (Reasoning + Acting): the model thinks about what to do, takes an action, observes the result, and repeats until it has an answer.
LangChain agents require three components: (1) tools the agent can use, (2) a prompt that teaches the model how to use tools, and (3) an executor that manages the think-act-observe loop.
from langchain.agents import create_react_agent, AgentExecutor
from langchain_community.tools import DuckDuckGoSearchRun
from langchain.tools import Tool
from langchain_core.prompts import PromptTemplate
# Define tools
search = DuckDuckGoSearchRun()
def calculate(expression: str) -> str:
"""Evaluate a mathematical expression."""
try:
        return str(eval(expression))  # demo only; avoid eval on untrusted input
except Exception as e:
return f"Error: {e}"
tools = [
Tool(
name="web_search",
func=search.run,
description="Search the web for current information"
),
Tool(
name="calculator",
func=calculate,
description="Evaluate mathematical expressions"
),
]
# Create agent
prompt = PromptTemplate.from_template("""
You are a helpful assistant with access to tools.

Tools available:
{tools}

Use the following format:

Question: the input question
Thought: think about what to do next
Action: the tool to use, one of [{tool_names}]
Action Input: the input to the tool
Observation: the tool's result
... (Thought/Action/Action Input/Observation can repeat)
Thought: I now know the final answer
Final Answer: the answer to the original question

Question: {input}
{agent_scratchpad}
""")
agent = create_react_agent(
llm=ChatOpenAI(model="gpt-4o"),
tools=tools,
prompt=prompt
)
executor = AgentExecutor(
agent=agent,
tools=tools,
verbose=True,
max_iterations=5,
handle_parsing_errors=True
)
result = executor.invoke({"input": "What is 15% of the current Bitcoin price?"})
Understanding the agent architecture:
- Tool definition: Each Tool needs a name, function, and description. The description is what the model reads to decide when to use the tool—be specific about what it does and when it's appropriate. Vague descriptions lead to wrong tool choices.
- Prompt template: The prompt teaches the model the ReAct format. {tools} lists tool descriptions, {tool_names} provides valid names, and {agent_scratchpad} is where the model's thinking history goes. This format enables multi-step reasoning.
- create_react_agent: Creates an agent that follows ReAct. The agent parses model output to extract tool calls, but doesn't execute them—that's the executor's job.
- AgentExecutor: The runtime loop that:
  - Passes the question to the agent
  - Parses the response to find tool calls
  - Executes the tool and gets results
  - Feeds results back to the agent
  - Repeats until the agent says "Final Answer"
- max_iterations=5: Prevents infinite loops. If the agent can't solve the problem in 5 steps, it gives up. Tune based on task complexity.
- handle_parsing_errors=True: If the model outputs malformed tool calls, the executor recovers gracefully instead of crashing.
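For lighter-weight tool definitions, recent LangChain versions also provide the @tool decorator, which turns a plain function into a Tool using its name, type hints, and docstring. A sketch of the calculator above rewritten that way; tools defined like this can be passed to create_react_agent or, for models with native function calling, to create_tool_calling_agent:
from langchain_core.tools import tool

@tool
def calculator(expression: str) -> str:
    """Evaluate a mathematical expression."""
    try:
        return str(eval(expression))  # demo only; avoid eval on untrusted input
    except Exception as e:
        return f"Error: {e}"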
RAG with LangChain
RAG (Retrieval-Augmented Generation) grounds LLM responses in your data. Instead of relying on the model's training knowledge, you retrieve relevant documents and include them in the prompt. This reduces hallucinations and lets you work with private or recent information.
LangChain's RAG pipeline has four stages: (1) load documents, (2) split into chunks, (3) embed and store in a vector database, (4) retrieve and generate answers. Each stage is customizable—you can swap vector stores, change chunking strategies, or modify retrieval logic.
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_core.prompts import ChatPromptTemplate
# Load and process documents
loader = PyPDFLoader("document.pdf")
documents = loader.load()
text_splitter = RecursiveCharacterTextSplitter(
chunk_size=1000,
chunk_overlap=200,
separators=["\n\n", "\n", " ", ""]
)
splits = text_splitter.split_documents(documents)
# Create vector store
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma.from_documents(
documents=splits,
embedding=embeddings,
persist_directory="./chroma_db"
)
# Create retriever
retriever = vectorstore.as_retriever(
search_type="similarity",
search_kwargs={"k": 5}
)
# RAG chain with custom prompt
rag_prompt = ChatPromptTemplate.from_template("""
Answer the question based on the following context. If the context doesn't
contain enough information, say so.
Context: {context}
Question: {question}
Answer:""")
def format_docs(docs):
return "\n\n".join(doc.page_content for doc in docs)
rag_chain = (
{"context": retriever | format_docs, "question": RunnablePassthrough()}
| rag_prompt
| ChatOpenAI(model="gpt-4o")
| StrOutputParser()
)
answer = rag_chain.invoke("What are the key findings?")
RAG pipeline explained:
- Document loading: PyPDFLoader extracts text from PDFs. LangChain has loaders for 100+ formats—use the one that matches your data. Loaders return Document objects with text content and metadata.
- Text splitting: Documents are too long to embed whole. RecursiveCharacterTextSplitter breaks them into chunks, trying to preserve semantic boundaries. chunk_overlap=200 means adjacent chunks share 200 characters—this helps with questions that span chunk boundaries.
- Embedding: Each chunk is converted to a vector using the embedding model. text-embedding-3-small is OpenAI's efficient option. Vectors are stored in Chroma (a local vector database). persist_directory saves to disk so you don't re-embed on restart.
- Retriever: Wraps the vector store with a search interface. search_kwargs={"k": 5} returns the 5 most relevant chunks for each query. The retriever can be swapped for hybrid search, re-ranking, or other strategies.
- LCEL RAG chain: The modern approach. retriever | format_docs fetches and formats context, RunnablePassthrough() passes the question unchanged. Both feed into the prompt, then the model generates an answer.
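For example, swapping the retriever for maximal marginal relevance (MMR) search is a one-line change. A sketch, assuming the vectorstore built above (parameter values are illustrative):
mmr_retriever = vectorstore.as_retriever(
    search_type="mmr",                      # maximal marginal relevance
    search_kwargs={"k": 5, "fetch_k": 20},  # fetch 20 candidates, keep 5 diverse chunks
)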
The LangChain Ecosystem
LangSmith (Observability)
# Set environment variables
import os
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your-api-key"
os.environ["LANGCHAIN_PROJECT"] = "my-project"
# All chain/agent executions are now traced
# View at smith.langchain.com
LangSmith features:
- Trace visualization
- Latency analysis
- Token usage tracking
- Dataset management
- Evaluation runs
- Prompt versioning
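Beyond automatic chain tracing, the langsmith SDK's @traceable decorator wraps arbitrary functions so custom logic appears in the same trace tree. A sketch, reusing the chain from the LCEL example (classify_ticket is an illustrative name):
from langsmith import traceable

@traceable(name="classify_ticket")
def classify_ticket(text: str) -> str:
    # Everything called in here (LLM calls, retrieval, parsing) nests under this trace
    return chain.invoke({"style": "a single category label", "text": text})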
LangServe (Deployment)
from fastapi import FastAPI
from langserve import add_routes
app = FastAPI()
# Add chain as REST endpoint
add_routes(
app,
rag_chain,
path="/rag",
)
# Run with: uvicorn main:app --reload
# API available at: http://localhost:8000/rag/invoke
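On the client side, LangServe exposes the deployed chain as a regular runnable via RemoteRunnable. A sketch, assuming the /rag route above:
from langserve import RemoteRunnable

remote_chain = RemoteRunnable("http://localhost:8000/rag/")
answer = remote_chain.invoke("What are the key findings?")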
LangChain Strengths
From IBM: "LangChain shines in projects that demand complex reasoning chains and multi-step AI workflows. If your goal is to create autonomous agents capable of making decisions, interacting with multiple APIs, and managing dynamic conversational flows, LangChain provides the flexibility needed."
Key strengths:
- Flexibility: Swap components easily
- Integrations: 700+ integrations
- Observability: LangSmith for production
- Community: Large ecosystem, many examples
- Agent support: Best-in-class agent tooling
LangChain Weaknesses
- Complexity: Learning curve can be steep
- Abstraction overhead: Sometimes too many layers
- Breaking changes: Rapid iteration means API changes
- Memory usage: Can be heavy for simple use cases
Best For
- Complex agent workflows
- Multi-step reasoning chains
- Applications needing many integrations
- Teams wanting comprehensive tooling
- Production systems with observability needs
LlamaIndex
Overview
LlamaIndex specializes in connecting LLMs to your data with optimized RAG pipelines and data ingestion.
From research: "LlamaIndex is the fastest route to high-quality, production-grade RAG on your data."
Core Architecture
LlamaIndex Ecosystem
├── llama-index-core # Core abstractions
├── llama-index-llms-* # LLM integrations
├── llama-index-embeddings-* # Embedding models
├── llama-index-readers-* # Data connectors
├── llama-index-vector-stores-* # Vector DBs
└── llama-cloud # Managed services
Core Concepts
Data Connectors (Readers)
LlamaIndex excels at data ingestion. Readers connect to data sources and produce Document objects that can be indexed. The library includes readers for files, databases, APIs, cloud storage, and more. Each reader handles the complexity of its data source—parsing PDFs, handling authentication, pagination—so you focus on what to do with the data.
The pattern is consistent across all readers: create a reader with configuration, call .load_data(), get documents. This uniformity makes it easy to support multiple data sources in a single application.
from llama_index.core import SimpleDirectoryReader
from llama_index.readers.web import SimpleWebPageReader
from llama_index.readers.database import DatabaseReader
from llama_index.readers.notion import NotionPageReader
# Local files (PDF, DOCX, TXT, etc.)
documents = SimpleDirectoryReader(
input_dir="./data",
recursive=True,
required_exts=[".pdf", ".docx", ".txt"]
).load_data()
# Web pages
web_docs = SimpleWebPageReader(
html_to_text=True
).load_data([
"https://docs.example.com/guide",
"https://docs.example.com/api"
])
# Database
db_reader = DatabaseReader(
uri="postgresql://user:pass@localhost/db"
)
db_docs = db_reader.load_data(
query="SELECT title, content FROM articles WHERE published = true"
)
# Notion
notion_docs = NotionPageReader(
integration_token="secret_xxx"
).load_data(page_ids=["page-id-1", "page-id-2"])
Reader options explained:
- SimpleDirectoryReader: The workhorse for local files. recursive=True processes subdirectories, required_exts filters by file type. It auto-detects parsers for PDF, DOCX, TXT, and more.
- SimpleWebPageReader: Fetches and parses web pages. html_to_text=True strips HTML tags for cleaner text. For dynamic sites, consider using a headless browser reader instead.
- DatabaseReader: Executes SQL and converts rows to documents. Each row becomes a document—customize the query to get the right granularity.
- NotionPageReader: Connects to Notion's API. Requires an integration token from your Notion workspace. Can fetch entire databases, not just individual pages.
Indexes and Query Engines
An index is how LlamaIndex organizes your documents for efficient retrieval. The most common is VectorStoreIndex, which embeds documents and enables semantic search. Once you have an index, you create a query engine to search it and generate answers.
LlamaIndex separates retrieval (finding relevant documents) from synthesis (generating answers). This separation lets you customize each stage independently—use one retrieval strategy for speed, another for accuracy.
from llama_index.core import VectorStoreIndex, Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
# Configure global settings
Settings.llm = OpenAI(model="gpt-4o", temperature=0.1)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
# Create index
index = VectorStoreIndex.from_documents(
documents,
show_progress=True
)
# Simple query
query_engine = index.as_query_engine()
response = query_engine.query("What is the main topic?")
print(response.response)
print(response.source_nodes) # Retrieved chunks
# Advanced query engine
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.postprocessor import SimilarityPostprocessor
retriever = VectorIndexRetriever(
index=index,
similarity_top_k=10,
)
query_engine = RetrieverQueryEngine(
retriever=retriever,
node_postprocessors=[
SimilarityPostprocessor(similarity_cutoff=0.7)
]
)
Index and query engine concepts:
- Settings: Global configuration for the LLM and embedding model. All indexes and query engines use these defaults unless overridden. Set temperature=0.1 for factual queries where consistency matters.
- VectorStoreIndex: Creates an index from documents. show_progress=True displays a progress bar during embedding—useful for large document sets.
- Simple query engine: .as_query_engine() creates a default query engine. It retrieves relevant chunks and synthesizes an answer. response.source_nodes shows which chunks were used—essential for debugging and citation.
- Advanced query engine: For more control, build components separately. VectorIndexRetriever handles search, SimilarityPostprocessor filters low-quality matches (below 0.7 similarity). Stack multiple postprocessors for re-ranking, deduplication, or metadata filtering.
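The synthesis stage is as swappable as retrieval. A sketch of two common variations, assuming the index built above (mode names follow LlamaIndex's documented options):
# Summarize across many retrieved chunks instead of answering from the top few
summary_engine = index.as_query_engine(response_mode="tree_summarize")
print(summary_engine.query("Summarize the document"))

# Conversational interface with chat history over the same index
chat_engine = index.as_chat_engine(chat_mode="condense_plus_context")
print(chat_engine.chat("What does the report say about pricing?"))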
Advanced RAG Patterns
Basic RAG retrieves chunks based on semantic similarity, but this misses cases where exact keyword matches matter or where the relevant information spans multiple chunks. LlamaIndex provides advanced patterns that significantly improve retrieval quality:
- Hybrid search combines semantic and keyword matching—catch both conceptually similar and exact term matches
- Sentence window retrieval returns the sentences around matches for better context
- Auto-merging retrieval uses hierarchical chunks, merging small chunks into larger ones when multiple small chunks from the same section are retrieved
# Hybrid search (semantic + keyword)
from llama_index.core.retrievers import QueryFusionRetriever
retriever = QueryFusionRetriever(
[
index.as_retriever(similarity_top_k=5),
        # BM25 retriever for keyword matching (assumed built earlier,
        # e.g. with BM25Retriever from llama-index-retrievers-bm25)
bm25_retriever,
],
similarity_top_k=10,
num_queries=4, # Generate query variations
mode="reciprocal_rerank",
)
# Sentence window retrieval
from llama_index.core.node_parser import SentenceWindowNodeParser
from llama_index.core.postprocessor import MetadataReplacementPostProcessor
node_parser = SentenceWindowNodeParser.from_defaults(
window_size=3,
window_metadata_key="window",
original_text_metadata_key="original_text",
)
# Auto-merging retrieval (hierarchical chunks)
from llama_index.core.node_parser import HierarchicalNodeParser
from llama_index.core.retrievers import AutoMergingRetriever
node_parser = HierarchicalNodeParser.from_defaults(
chunk_sizes=[2048, 512, 128]
)
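These parsers only take effect once they are wired into indexing and querying. A minimal sketch of the sentence-window pattern, assuming the SentenceWindowNodeParser configured above is bound to a variable named sentence_parser and documents is the list loaded earlier:
from llama_index.core import VectorStoreIndex

# Index single-sentence nodes that carry their surrounding window in metadata
nodes = sentence_parser.get_nodes_from_documents(documents)
sentence_index = VectorStoreIndex(nodes)

# At query time, replace each matched sentence with its window for fuller context
query_engine = sentence_index.as_query_engine(
    similarity_top_k=5,
    node_postprocessors=[
        MetadataReplacementPostProcessor(target_metadata_key="window")
    ],
)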
Agents in LlamaIndex
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import QueryEngineTool, FunctionTool
# Create tools from query engines
query_tool = QueryEngineTool.from_defaults(
query_engine=index.as_query_engine(),
name="knowledge_base",
description="Search the company knowledge base"
)
# Custom function tool
def send_email(to: str, subject: str, body: str) -> str:
"""Send an email to a recipient."""
# Implementation
return f"Email sent to {to}"
email_tool = FunctionTool.from_defaults(fn=send_email)
# Create agent
agent = ReActAgent.from_tools(
tools=[query_tool, email_tool],
llm=OpenAI(model="gpt-4o"),
verbose=True,
max_iterations=10
)
response = agent.chat("Find info about our refund policy and email it to customer@example.com")
The LlamaIndex Ecosystem
LlamaCloud (Managed Services)
from llama_index.indices.managed.llama_cloud import LlamaCloudIndex
# Use managed parsing and indexing
index = LlamaCloudIndex.from_documents(
documents,
name="my-index",
project_name="my-project"
)
# Query through cloud
response = index.as_query_engine().query("What are the key points?")
LlamaCloud features:
- LlamaParse: Advanced document parsing
- Managed vector storage
- Automatic chunking optimization
- API-based access
LlamaParse (Document Parsing)
from llama_parse import LlamaParse
parser = LlamaParse(
api_key="llx-xxx",
result_type="markdown",
num_workers=4,
verbose=True,
language="en"
)
# Parse complex documents (tables, images, etc.)
documents = parser.load_data("complex_report.pdf")
LlamaIndex Strengths
From research: "Since LlamaIndex was designed with RAG-heavy workflows in mind, it has a best-in-class data ingestion toolset. The framework helps engineering teams clean and structure messy data before it hits the retriever."
Key strengths:
- RAG quality: Best-in-class retrieval patterns
- Data connectors: 160+ data sources
- Document parsing: LlamaParse handles complex docs
- Learning curve: Gentler than LangChain for RAG
- Optimized patterns: Sentence window, auto-merging built-in
LlamaIndex Weaknesses
- Agent support: Less mature than LangChain
- General workflows: Less flexible for non-RAG use cases
- Ecosystem: Smaller than LangChain
- Observability: LlamaTrace less feature-rich than LangSmith
Best For
- RAG-first applications
- Document Q&A systems
- Knowledge bases
- Complex document types (tables, images)
- Quick prototyping to production RAG
LangGraph
Overview
LangGraph enables complex multi-agent systems with explicit state management and cyclic workflows.
From research: "LangGraph is a stateful framework for building multi-agent systems as graphs, created by the LangChain team. Engineers model workflows using nodes (tools, functions, LLMs, subgraphs) and edges (loops, conditional routes)."
Core Concepts
LangGraph models workflows as directed graphs where nodes are functions (often calling LLMs) and edges define the flow between them. The key innovation is state: a typed dictionary that flows through the graph, accumulating results from each node. Unlike simple chains, LangGraph supports cycles—a reviewer can send work back to a writer, creating iterative refinement loops.
Understanding state is crucial: every node receives the current state and returns updates. The graph merges these updates (using Annotated operators) and passes the new state to the next node.
State and Graph Definition
Let's build a research-write-review workflow that iterates until the reviewer approves or max iterations are reached:
from langgraph.graph import StateGraph, END
from typing import TypedDict, Annotated, Literal
import operator
# Define state schema
class AgentState(TypedDict):
messages: Annotated[list, operator.add]
current_step: str
research_data: str
draft: str
feedback: str
iteration: int
# Define node functions
def researcher(state: AgentState) -> AgentState:
"""Research node - gathers information."""
# Call LLM to research
research = llm.invoke(f"Research: {state['messages'][-1]}")
return {
"messages": [f"Research complete: {research}"],
"research_data": research,
"current_step": "writer"
}
def writer(state: AgentState) -> AgentState:
"""Writer node - creates draft."""
draft = llm.invoke(
f"Write based on: {state['research_data']}\n"
f"Previous feedback: {state.get('feedback', 'None')}"
)
return {
"messages": [f"Draft created"],
"draft": draft,
"current_step": "reviewer"
}
def reviewer(state: AgentState) -> AgentState:
"""Reviewer node - evaluates draft."""
review = llm.invoke(f"Review this draft: {state['draft']}")
return {
"messages": [f"Review: {review}"],
"feedback": review,
"iteration": state.get("iteration", 0) + 1
}
# Routing function
def should_continue(state: AgentState) -> Literal["writer", "end"]:
"""Decide whether to continue iterating."""
if state["iteration"] >= 3:
return "end"
if "approved" in state["feedback"].lower():
return "end"
return "writer"
# Build graph
workflow = StateGraph(AgentState)
# Add nodes
workflow.add_node("researcher", researcher)
workflow.add_node("writer", writer)
workflow.add_node("reviewer", reviewer)
# Add edges
workflow.set_entry_point("researcher")
workflow.add_edge("researcher", "writer")
workflow.add_edge("writer", "reviewer")
workflow.add_conditional_edges(
"reviewer",
should_continue,
{
"writer": "writer",
"end": END
}
)
# Compile
app = workflow.compile()
# Run
result = app.invoke({
"messages": ["Write a blog post about LLM frameworks"],
"iteration": 0
})
Understanding the LangGraph workflow:
- State schema: AgentState defines what data flows through the graph. Annotated[list, operator.add] means messages accumulates—new messages append to existing ones rather than replacing them. Other fields replace previous values.
- Node functions: Each node receives the current state and returns updates. The researcher node adds research data; the writer node uses that data to create a draft. Notice nodes return partial updates—you only specify fields that change.
- Routing function: should_continue decides where to go next based on state. It checks iteration count and feedback content. The return value must match an edge name defined in add_conditional_edges.
- Graph construction: Build the graph in three steps:
  - Add nodes (functions)
  - Set the entry point
  - Add edges (connections between nodes)
- Conditional edges: The reviewer can route to "writer" (for revision) or "end" (if approved). This creates the cycle that enables iterative refinement.
- Compile and run: .compile() validates the graph and creates a runnable. .invoke() starts execution from the entry point, flowing through nodes until reaching END.
Checkpointing and Memory
from langgraph.checkpoint.sqlite import SqliteSaver
# Add persistence
memory = SqliteSaver.from_conn_string(":memory:")
app = workflow.compile(checkpointer=memory)
# Run with thread_id for persistence
config = {"configurable": {"thread_id": "user-123"}}
result1 = app.invoke({"messages": ["Start task"]}, config)
# Later...
result2 = app.invoke({"messages": ["Continue"]}, config) # Has previous state
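With a checkpointer attached, the compiled graph can also be inspected between runs. A sketch using get_state (field names follow the AgentState schema above):
snapshot = app.get_state(config)
print(snapshot.values.get("messages"))  # state accumulated so far for this thread
print(snapshot.next)                    # node(s) that would execute next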
Human-in-the-Loop
from langgraph.graph import StateGraph
from langgraph.prebuilt import ToolNode, tools_condition
# Define an interrupt point (a no-op node used purely to pause for review)
workflow.add_node("human_approval", lambda state: {})
def needs_approval(state: AgentState) -> Literal["approve", "reject", "auto"]:
"""Check if human approval needed."""
if state.get("high_risk", False):
return "approve" # Will interrupt
return "auto"
# "action_node" and "execute" are assumed to be nodes added elsewhere in this workflow
workflow.add_conditional_edges(
    "action_node",
    needs_approval,
{
"approve": "human_approval",
"auto": "execute",
"reject": END
}
)
# Compile with interrupt
app = workflow.compile(
checkpointer=memory,
interrupt_before=["human_approval"]
)
# Run until interrupt
result = app.invoke(initial_state, config)
# Human reviews and continues
app.invoke(None, config) # Resume from interrupt
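While paused at the interrupt, a human can edit the state before resuming. A sketch using update_state (the feedback value is illustrative):
# Record the human decision, then resume the graph past the interrupt
app.update_state(config, {"feedback": "approved by reviewer"})
app.invoke(None, config)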
LangGraph Strengths
- State management: Explicit, type-safe state
- Cyclic workflows: Natural support for loops
- Persistence: Built-in checkpointing
- Human-in-the-loop: First-class interrupt support
- Debugging: Graph visualization, step-through execution
LangGraph Weaknesses
- Complexity: Steeper learning curve
- Overhead: More setup for simple flows
- Documentation: Less mature than LangChain core
Best For
- Multi-agent orchestration
- Complex workflows with loops
- State machines with LLM nodes
- Human-in-the-loop systems
- Long-running agent tasks
Other Frameworks
Haystack (deepset)
Production-focused NLP framework:
from haystack import Pipeline
from haystack.components.retrievers import InMemoryBM25Retriever
from haystack.components.generators import OpenAIGenerator
from haystack.components.builders import PromptBuilder
# Build pipeline (document_store and template are assumed to be defined earlier)
pipeline = Pipeline()
pipeline.add_component("retriever", InMemoryBM25Retriever(document_store))
pipeline.add_component("prompt", PromptBuilder(template=template))
pipeline.add_component("llm", OpenAIGenerator(model="gpt-4o"))
pipeline.connect("retriever", "prompt.documents")
pipeline.connect("prompt", "llm")
result = pipeline.run({"retriever": {"query": "What is RAG?"}})
Best for: Production search systems, traditional NLP integration
Semantic Kernel (Microsoft)
Enterprise-focused SDK with Azure integration:
import semantic_kernel as sk
from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion
from semantic_kernel.functions import kernel_function

kernel = sk.Kernel()
kernel.add_service(AzureChatCompletion(
    deployment_name="gpt-4",
    endpoint="https://xxx.openai.azure.com/",
    api_key="your-azure-openai-key"
))

# Native functions live in plugin classes and are exposed via @kernel_function
class TextPlugin:
    @kernel_function(name="summarize", description="Summarize the given text")
    async def summarize(self, text: str) -> str:
        result = await kernel.invoke_prompt(f"Summarize: {text}")
        return str(result)

kernel.add_plugin(TextPlugin(), plugin_name="text")
Best for: Azure-heavy environments, .NET shops, enterprise governance
DSPy (Stanford)
Programmatic prompt optimization:
import dspy
class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate = dspy.ChainOfThought("context, question -> answer")
def forward(self, question):
context = self.retrieve(question).passages
return self.generate(context=context, question=question)
# Compile with optimizer
from dspy.teleprompt import BootstrapFewShot
optimizer = BootstrapFewShot(metric=your_metric)
compiled_rag = optimizer.compile(RAG(), trainset=examples)
Best for: Research, automated prompt tuning, when you have evaluation data
CrewAI
Role-based multi-agent framework:
from crewai import Agent, Task, Crew
researcher = Agent(
role="Senior Research Analyst",
goal="Research and analyze market trends",
backstory="Expert analyst with 20 years experience",
tools=[search_tool, scrape_tool]
)
writer = Agent(
role="Content Writer",
goal="Write compelling content",
backstory="Award-winning journalist"
)
research_task = Task(
description="Research AI market trends",
agent=researcher
)
write_task = Task(
description="Write report based on research",
agent=writer
)
crew = Crew(
agents=[researcher, writer],
tasks=[research_task, write_task]
)
result = crew.kickoff()
Best for: Role-based agent systems, simpler than LangGraph
AutoGen (Microsoft)
Multi-agent conversations:
from autogen import AssistantAgent, UserProxyAgent
assistant = AssistantAgent(
name="assistant",
llm_config={"model": "gpt-4"}
)
user_proxy = UserProxyAgent(
name="user_proxy",
human_input_mode="TERMINATE",
code_execution_config={"work_dir": "coding"}
)
user_proxy.initiate_chat(
assistant,
message="Write a Python function to calculate fibonacci numbers"
)
Best for: Conversational agents, code generation with execution
Framework Comparison
Feature Matrix
| Feature | LangChain | LlamaIndex | LangGraph | Haystack |
|---|---|---|---|---|
| RAG Quality | Good | Excellent | N/A | Good |
| Agent Support | Excellent | Good | Excellent | Basic |
| Integrations | 700+ | 160+ | Via LangChain | 100+ |
| Learning Curve | Medium | Low | High | Medium |
| Production Ready | Yes | Yes | Yes | Yes |
| Observability | LangSmith | LlamaTrace | LangSmith | deepset Cloud |
| State Management | Basic | Basic | Excellent | Basic |
| Multi-Agent | Good | Basic | Excellent | Basic |
Performance Comparison
From research: "LlamaIndex generally offers better performance for retrieval tasks due to optimized indexing strategies and query engines."
| Aspect | LangChain | LlamaIndex |
|---|---|---|
| RAG latency | ~500ms | ~350ms |
| Memory usage | Higher | Lower |
| Index creation | Moderate | Optimized |
| Query optimization | Manual | Built-in |
Pricing
LangChain:
- Core library: Free (MIT)
- LangSmith: Free tier + paid plans ($39+/mo)
LlamaIndex:
- Core library: Free (MIT)
- LlamaCloud: Free tier (10k credits) + paid ($50-500+/mo)
Others:
- Most core libraries are MIT/Apache licensed
- Managed services vary by provider
Combining Frameworks
LlamaIndex RAG + LangChain Agents
The most common combination:
# LlamaIndex for high-quality RAG
from llama_index.core import VectorStoreIndex
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
# LangChain for agent orchestration
from langchain.tools import Tool
from langchain.agents import create_react_agent, AgentExecutor
knowledge_tool = Tool(
name="knowledge_base",
func=lambda q: str(query_engine.query(q)),
description="Search internal knowledge base for information"
)
tools = [knowledge_tool, *other_tools]  # other_tools: any additional Tool objects
agent = create_react_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools)
result = executor.invoke({"input": "user question"})
LangGraph + LlamaIndex
For complex multi-agent RAG:
from langgraph.graph import StateGraph, END
from typing import TypedDict

class RAGState(TypedDict):
    query: str
    context: list
    response: str

# LlamaIndex retriever as a node (llama_index_retriever and llm assumed configured earlier)
def retrieval_node(state: RAGState) -> dict:
    results = llama_index_retriever.retrieve(state["query"])
    return {"context": results}  # nodes return only the fields they update

def generation_node(state: RAGState) -> dict:
    response = llm.invoke(
        f"Context: {state['context']}\nQuestion: {state['query']}"
    )
    return {"response": response}

# Build graph
workflow = StateGraph(RAGState)
workflow.add_node("retrieve", retrieval_node)
workflow.add_node("generate", generation_node)
workflow.set_entry_point("retrieve")
workflow.add_edge("retrieve", "generate")
workflow.add_edge("generate", END)
app = workflow.compile()
Decision Framework
When to Choose LlamaIndex
From research: "LlamaIndex generally has a gentler learning curve. Its high-level API and focus on data connection and querying make it easier to get started."
Choose LlamaIndex when:
- Building RAG as the primary feature
- Working with complex documents (tables, images)
- Need best-in-class retrieval quality
- Want quick time-to-production
- Team is new to LLM frameworks
When to Choose LangChain
From research: "LangChain offers significantly more flexibility and control. Its modular architecture allows you to swap out different LLMs, customize prompt templates, and chain together multiple tools and agents."
Choose LangChain when:
- Building complex agent systems
- Need maximum integration flexibility
- Want comprehensive observability (LangSmith)
- Building multi-step reasoning chains
- Team has LLM development experience
When to Choose LangGraph
Choose LangGraph when:
- Building multi-agent systems
- Need explicit state management
- Require cyclic workflows (loops, retries)
- Building human-in-the-loop systems
- Long-running agent tasks
Decision Tree
What's your primary use case?
├── Document Q&A / RAG
│ ├── Simple → LlamaIndex
│ └── With agents → LlamaIndex + LangChain
│
├── Autonomous agents
│ ├── Simple tools → LangChain
│ ├── Multi-agent → LangGraph
│ └── Role-based teams → CrewAI
│
├── Enterprise / Azure
│ └── Semantic Kernel
│
├── Research / Optimization
│ └── DSPy
│
└── Production search
└── Haystack
Implementation Best Practices
Start Simple
# Don't start with this:
complex_multi_agent_langgraph_system()
# Start with this:
simple_rag_chain = retriever | prompt | llm | parser
Add Complexity Incrementally
- Start with basic RAG or chain
- Add error handling
- Add observability
- Add caching (see the sketch after this list)
- Add agents if needed
- Add multi-agent only when proven necessary
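For step 4 (caching), LangChain ships pluggable LLM caches. A sketch using the in-memory cache; swap in SQLiteCache or a Redis-backed cache for persistence:
from langchain.globals import set_llm_cache
from langchain_community.cache import InMemoryCache

# Identical LLM calls now return cached responses instead of new API requests
set_llm_cache(InMemoryCache())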
Monitor Everything
# LangSmith tracing
os.environ["LANGCHAIN_TRACING_V2"] = "true"
# Custom callbacks
from langchain.callbacks import BaseCallbackHandler
class MetricsCallback(BaseCallbackHandler):
    def on_llm_end(self, response, **kwargs):
        # llm_output typically includes token_usage and model_name for OpenAI models;
        # latency must be measured yourself (e.g. timestamps in on_llm_start/on_llm_end)
        log_metrics({
            "tokens": (response.llm_output or {}).get("token_usage"),
            "model": (response.llm_output or {}).get("model_name"),
        })
Use Type Safety
from pydantic import BaseModel
from typing import List, Optional
class QueryResult(BaseModel):
answer: str
sources: List[str]
confidence: float
follow_up_questions: Optional[List[str]]
# Use with structured output
model.with_structured_output(QueryResult)
Conclusion
The framework choice depends on your primary use case:
- RAG-first: Start with LlamaIndex
- Agent-first: Start with LangChain
- Multi-agent: Use LangGraph
- Enterprise/Azure: Use Semantic Kernel
- Research: Consider DSPy
Most teams benefit from combining frameworks:
- LlamaIndex for data ingestion and retrieval
- LangChain for orchestration and agents
- LangGraph for complex stateful workflows
Start simple, add complexity when needed, and always instrument for observability.