LLM Frameworks: LangChain, LlamaIndex, LangGraph, and Beyond
A comprehensive comparison of LLM application frameworks—LangChain, LlamaIndex, LangGraph, Haystack, and alternatives. When to use each, how to combine them, and practical implementation patterns.
The Framework Landscape in 2025
Building LLM applications from scratch is complex. Frameworks abstract common patterns—RAG, agents, chains—letting you focus on your application logic rather than reinventing infrastructure.
This guide provides a comprehensive comparison of the major frameworks, when to use each, and how to combine them effectively for production systems.
Framework Overview
| Framework | Primary Focus | Best For | Company |
|---|---|---|---|
| LangChain | General LLM applications | Complex agents, multi-step workflows | LangChain, Inc |
| LlamaIndex | Data & RAG | Document Q&A, knowledge bases | LlamaIndex, Inc |
| LangGraph | Multi-agent systems | Stateful workflows, agent orchestration | LangChain, Inc |
| Haystack | Search & NLP | Production search, NLP pipelines | deepset |
| Semantic Kernel | Enterprise AI | Azure integration, .NET support | Microsoft |
| DSPy | Prompt optimization | Research, automated tuning | Stanford NLP |
| CrewAI | Multi-agent teams | Role-based agents | CrewAI |
| AutoGen | Agent conversations | Multi-agent dialogue | Microsoft |
LangChain
Overview
LangChain is the most comprehensive LLM application framework, providing building blocks for chains, agents, RAG, and complex workflows.
From research: "LangChain is the most flexible foundation for complex LLM apps and agentic workflows, with excellent production tooling."
Core Architecture
LangChain Ecosystem
├── langchain-core # Base abstractions
├── langchain # Main framework
├── langchain-community # Third-party integrations
├── langchain-openai # OpenAI-specific
├── langchain-anthropic # Anthropic-specific
└── langgraph # Multi-agent workflows
Core Concepts
Chains (LCEL - LangChain Expression Language)
LCEL (LangChain Expression Language) is LangChain's declarative way to compose operations. The key innovation is the pipe operator (|) which chains components together: data flows from left to right through each component. Think of it like Unix pipes—prompt | model | parser means "take input, format it with the prompt, send to the model, parse the output."
LCEL provides three major benefits: (1) automatic batching and parallel execution, (2) built-in streaming support, and (3) consistent interface across all components. Every LCEL chain supports .invoke() (single input), .batch() (multiple inputs), and .stream() (streaming output) without any extra code.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser, JsonOutputParser
from langchain_core.runnables import RunnablePassthrough
# Simple chain
prompt = ChatPromptTemplate.from_template(
"Summarize the following text in {style} style:\n\n{text}"
)
model = ChatOpenAI(model="gpt-4o")
parser = StrOutputParser()
chain = prompt | model | parser
result = chain.invoke({
"style": "concise bullet points",
"text": long_document
})
# Chain with parallel operations
from langchain_core.runnables import RunnableParallel
analysis_chain = RunnableParallel({
"summary": prompt | model | StrOutputParser(),
"key_points": key_points_prompt | model | JsonOutputParser(),
"sentiment": sentiment_prompt | model | StrOutputParser(),
})
results = analysis_chain.invoke({"text": document})
# Returns: {"summary": "...", "key_points": [...], "sentiment": "..."}
Understanding this code:
- ChatPromptTemplate: Creates a reusable template with variables ({style}, {text}). LangChain handles the formatting—you pass a dict, it fills in the placeholders.
- Pipe composition: prompt | model | parser creates a chain. When you call .invoke(), data flows through: dict → formatted prompt → LLM → parsed string. Each component transforms the output of the previous one.
- RunnableParallel: Runs multiple chains simultaneously with the same input. This is efficient—one network call per model invocation, but all three analyses happen concurrently. Results come back as a dict with keys matching your chain names.
- Different parsers: StrOutputParser() extracts raw text, JsonOutputParser() parses JSON into a dict. Match the parser to your expected output format.
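The same chain also supports batching and streaming with no extra code. A quick sketch (documents_to_summarize is a hypothetical list of strings; long_document is the input used above):
# Batch over several inputs (runs them in parallel under the hood)
summaries = chain.batch([
    {"style": "concise bullet points", "text": doc}
    for doc in documents_to_summarize  # hypothetical list of strings
])

# Stream tokens as they are generated
for chunk in chain.stream({"style": "one short paragraph", "text": long_document}):
    print(chunk, end="", flush=True)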
Structured Output
LLMs naturally output free-form text, but applications need structured data. LangChain's .with_structured_output() forces the model to return data matching a Pydantic schema. Under the hood, it uses the model's function calling or JSON mode to guarantee valid output—no parsing errors, no missing fields.
This is one of LangChain's most valuable features. Instead of hoping the model outputs valid JSON and writing error handling for when it doesn't, you define a schema and let LangChain ensure compliance.
from pydantic import BaseModel, Field  # langchain_core.pydantic_v1 is deprecated in recent versions
from typing import List
class ExtractedInfo(BaseModel):
"""Information extracted from document."""
title: str = Field(description="Document title")
entities: List[str] = Field(description="Named entities mentioned")
summary: str = Field(description="Brief summary")
confidence: float = Field(description="Confidence score 0-1")
model = ChatOpenAI(model="gpt-4o")
structured_model = model.with_structured_output(ExtractedInfo)
result = structured_model.invoke("Extract info from: " + document)
# Returns: ExtractedInfo(title="...", entities=[...], ...)
Key points:
- Pydantic schema: The Field(description=...) is crucial—the model reads these descriptions to understand what each field should contain. Good descriptions improve extraction accuracy.
- with_structured_output(): Wraps the model with structured output handling. The returned structured_model can be used anywhere a regular model would be, but always returns your schema type.
- Type safety: The result is a proper ExtractedInfo instance, not a dict. You get IDE autocomplete, type checking, and validation automatically.
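The structured model is itself a runnable, so it composes with LCEL like anything else. A minimal sketch (extraction_prompt and extraction_chain are illustrative names):
from langchain_core.prompts import ChatPromptTemplate

extraction_prompt = ChatPromptTemplate.from_template(
    "Extract the key information from this document:\n\n{document}"
)
extraction_chain = extraction_prompt | structured_model

info = extraction_chain.invoke({"document": document})
print(info.title, info.confidence)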
Agents with Tools
Agents are LLMs that can take actions. Instead of just generating text, an agent can search the web, query databases, or call APIs. The key pattern is ReAct (Reasoning + Acting): the model thinks about what to do, takes an action, observes the result, and repeats until it has an answer.
LangChain agents require three components: (1) tools the agent can use, (2) a prompt that teaches the model how to use tools, and (3) an executor that manages the think-act-observe loop.
from langchain.agents import create_react_agent, AgentExecutor
from langchain_community.tools import DuckDuckGoSearchRun
from langchain.tools import Tool
from langchain_core.prompts import PromptTemplate
# Define tools
search = DuckDuckGoSearchRun()
def calculate(expression: str) -> str:
"""Evaluate a mathematical expression."""
try:
        return str(eval(expression))  # demo only; avoid eval on untrusted input
except Exception as e:
return f"Error: {e}"
tools = [
Tool(
name="web_search",
func=search.run,
description="Search the web for current information"
),
Tool(
name="calculator",
func=calculate,
description="Evaluate mathematical expressions"
),
]
# Create agent
prompt = PromptTemplate.from_template("""
You are a helpful assistant with access to tools.

Tools available:
{tools}

Use the following format:

Question: the input question
Thought: think about what to do next
Action: the tool to use, one of [{tool_names}]
Action Input: the input to the tool
Observation: the tool's result
... (Thought/Action/Action Input/Observation can repeat)
Thought: I now know the final answer
Final Answer: the answer to the original question

Question: {input}
{agent_scratchpad}
""")
agent = create_react_agent(
llm=ChatOpenAI(model="gpt-4o"),
tools=tools,
prompt=prompt
)
executor = AgentExecutor(
agent=agent,
tools=tools,
verbose=True,
max_iterations=5,
handle_parsing_errors=True
)
result = executor.invoke({"input": "What is 15% of the current Bitcoin price?"})
Understanding the agent architecture:
- Tool definition: Each Tool needs a name, function, and description. The description is what the model reads to decide when to use the tool—be specific about what it does and when it's appropriate. Vague descriptions lead to wrong tool choices.
- Prompt template: The prompt teaches the model the ReAct format. {tools} lists tool descriptions, {tool_names} provides valid names, and {agent_scratchpad} is where the model's thinking history goes. This format enables multi-step reasoning.
- create_react_agent: Creates an agent that follows ReAct. The agent parses model output to extract tool calls, but doesn't execute them—that's the executor's job.
- AgentExecutor: The runtime loop that:
  - Passes the question to the agent
  - Parses the response to find tool calls
  - Executes the tool and gets results
  - Feeds results back to the agent
  - Repeats until the agent says "Final Answer"
- max_iterations=5: Prevents infinite loops. If the agent can't solve the problem in 5 steps, it gives up. Tune based on task complexity.
- handle_parsing_errors=True: If the model outputs malformed tool calls, the executor recovers gracefully instead of crashing.
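For lighter-weight tool definitions, recent LangChain versions also provide the @tool decorator, which turns a plain function into a Tool using its name, type hints, and docstring. A sketch of the calculator above rewritten that way; tools defined like this can be passed to create_react_agent or, for models with native function calling, to create_tool_calling_agent:
from langchain_core.tools import tool

@tool
def calculator(expression: str) -> str:
    """Evaluate a mathematical expression."""
    try:
        return str(eval(expression))  # demo only; avoid eval on untrusted input
    except Exception as e:
        return f"Error: {e}"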
RAG with LangChain
RAG (Retrieval-Augmented Generation) grounds LLM responses in your data. Instead of relying on the model's training knowledge, you retrieve relevant documents and include them in the prompt. This reduces hallucinations and lets you work with private or recent information.
LangChain's RAG pipeline has four stages: (1) load documents, (2) split into chunks, (3) embed and store in a vector database, (4) retrieve and generate answers. Each stage is customizable—you can swap vector stores, change chunking strategies, or modify retrieval logic.
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_core.prompts import ChatPromptTemplate
# Load and process documents
loader = PyPDFLoader("document.pdf")
documents = loader.load()
text_splitter = RecursiveCharacterTextSplitter(
chunk_size=1000,
chunk_overlap=200,
separators=["\n\n", "\n", " ", ""]
)
splits = text_splitter.split_documents(documents)
# Create vector store
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma.from_documents(
documents=splits,
embedding=embeddings,
persist_directory="./chroma_db"
)
# Create retriever
retriever = vectorstore.as_retriever(
search_type="similarity",
search_kwargs={"k": 5}
)
# RAG chain with custom prompt
rag_prompt = ChatPromptTemplate.from_template("""
Answer the question based on the following context. If the context doesn't
contain enough information, say so.
Context: {context}
Question: {question}
Answer:""")
def format_docs(docs):
return "\n\n".join(doc.page_content for doc in docs)
rag_chain = (
{"context": retriever | format_docs, "question": RunnablePassthrough()}
| rag_prompt
| ChatOpenAI(model="gpt-4o")
| StrOutputParser()
)
answer = rag_chain.invoke("What are the key findings?")
RAG pipeline explained:
- Document loading: PyPDFLoader extracts text from PDFs. LangChain has loaders for 100+ formats—use the one that matches your data. Loaders return Document objects with text content and metadata.
- Text splitting: Documents are too long to embed whole. RecursiveCharacterTextSplitter breaks them into chunks, trying to preserve semantic boundaries. chunk_overlap=200 means adjacent chunks share 200 characters—this helps with questions that span chunk boundaries.
- Embedding: Each chunk is converted to a vector using the embedding model. text-embedding-3-small is OpenAI's efficient option. Vectors are stored in Chroma (a local vector database). persist_directory saves to disk so you don't re-embed on restart.
- Retriever: Wraps the vector store with a search interface. search_kwargs={"k": 5} returns the 5 most relevant chunks for each query. The retriever can be swapped for hybrid search, re-ranking, or other strategies.
- LCEL RAG chain: The modern approach. retriever | format_docs fetches and formats context, RunnablePassthrough() passes the question unchanged. Both feed into the prompt, then the model generates an answer.
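For example, swapping the retriever for maximal marginal relevance (MMR) search is a one-line change. A sketch, assuming the vectorstore built above (parameter values are illustrative):
mmr_retriever = vectorstore.as_retriever(
    search_type="mmr",                      # maximal marginal relevance
    search_kwargs={"k": 5, "fetch_k": 20},  # fetch 20 candidates, keep 5 diverse chunks
)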
The LangChain Ecosystem
LangSmith (Observability)
# Set environment variables
import os
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "your-api-key"
os.environ["LANGCHAIN_PROJECT"] = "my-project"
# All chain/agent executions are now traced
# View at smith.langchain.com
LangSmith features:
- Trace visualization
- Latency analysis
- Token usage tracking
- Dataset management
- Evaluation runs
- Prompt versioning
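Beyond automatic chain tracing, the langsmith SDK's @traceable decorator wraps arbitrary functions so custom logic appears in the same trace tree. A sketch, reusing the chain from the LCEL example (classify_ticket is an illustrative name):
from langsmith import traceable

@traceable(name="classify_ticket")
def classify_ticket(text: str) -> str:
    # Everything called in here (LLM calls, retrieval, parsing) nests under this trace
    return chain.invoke({"style": "a single category label", "text": text})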
LangServe (Deployment)
from fastapi import FastAPI
from langserve import add_routes
app = FastAPI()
# Add chain as REST endpoint
add_routes(
app,
rag_chain,
path="/rag",
)
# Run with: uvicorn main:app --reload
# API available at: http://localhost:8000/rag/invoke
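On the client side, LangServe exposes the deployed chain as a regular runnable via RemoteRunnable. A sketch, assuming the /rag route above:
from langserve import RemoteRunnable

remote_chain = RemoteRunnable("http://localhost:8000/rag/")
answer = remote_chain.invoke("What are the key findings?")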
LangChain Strengths
From IBM: "LangChain shines in projects that demand complex reasoning chains and multi-step AI workflows. If your goal is to create autonomous agents capable of making decisions, interacting with multiple APIs, and managing dynamic conversational flows, LangChain provides the flexibility needed."
Key strengths:
- Flexibility: Swap components easily
- Integrations: 700+ integrations
- Observability: LangSmith for production
- Community: Large ecosystem, many examples
- Agent support: Best-in-class agent tooling
LangChain Weaknesses
- Complexity: Learning curve can be steep
- Abstraction overhead: Sometimes too many layers
- Breaking changes: Rapid iteration means API changes
- Memory usage: Can be heavy for simple use cases
Best For
- Complex agent workflows
- Multi-step reasoning chains
- Applications needing many integrations
- Teams wanting comprehensive tooling
- Production systems with observability needs
LlamaIndex
Overview
LlamaIndex specializes in connecting LLMs to your data with optimized RAG pipelines and data ingestion.
From research: "LlamaIndex is the fastest route to high-quality, production-grade RAG on your data."
Core Architecture
LlamaIndex Ecosystem
├── llama-index-core # Core abstractions
├── llama-index-llms-* # LLM integrations
├── llama-index-embeddings-* # Embedding models
├── llama-index-readers-* # Data connectors
├── llama-index-vector-stores-* # Vector DBs
└── llama-cloud # Managed services
Core Concepts
Data Connectors (Readers)
LlamaIndex excels at data ingestion. Readers connect to data sources and produce Document objects that can be indexed. The library includes readers for files, databases, APIs, cloud storage, and more. Each reader handles the complexity of its data source—parsing PDFs, handling authentication, pagination—so you focus on what to do with the data.
The pattern is consistent across all readers: create a reader with configuration, call .load_data(), get documents. This uniformity makes it easy to support multiple data sources in a single application.
from llama_index.core import SimpleDirectoryReader
from llama_index.readers.web import SimpleWebPageReader
from llama_index.readers.database import DatabaseReader
from llama_index.readers.notion import NotionPageReader
# Local files (PDF, DOCX, TXT, etc.)
documents = SimpleDirectoryReader(
input_dir="./data",
recursive=True,
required_exts=[".pdf", ".docx", ".txt"]
).load_data()
# Web pages
web_docs = SimpleWebPageReader(
html_to_text=True
).load_data([
"https://docs.example.com/guide",
"https://docs.example.com/api"
])
# Database
db_reader = DatabaseReader(
uri="postgresql://user:pass@localhost/db"
)
db_docs = db_reader.load_data(
query="SELECT title, content FROM articles WHERE published = true"
)
# Notion
notion_docs = NotionPageReader(
integration_token="secret_xxx"
).load_data(page_ids=["page-id-1", "page-id-2"])
Reader options explained:
- SimpleDirectoryReader: The workhorse for local files. recursive=True processes subdirectories, required_exts filters by file type. It auto-detects parsers for PDF, DOCX, TXT, and more.
- SimpleWebPageReader: Fetches and parses web pages. html_to_text=True strips HTML tags for cleaner text. For dynamic sites, consider using a headless browser reader instead.
- DatabaseReader: Executes SQL and converts rows to documents. Each row becomes a document—customize the query to get the right granularity.
- NotionPageReader: Connects to Notion's API. Requires an integration token from your Notion workspace. Can fetch entire databases, not just individual pages.
Indexes and Query Engines
An index is how LlamaIndex organizes your documents for efficient retrieval. The most common is VectorStoreIndex, which embeds documents and enables semantic search. Once you have an index, you create a query engine to search it and generate answers.
LlamaIndex separates retrieval (finding relevant documents) from synthesis (generating answers). This separation lets you customize each stage independently—use one retrieval strategy for speed, another for accuracy.
from llama_index.core import VectorStoreIndex, Settings
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
# Configure global settings
Settings.llm = OpenAI(model="gpt-4o", temperature=0.1)
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
# Create index
index = VectorStoreIndex.from_documents(
documents,
show_progress=True
)
# Simple query
query_engine = index.as_query_engine()
response = query_engine.query("What is the main topic?")
print(response.response)
print(response.source_nodes) # Retrieved chunks
# Advanced query engine
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.postprocessor import SimilarityPostprocessor
retriever = VectorIndexRetriever(
index=index,
similarity_top_k=10,
)
query_engine = RetrieverQueryEngine(
retriever=retriever,
node_postprocessors=[
SimilarityPostprocessor(similarity_cutoff=0.7)
]
)
Index and query engine concepts:
- Settings: Global configuration for the LLM and embedding model. All indexes and query engines use these defaults unless overridden. Set temperature=0.1 for factual queries where consistency matters.
- VectorStoreIndex: Creates an index from documents. show_progress=True displays a progress bar during embedding—useful for large document sets.
- Simple query engine: .as_query_engine() creates a default query engine. It retrieves relevant chunks and synthesizes an answer. response.source_nodes shows which chunks were used—essential for debugging and citation.
- Advanced query engine: For more control, build components separately. VectorIndexRetriever handles search, SimilarityPostprocessor filters low-quality matches (below 0.7 similarity). Stack multiple postprocessors for re-ranking, deduplication, or metadata filtering.
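The synthesis stage is as swappable as retrieval. A sketch of two common variations, assuming the index built above (mode names follow LlamaIndex's documented options):
# Summarize across many retrieved chunks instead of answering from the top few
summary_engine = index.as_query_engine(response_mode="tree_summarize")
print(summary_engine.query("Summarize the document"))

# Conversational interface with chat history over the same index
chat_engine = index.as_chat_engine(chat_mode="condense_plus_context")
print(chat_engine.chat("What does the report say about pricing?"))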
Advanced RAG Patterns
Basic RAG retrieves chunks based on semantic similarity, but this misses cases where exact keyword matches matter or where the relevant information spans multiple chunks. LlamaIndex provides advanced patterns that significantly improve retrieval quality:
- Hybrid search combines semantic and keyword matching—catch both conceptually similar and exact term matches
- Sentence window retrieval returns the sentences around matches for better context
- Auto-merging retrieval uses hierarchical chunks, merging small chunks into larger ones when multiple small chunks from the same section are retrieved
# Hybrid search (semantic + keyword)
from llama_index.core.retrievers import QueryFusionRetriever
retriever = QueryFusionRetriever(
[
index.as_retriever(similarity_top_k=5),
        # BM25 retriever for keyword matching (assumed built earlier,
        # e.g. with BM25Retriever from llama-index-retrievers-bm25)
bm25_retriever,
],
similarity_top_k=10,
num_queries=4, # Generate query variations
mode="reciprocal_rerank",
)
# Sentence window retrieval
from llama_index.core.node_parser import SentenceWindowNodeParser
from llama_index.core.postprocessor import MetadataReplacementPostProcessor
node_parser = SentenceWindowNodeParser.from_defaults(
window_size=3,
window_metadata_key="window",
original_text_metadata_key="original_text",
)
# Auto-merging retrieval (hierarchical chunks)
from llama_index.core.node_parser import HierarchicalNodeParser
from llama_index.core.retrievers import AutoMergingRetriever
node_parser = HierarchicalNodeParser.from_defaults(
chunk_sizes=[2048, 512, 128]
)
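These parsers only take effect once they are wired into indexing and querying. A minimal sketch of the sentence-window pattern, assuming the SentenceWindowNodeParser configured above is bound to a variable named sentence_parser and documents is the list loaded earlier:
from llama_index.core import VectorStoreIndex

# Index single-sentence nodes that carry their surrounding window in metadata
nodes = sentence_parser.get_nodes_from_documents(documents)
sentence_index = VectorStoreIndex(nodes)

# At query time, replace each matched sentence with its window for fuller context
query_engine = sentence_index.as_query_engine(
    similarity_top_k=5,
    node_postprocessors=[
        MetadataReplacementPostProcessor(target_metadata_key="window")
    ],
)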
Agents in LlamaIndex
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import QueryEngineTool, FunctionTool
# Create tools from query engines
query_tool = QueryEngineTool.from_defaults(
query_engine=index.as_query_engine(),
name="knowledge_base",
description="Search the company knowledge base"
)
# Custom function tool
def send_email(to: str, subject: str, body: str) -> str:
"""Send an email to a recipient."""
# Implementation
return f"Email sent to {to}"
email_tool = FunctionTool.from_defaults(fn=send_email)
# Create agent
agent = ReActAgent.from_tools(
tools=[query_tool, email_tool],
llm=OpenAI(model="gpt-4o"),
verbose=True,
max_iterations=10
)
response = agent.chat("Find info about our refund policy and email it to customer@example.com")
The LlamaIndex Ecosystem
LlamaCloud (Managed Services)
from llama_index.indices.managed.llama_cloud import LlamaCloudIndex
# Use managed parsing and indexing
index = LlamaCloudIndex.from_documents(
documents,
name="my-index",
project_name="my-project"
)
# Query through cloud
response = index.as_query_engine().query("What are the key points?")
LlamaCloud features:
- LlamaParse: Advanced document parsing
- Managed vector storage
- Automatic chunking optimization
- API-based access
LlamaParse (Document Parsing)
from llama_parse import LlamaParse
parser = LlamaParse(
api_key="llx-xxx",
result_type="markdown",
num_workers=4,
verbose=True,
language="en"
)
# Parse complex documents (tables, images, etc.)
documents = parser.load_data("complex_report.pdf")
LlamaIndex Strengths
From research: "Since LlamaIndex was designed with RAG-heavy workflows in mind, it has a best-in-class data ingestion toolset. The framework helps engineering teams clean and structure messy data before it hits the retriever."
Key strengths:
- RAG quality: Best-in-class retrieval patterns
- Data connectors: 160+ data sources
- Document parsing: LlamaParse handles complex docs
- Learning curve: Gentler than LangChain for RAG
- Optimized patterns: Sentence window, auto-merging built-in
LlamaIndex Weaknesses
- Agent support: Less mature than LangChain
- General workflows: Less flexible for non-RAG use cases
- Ecosystem: Smaller than LangChain
- Observability: LlamaTrace less feature-rich than LangSmith
Best For
- RAG-first applications
- Document Q&A systems
- Knowledge bases
- Complex document types (tables, images)
- Quick prototyping to production RAG
LangGraph
Overview
LangGraph enables complex multi-agent systems with explicit state management and cyclic workflows.
From research: "LangGraph is a stateful framework for building multi-agent systems as graphs, created by the LangChain team. Engineers model workflows using nodes (tools, functions, LLMs, subgraphs) and edges (loops, conditional routes)."
Core Concepts
LangGraph models workflows as directed graphs where nodes are functions (often calling LLMs) and edges define the flow between them. The key innovation is state: a typed dictionary that flows through the graph, accumulating results from each node. Unlike simple chains, LangGraph supports cycles—a reviewer can send work back to a writer, creating iterative refinement loops.
Understanding state is crucial: every node receives the current state and returns updates. The graph merges these updates (using Annotated operators) and passes the new state to the next node.
State and Graph Definition
Let's build a research-write-review workflow that iterates until the reviewer approves or max iterations are reached:
from langgraph.graph import StateGraph, END
from typing import TypedDict, Annotated, Literal
import operator
# Define state schema
class AgentState(TypedDict):
messages: Annotated[list, operator.add]
current_step: str
research_data: str
draft: str
feedback: str
iteration: int
# Define node functions
def researcher(state: AgentState) -> AgentState:
"""Research node - gathers information."""
# Call LLM to research
research = llm.invoke(f"Research: {state['messages'][-1]}")
return {
"messages": [f"Research complete: {research}"],
"research_data": research,
"current_step": "writer"
}
def writer(state: AgentState) -> AgentState:
"""Writer node - creates draft."""
draft = llm.invoke(
f"Write based on: {state['research_data']}\n"
f"Previous feedback: {state.get('feedback', 'None')}"
)
return {
"messages": [f"Draft created"],
"draft": draft,
"current_step": "reviewer"
}
def reviewer(state: AgentState) -> AgentState:
"""Reviewer node - evaluates draft."""
review = llm.invoke(f"Review this draft: {state['draft']}")
return {
"messages": [f"Review: {review}"],
"feedback": review,
"iteration": state.get("iteration", 0) + 1
}
# Routing function
def should_continue(state: AgentState) -> Literal["writer", "end"]:
"""Decide whether to continue iterating."""
if state["iteration"] >= 3:
return "end"
if "approved" in state["feedback"].lower():
return "end"
return "writer"
# Build graph
workflow = StateGraph(AgentState)
# Add nodes
workflow.add_node("researcher", researcher)
workflow.add_node("writer", writer)
workflow.add_node("reviewer", reviewer)
# Add edges
workflow.set_entry_point("researcher")
workflow.add_edge("researcher", "writer")
workflow.add_edge("writer", "reviewer")
workflow.add_conditional_edges(
"reviewer",
should_continue,
{
"writer": "writer",
"end": END
}
)
# Compile
app = workflow.compile()
# Run
result = app.invoke({
"messages": ["Write a blog post about LLM frameworks"],
"iteration": 0
})
Understanding the LangGraph workflow:
- State schema: AgentState defines what data flows through the graph. Annotated[list, operator.add] means messages accumulates—new messages append to existing ones rather than replacing them. Other fields replace previous values.
- Node functions: Each node receives the current state and returns updates. The researcher node adds research data; the writer node uses that data to create a draft. Notice nodes return partial updates—you only specify fields that change.
- Routing function: should_continue decides where to go next based on state. It checks iteration count and feedback content. The return value must match an edge name defined in add_conditional_edges.
- Graph construction: Build the graph in three steps:
  - Add nodes (functions)
  - Set the entry point
  - Add edges (connections between nodes)
- Conditional edges: The reviewer can route to "writer" (for revision) or "end" (if approved). This creates the cycle that enables iterative refinement.
- Compile and run: .compile() validates the graph and creates a runnable. .invoke() starts execution from the entry point, flowing through nodes until reaching END.
Checkpointing and Memory
from langgraph.checkpoint.sqlite import SqliteSaver
# Add persistence
memory = SqliteSaver.from_conn_string(":memory:")
app = workflow.compile(checkpointer=memory)
# Run with thread_id for persistence
config = {"configurable": {"thread_id": "user-123"}}
result1 = app.invoke({"messages": ["Start task"]}, config)
# Later...
result2 = app.invoke({"messages": ["Continue"]}, config) # Has previous state
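With a checkpointer attached, the compiled graph can also be inspected between runs. A sketch using get_state (field names follow the AgentState schema above):
snapshot = app.get_state(config)
print(snapshot.values.get("messages"))  # state accumulated so far for this thread
print(snapshot.next)                    # node(s) that would execute next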
Human-in-the-Loop
from langgraph.graph import StateGraph
from langgraph.prebuilt import ToolNode, tools_condition
# Define an interrupt point (a no-op node used purely to pause for review)
workflow.add_node("human_approval", lambda state: {})
def needs_approval(state: AgentState) -> Literal["approve", "reject", "auto"]:
"""Check if human approval needed."""
if state.get("high_risk", False):
return "approve" # Will interrupt
return "auto"
# "action_node" and "execute" are assumed to be nodes added elsewhere in this workflow
workflow.add_conditional_edges(
    "action_node",
    needs_approval,
{
"approve": "human_approval",
"auto": "execute",
"reject": END
}
)
# Compile with interrupt
app = workflow.compile(
checkpointer=memory,
interrupt_before=["human_approval"]
)
# Run until interrupt
result = app.invoke(initial_state, config)
# Human reviews and continues
app.invoke(None, config) # Resume from interrupt
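While paused at the interrupt, a human can edit the state before resuming. A sketch using update_state (the feedback value is illustrative):
# Record the human decision, then resume the graph past the interrupt
app.update_state(config, {"feedback": "approved by reviewer"})
app.invoke(None, config)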
LangGraph Strengths
- State management: Explicit, type-safe state
- Cyclic workflows: Natural support for loops
- Persistence: Built-in checkpointing
- Human-in-the-loop: First-class interrupt support
- Debugging: Graph visualization, step-through execution
LangGraph Weaknesses
- Complexity: Steeper learning curve
- Overhead: More setup for simple flows
- Documentation: Less mature than LangChain core
Best For
- Multi-agent orchestration
- Complex workflows with loops
- State machines with LLM nodes
- Human-in-the-loop systems
- Long-running agent tasks
Other Frameworks
Haystack (deepset)
Production-focused NLP framework:
from haystack import Pipeline
from haystack.components.retrievers import InMemoryBM25Retriever
from haystack.components.generators import OpenAIGenerator
from haystack.components.builders import PromptBuilder
# Build pipeline (document_store and template are assumed to be defined earlier)
pipeline = Pipeline()
pipeline.add_component("retriever", InMemoryBM25Retriever(document_store))
pipeline.add_component("prompt", PromptBuilder(template=template))
pipeline.add_component("llm", OpenAIGenerator(model="gpt-4o"))
pipeline.connect("retriever", "prompt.documents")
pipeline.connect("prompt", "llm")
result = pipeline.run({"retriever": {"query": "What is RAG?"}})
Best for: Production search systems, traditional NLP integration
Semantic Kernel (Microsoft)
Enterprise-focused SDK with Azure integration:
import semantic_kernel as sk
from semantic_kernel.connectors.ai.open_ai import AzureChatCompletion
from semantic_kernel.functions import kernel_function

kernel = sk.Kernel()
kernel.add_service(AzureChatCompletion(
    deployment_name="gpt-4",
    endpoint="https://xxx.openai.azure.com/",
    api_key="your-azure-openai-key"
))

# Native functions live in plugin classes and are exposed via @kernel_function
class TextPlugin:
    @kernel_function(name="summarize", description="Summarize the given text")
    async def summarize(self, text: str) -> str:
        result = await kernel.invoke_prompt(f"Summarize: {text}")
        return str(result)

kernel.add_plugin(TextPlugin(), plugin_name="text")
Best for: Azure-heavy environments, .NET shops, enterprise governance
DSPy (Stanford)
Programmatic prompt optimization:
import dspy
class RAG(dspy.Module):
    def __init__(self, num_passages=3):
        super().__init__()
        self.retrieve = dspy.Retrieve(k=num_passages)
        self.generate = dspy.ChainOfThought("context, question -> answer")
def forward(self, question):
context = self.retrieve(question).passages
return self.generate(context=context, question=question)
# Compile with optimizer
from dspy.teleprompt import BootstrapFewShot
optimizer = BootstrapFewShot(metric=your_metric)
compiled_rag = optimizer.compile(RAG(), trainset=examples)
Best for: Research, automated prompt tuning, when you have evaluation data
CrewAI
Role-based multi-agent framework:
from crewai import Agent, Task, Crew
researcher = Agent(
role="Senior Research Analyst",
goal="Research and analyze market trends",
backstory="Expert analyst with 20 years experience",
tools=[search_tool, scrape_tool]
)
writer = Agent(
role="Content Writer",
goal="Write compelling content",
backstory="Award-winning journalist"
)
research_task = Task(
description="Research AI market trends",
agent=researcher
)
write_task = Task(
description="Write report based on research",
agent=writer
)
crew = Crew(
agents=[researcher, writer],
tasks=[research_task, write_task]
)
result = crew.kickoff()
Best for: Role-based agent systems, simpler than LangGraph
AutoGen (Microsoft)
Multi-agent conversations:
from autogen import AssistantAgent, UserProxyAgent
assistant = AssistantAgent(
name="assistant",
llm_config={"model": "gpt-4"}
)
user_proxy = UserProxyAgent(
name="user_proxy",
human_input_mode="TERMINATE",
code_execution_config={"work_dir": "coding"}
)
user_proxy.initiate_chat(
assistant,
message="Write a Python function to calculate fibonacci numbers"
)
Best for: Conversational agents, code generation with execution
Framework Comparison
Feature Matrix
| Feature | LangChain | LlamaIndex | LangGraph | Haystack |
|---|---|---|---|---|
| RAG Quality | Good | Excellent | N/A | Good |
| Agent Support | Excellent | Good | Excellent | Basic |
| Integrations | 700+ | 160+ | Via LangChain | 100+ |
| Learning Curve | Medium | Low | High | Medium |
| Production Ready | Yes | Yes | Yes | Yes |
| Observability | LangSmith | LlamaTrace | LangSmith | deepset Cloud |
| State Management | Basic | Basic | Excellent | Basic |
| Multi-Agent | Good | Basic | Excellent | Basic |
Performance Comparison
From research: "LlamaIndex generally offers better performance for retrieval tasks due to optimized indexing strategies and query engines."
| Aspect | LangChain | LlamaIndex |
|---|---|---|
| RAG latency | ~500ms | ~350ms |
| Memory usage | Higher | Lower |
| Index creation | Moderate | Optimized |
| Query optimization | Manual | Built-in |
Pricing
LangChain:
- Core library: Free (MIT)
- LangSmith: Free tier + paid plans ($39+/mo)
LlamaIndex:
- Core library: Free (MIT)
- LlamaCloud: Free tier (10k credits) + paid ($50-500+/mo)
Others:
- Most core libraries are MIT/Apache licensed
- Managed services vary by provider
Combining Frameworks
LlamaIndex RAG + LangChain Agents
The most common combination:
# LlamaIndex for high-quality RAG
from llama_index.core import VectorStoreIndex
index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine()
# LangChain for agent orchestration
from langchain.tools import Tool
from langchain.agents import create_react_agent, AgentExecutor
knowledge_tool = Tool(
name="knowledge_base",
func=lambda q: str(query_engine.query(q)),
description="Search internal knowledge base for information"
)
tools = [knowledge_tool, *other_tools]  # other_tools: any additional Tool objects
agent = create_react_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools)
result = executor.invoke({"input": "user question"})
LangGraph + LlamaIndex
For complex multi-agent RAG:
from langgraph.graph import StateGraph, END
from typing import TypedDict

class RAGState(TypedDict):
    query: str
    context: list
    response: str

# LlamaIndex retriever as a node (llama_index_retriever and llm assumed configured earlier)
def retrieval_node(state: RAGState) -> dict:
    results = llama_index_retriever.retrieve(state["query"])
    return {"context": results}  # nodes return only the fields they update

def generation_node(state: RAGState) -> dict:
    response = llm.invoke(
        f"Context: {state['context']}\nQuestion: {state['query']}"
    )
    return {"response": response}

# Build graph
workflow = StateGraph(RAGState)
workflow.add_node("retrieve", retrieval_node)
workflow.add_node("generate", generation_node)
workflow.set_entry_point("retrieve")
workflow.add_edge("retrieve", "generate")
workflow.add_edge("generate", END)
app = workflow.compile()
Decision Framework
When to Choose LlamaIndex
From research: "LlamaIndex generally has a gentler learning curve. Its high-level API and focus on data connection and querying make it easier to get started."
Choose LlamaIndex when:
- Building RAG as the primary feature
- Working with complex documents (tables, images)
- Need best-in-class retrieval quality
- Want quick time-to-production
- Team is new to LLM frameworks
When to Choose LangChain
From research: "LangChain offers significantly more flexibility and control. Its modular architecture allows you to swap out different LLMs, customize prompt templates, and chain together multiple tools and agents."
Choose LangChain when:
- Building complex agent systems
- Need maximum integration flexibility
- Want comprehensive observability (LangSmith)
- Building multi-step reasoning chains
- Team has LLM development experience
When to Choose LangGraph
Choose LangGraph when:
- Building multi-agent systems
- Need explicit state management
- Require cyclic workflows (loops, retries)
- Building human-in-the-loop systems
- Long-running agent tasks
Decision Tree
What's your primary use case?
├── Document Q&A / RAG
│ ├── Simple → LlamaIndex
│ └── With agents → LlamaIndex + LangChain
│
├── Autonomous agents
│ ├── Simple tools → LangChain
│ ├── Multi-agent → LangGraph
│ └── Role-based teams → CrewAI
│
├── Enterprise / Azure
│ └── Semantic Kernel
│
├── Research / Optimization
│ └── DSPy
│
└── Production search
└── Haystack
Implementation Best Practices
Start Simple
# Don't start with this:
complex_multi_agent_langgraph_system()
# Start with this:
simple_rag_chain = retriever | prompt | llm | parser
Add Complexity Incrementally
- Start with basic RAG or chain
- Add error handling
- Add observability
- Add caching (see the sketch after this list)
- Add agents if needed
- Add multi-agent only when proven necessary
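For step 4 (caching), LangChain ships pluggable LLM caches. A sketch using the in-memory cache; swap in SQLiteCache or a Redis-backed cache for persistence:
from langchain.globals import set_llm_cache
from langchain_community.cache import InMemoryCache

# Identical LLM calls now return cached responses instead of new API requests
set_llm_cache(InMemoryCache())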
Monitor Everything
# LangSmith tracing
os.environ["LANGCHAIN_TRACING_V2"] = "true"
# Custom callbacks
from langchain.callbacks import BaseCallbackHandler
class MetricsCallback(BaseCallbackHandler):
    def on_llm_end(self, response, **kwargs):
        # llm_output typically includes token_usage and model_name for OpenAI models;
        # latency must be measured yourself (e.g. timestamps in on_llm_start/on_llm_end)
        log_metrics({
            "tokens": (response.llm_output or {}).get("token_usage"),
            "model": (response.llm_output or {}).get("model_name"),
        })
Use Type Safety
from pydantic import BaseModel
from typing import List, Optional
class QueryResult(BaseModel):
answer: str
sources: List[str]
confidence: float
follow_up_questions: Optional[List[str]]
# Use with structured output
model.with_structured_output(QueryResult)
Conclusion
The framework choice depends on your primary use case:
- RAG-first: Start with LlamaIndex
- Agent-first: Start with LangChain
- Multi-agent: Use LangGraph
- Enterprise/Azure: Use Semantic Kernel
- Research: Consider DSPy
Most teams benefit from combining frameworks:
- LlamaIndex for data ingestion and retrieval
- LangChain for orchestration and agents
- LangGraph for complex stateful workflows
Start simple, add complexity when needed, and always instrument for observability.