Function Calling & Tool Use Deep Dive: Building LLMs That Take Action
Production-focused guide to LLM function calling and tool use. Covers parallel function calls, complex orchestration patterns, structured outputs, error handling, and production best practices for agentic applications in 2025.
Function calling transforms LLMs from text generators into agents that take action. Instead of just describing how to book a flight, the model actually calls the booking API. Instead of explaining database queries, it executes them. This capability is foundational to every agentic application—from customer service bots to autonomous coding assistants.
This guide covers function calling comprehensively: how it works under the hood, parallel and sequential orchestration, structured output guarantees, error handling, and production patterns for building reliable tool-using agents.
How Function Calling Works
Function calling enables LLMs to generate structured API calls instead of (or alongside) natural language responses. The model doesn't execute functions directly—it generates the function name and arguments, which your application code then executes.
The Basic Flow
1. Define tools: You provide the LLM with a schema describing available functions—their names, descriptions, and parameters.
2. User query: The user asks something that might require a function call ("What's the weather in Paris?").
3. Model decision: The LLM decides whether to call a function and, if so, which function with what arguments.
4. Function execution: Your application executes the function with the provided arguments.
5. Result integration: You send the function result back to the LLM, which incorporates it into its response.
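The flow above can be sketched end to end. This is a minimal illustration, not any provider's SDK: `call_model` is a stub standing in for the real API call, and the tool schema follows the JSON-Schema style the major providers share.

```python
import json

# Tool schema in the JSON-Schema style used by major providers.
TOOLS = [{
    "name": "get_weather",
    "description": "Get current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

# Your application's actual implementations, keyed by tool name.
IMPLEMENTATIONS = {
    "get_weather": lambda city: {"city": city, "temp_c": 18, "conditions": "cloudy"},
}

def call_model(messages, tools):
    """Stub for the provider API call. A real model decides whether to emit
    a tool call; here we fake one for illustration."""
    return {"tool_call": {"name": "get_weather",
                          "arguments": json.dumps({"city": "Paris"})}}

def run_turn(user_query):
    messages = [{"role": "user", "content": user_query}]
    response = call_model(messages, TOOLS)
    call = response.get("tool_call")
    if call:  # the model chose to invoke a tool
        args = json.loads(call["arguments"])  # arguments arrive as a JSON string
        result = IMPLEMENTATIONS[call["name"]](**args)  # your code executes it
        messages.append({"role": "tool", "name": call["name"],
                         "content": json.dumps(result)})
    return messages  # in practice you'd send this back to the model

history = run_turn("What's the weather in Paris?")
```

The key point the sketch captures: the model only produces the name and arguments; execution and result routing are entirely your application's responsibility.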
Why Models Are Good at This
Modern LLMs are trained on vast amounts of code, API documentation, and structured data. They understand:
- Function signatures and parameter types
- When APIs are appropriate for a given task
- How to extract relevant parameters from natural language
- How to interpret and explain function results
This isn't magic—it's pattern recognition trained on millions of examples of API usage.
Provider Implementations
OpenAI: Tools are defined with JSON Schema for parameters. The tools parameter accepts function definitions. The model returns tool_calls in its response when it wants to invoke functions.
Anthropic: Similar structure with tools parameter. Claude supports function calling with structured schema definitions and returns tool use requests.
Google (Gemini): Function declarations with schema definitions. Supports automatic function calling where the model can trigger execution directly.
All major providers have converged on similar interfaces, making multi-provider applications feasible with abstraction layers.
Parallel Function Calls
Real-world queries often require multiple pieces of information that can be fetched independently. Parallel function calling enables the model to request multiple function calls simultaneously, dramatically reducing latency.
When Parallel Calls Make Sense
Independent information needs: "What's the weather in Paris and the current EUR/USD exchange rate?" requires two API calls with no dependencies between them.
Batch operations: "Send notifications to Alice, Bob, and Charlie" can parallelize three notification calls.
Multi-source queries: "Get my calendar for tomorrow and my unread emails" pulls from independent data sources.
Enabling Parallel Calls
Most providers support parallel function calling through a parameter:
OpenAI: Set parallel_tool_calls: true (enabled by default). The model may return multiple tool calls in a single response.
Anthropic: Claude automatically generates parallel tool calls when appropriate.
The model decides when parallelization makes sense based on the query structure. You can also influence this through prompting—explicitly asking for multiple pieces of information encourages parallel calls.
Handling Parallel Results
When multiple tool calls return, you need to:
- Execute in parallel: Don't wait for one call to complete before starting the next.
- Collect all results: Wait for all parallel calls to complete.
- Return results together: Send all results back to the model in a single message.
- Handle partial failures: If one call fails, decide whether to proceed with partial results or retry.
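A concurrent executor covering these four points might look like the following sketch, with `asyncio`-based stub tools standing in for real API calls:

```python
import asyncio

async def get_weather(city):
    await asyncio.sleep(0.01)  # stands in for network latency
    return {"city": city, "temp_c": 18}

async def get_fx_rate(pair):
    await asyncio.sleep(0.01)
    return {"pair": pair, "rate": 1.09}

TOOLS = {"get_weather": get_weather, "get_fx_rate": get_fx_rate}

async def execute_parallel(tool_calls):
    """Run all tool calls concurrently; collect results and partial failures."""
    async def run(call):
        return await TOOLS[call["name"]](**call["arguments"])

    # return_exceptions=True keeps one failure from sinking the whole batch
    results = await asyncio.gather(*(run(c) for c in tool_calls),
                                   return_exceptions=True)
    out = []
    for call, result in zip(tool_calls, results):
        if isinstance(result, Exception):  # partial failure: report, don't crash
            out.append({"name": call["name"], "error": str(result)})
        else:
            out.append({"name": call["name"], "result": result})
    return out  # send all results back to the model in one message

calls = [{"name": "get_weather", "arguments": {"city": "Paris"}},
         {"name": "get_fx_rate", "arguments": {"pair": "EUR/USD"}}]
results = asyncio.run(execute_parallel(calls))
```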
Latency Benefits
Sequential execution: total_time = call_1_time + call_2_time + call_3_time
Parallel execution: total_time = max(call_1_time, call_2_time, call_3_time)
For three 500ms API calls, sequential takes 1.5 seconds while parallel takes 500ms—a 3x improvement.
Sequential Orchestration
Some workflows require sequential function calls where later calls depend on earlier results. The model must reason through the workflow step by step.
Dependency Chains
Example: "Book the cheapest flight to Paris and then reserve a hotel near the airport."
- Search flights → Get flight details including arrival airport
- Search hotels near arrival airport → Get hotel options
- Book hotel (depends on knowing which airport)
The model can't parallelize these—each step needs information from the previous step.
Multi-Turn Orchestration
Complex workflows span multiple model turns:
- Turn 1: User asks to plan a trip. Model calls flight search.
- Turn 2: Flight results provided. Model calls hotel search using flight destination.
- Turn 3: Hotel results provided. Model calls car rental search using hotel dates.
- Turn 4: All information gathered. Model presents complete plan.
Each turn requires waiting for function execution before proceeding.
ReAct Pattern
The ReAct (Reasoning + Acting) pattern structures sequential tool use:
- Thought: Model reasons about what to do next
- Action: Model selects and calls a function
- Observation: Function result is returned
- Repeat: Until task is complete
This pattern makes the model's reasoning explicit and debuggable. It naturally handles multi-step workflows where each step informs the next.
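A minimal ReAct loop can be sketched as below. The model is replaced by a scripted stub so the example runs standalone; the loop structure (thought, action, observation, repeat, with a step budget) is the part that carries over to real agents.

```python
import json

def fake_model(history):
    """Stub for the LLM: a scripted thought/action sequence for illustration."""
    step = sum(1 for m in history if m["role"] == "observation")
    if step == 0:
        return {"thought": "I need the arrival airport first.",
                "action": {"name": "search_flights", "arguments": {"dest": "Paris"}}}
    return {"thought": "I have everything I need.",
            "final": "Cheapest flight lands at CDG for $120."}

TOOLS = {"search_flights": lambda dest: {"airport": "CDG", "price": 120}}

def react_loop(task, max_steps=5):
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        decision = fake_model(history)       # Thought (+ Action or final answer)
        if "final" in decision:
            return decision["final"], history
        action = decision["action"]          # Action: selected tool + arguments
        observation = TOOLS[action["name"]](**action["arguments"])  # Observation
        history.append({"role": "observation",
                        "content": json.dumps(observation)})
    raise RuntimeError("step budget exhausted")  # always bound the loop

answer, trace = react_loop("Find the cheapest flight to Paris")
```

The `max_steps` budget matters in production: a confused model can otherwise loop indefinitely, burning tokens and API quota.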
Planning and Execution
For complex workflows, separate planning from execution:
Planning phase: Model analyzes the task and generates a plan (sequence of steps, dependencies, required functions).
Execution phase: Execute the plan step by step, handling errors and adjustments as needed.
This separation enables plan validation before execution and easier recovery from mid-execution failures.
Structured Outputs
Function calling requires structured outputs—the model must generate valid JSON that matches your function schema. Unstructured or malformed outputs cause execution failures.
The Structured Output Challenge
LLMs generate text probabilistically. They might:
- Produce invalid JSON (missing quotes, trailing commas)
- Include extra fields not in the schema
- Use wrong types (string instead of number)
- Omit required fields
Without guarantees, you need extensive validation and retry logic.
Guaranteed Structured Outputs
OpenAI Structured Outputs: Set "strict": true in your function definition. OpenAI guarantees the output will exactly match your JSON schema—no validation needed.
Constrained decoding: Some frameworks constrain the model's token generation to only produce valid JSON. Libraries like Outlines and Instructor implement this approach.
Post-processing: Parse output, validate against schema, and retry with error feedback if invalid. Works with any provider but adds latency.
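The post-processing approach can be sketched with a hand-rolled validator (a real implementation would likely use a library like jsonschema or Pydantic). The `generate` callable stands in for a model call; the feedback it receives carries the previous attempt's errors so the model can correct itself.

```python
import json

# A simplified schema description (illustrative, not full JSON Schema).
SCHEMA = {
    "required": ["city", "units"],
    "types": {"city": str, "units": str},
    "enums": {"units": {"celsius", "fahrenheit"}},
}

def validate(args, schema):
    """Return a list of problems; an empty list means the arguments are valid."""
    errors = []
    for field in schema["required"]:
        if field not in args:
            errors.append(f"missing required field '{field}'")
    for field, expected in schema["types"].items():
        if field in args and not isinstance(args[field], expected):
            errors.append(f"'{field}' should be {expected.__name__}")
    for field, allowed in schema.get("enums", {}).items():
        if field in args and args[field] not in allowed:
            errors.append(f"'{field}' must be one of {sorted(allowed)}")
    return errors

def parse_with_retry(generate, schema, max_attempts=3):
    """Parse, validate, and retry with error feedback until valid."""
    feedback = None
    for _ in range(max_attempts):
        raw = generate(feedback)  # feedback tells the model what was wrong
        try:
            args = json.loads(raw)
        except json.JSONDecodeError as exc:
            feedback = f"invalid JSON: {exc}"
            continue
        errors = validate(args, schema)
        if not errors:
            return args
        feedback = "; ".join(errors)
    raise ValueError(f"could not get valid arguments: {feedback}")

# Simulated model: first attempt omits a field, second is corrected.
attempts = iter(['{"city": "Paris"}',
                 '{"city": "Paris", "units": "celsius"}'])
args = parse_with_retry(lambda fb: next(attempts), SCHEMA)
```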
Schema Design Best Practices
Be specific: Narrow schemas are easier for models to satisfy than broad ones.
Use enums: Instead of free-form strings, constrain to specific allowed values.
Provide descriptions: Parameter descriptions help the model understand expected values.
Set reasonable defaults: For optional parameters, specify sensible defaults.
Validate at the boundary: Even with structured output guarantees, validate before executing sensitive operations.
Handling Complex Types
Nested objects: Models handle nested structures well. Define sub-schemas for complex parameters.
Arrays: Specify item schema for arrays. Models can generate variable-length arrays.
Union types: Use discriminated unions (a "type" field that determines the shape) for parameters that can have multiple forms.
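As one concrete illustration, a discriminated union can be expressed in standard JSON Schema with a "type" discriminator and oneOf branches (the tool and field names here are hypothetical):

```python
# JSON Schema for a discriminated union: the "type" field selects the shape.
NOTIFICATION_SCHEMA = {
    "type": "object",
    "properties": {"type": {"type": "string", "enum": ["email", "sms"]}},
    "required": ["type"],
    "oneOf": [
        {   # shape when type == "email"
            "properties": {"type": {"const": "email"},
                           "address": {"type": "string"}},
            "required": ["type", "address"],
        },
        {   # shape when type == "sms"
            "properties": {"type": {"const": "sms"},
                           "phone": {"type": "string"}},
            "required": ["type", "phone"],
        },
    ],
}
```

Note that some providers' strict-schema modes support only a subset of JSON Schema, so check which keywords (such as oneOf and const) your provider honors.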
Tool Selection and Routing
When multiple tools are available, the model must choose the right one. This decision process can be controlled and optimized.
Natural Selection
By default, models select tools based on:
- Tool descriptions matching the query intent
- Parameter compatibility with available information
- Prior training on similar tool selection scenarios
Well-written tool descriptions are crucial—they're the primary signal the model uses for selection.
Controlling Tool Selection
tool_choice parameter:
- "auto": Model decides whether to call tools (default)
- "none": Model won't call tools
- "required": Model must call at least one tool
- Specific function: Force a particular function to be called
Use "required" when you know the query needs a tool call but want the model to select which one. Use specific function forcing when the tool choice is determined by application logic.
Tool Description Optimization
Tool descriptions directly impact selection accuracy:
Good description: "Searches for flights between two airports on a specific date. Use for flight availability and pricing queries."
Bad description: "Flight search function."
Include:
- What the tool does
- When to use it
- What inputs it expects
- What outputs it returns
Handling Tool Proliferation
As you add more tools, selection becomes harder. Strategies:
Hierarchical tools: Group related tools under category functions. "travel_search" might internally dispatch to flights, hotels, or cars.
Dynamic tool sets: Only provide tools relevant to the current context. A cooking assistant doesn't need calendar tools.
Two-stage selection: First, classify the query type, then provide tools relevant to that type.
Error Handling
Functions fail. APIs timeout, return errors, or provide unexpected results. Robust tool-using agents handle failures gracefully.
Types of Failures
Execution failures: The function throws an error (network failure, invalid parameters, service unavailable).
Invalid results: The function returns but with unexpected or unusable data (empty results, error responses).
Timeout: The function takes too long to respond.
Schema violations: The model generates invalid arguments despite validation efforts.
Error Recovery Strategies
Retry with backoff: Transient errors often resolve on retry. Implement exponential backoff for rate limits and network issues.
Alternative tools: If one tool fails, try an alternative that provides similar information.
Partial results: If some parallel calls succeed and others fail, use what succeeded and inform the user about what's missing.
Graceful degradation: If critical tools fail, provide the best response possible without them, clearly indicating limitations.
User escalation: For unrecoverable errors, explain the issue and ask the user for guidance.
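The retry-with-backoff strategy can be sketched as a small wrapper around any tool call. The flaky tool below is a stand-in that fails twice before succeeding:

```python
import time
import random

def with_retries(fn, max_attempts=4, base_delay=0.05,
                 retryable=(TimeoutError, ConnectionError)):
    """Retry transient failures with exponential backoff plus jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts:
                raise  # exhausted: surface the error to the caller
            # double the delay each attempt; jitter avoids thundering herds
            delay = base_delay * (2 ** (attempt - 1)) * (1 + random.random() * 0.1)
            time.sleep(delay)

# Simulated flaky tool: fails twice with a transient error, then succeeds.
state = {"calls": 0}
def flaky_tool():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("transient network error")
    return {"ok": True}

result = with_retries(flaky_tool)
```

Keep the `retryable` set narrow: retrying a validation error or a 4xx response just repeats the same failure.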
Communicating Errors to Models
When a tool fails, you need to tell the model what happened. Options:
Structured error objects: Return a standardized error format with error type, message, and possible actions.
Natural language: Return a description like "The weather API is currently unavailable. Please try again later."
Retry instructions: Tell the model to try again with different parameters or an alternative approach.
The model should understand the error well enough to either recover or explain the issue to the user.
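A structured error object for the model might look like this sketch (the field names are an illustrative convention, not a standard):

```python
def tool_error(error_type, message, retryable, suggestion=None):
    """Standardized error payload returned to the model in place of a result."""
    return {
        "status": "error",
        "error_type": error_type,   # machine-readable category
        "message": message,         # human/model-readable explanation
        "retryable": retryable,     # can the model sensibly try again?
        "suggestion": suggestion,   # optional recovery hint for the model
    }

err = tool_error(
    "rate_limited",
    "Weather API quota exceeded",
    retryable=True,
    suggestion="Retry after 60 seconds or use the cached forecast tool.",
)
```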
Timeout Handling
Set appropriate timeouts for each function:
- Fast functions (cached lookups): 1-5 seconds
- Standard API calls: 10-30 seconds
- Long operations (complex queries, external systems): 30-120 seconds
When timeouts occur:
- Cancel the operation
- Return a timeout error to the model
- Consider whether retry makes sense
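For synchronous tool functions, a hard timeout can be sketched with a thread pool (note that Python threads can't be forcibly killed, so `cancel` is best effort and a timed-out call may still run to completion in the background):

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FuturesTimeout

def run_with_timeout(fn, timeout_s, *args):
    """Execute a tool call with a hard timeout; surface a structured error."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fn, *args)
        try:
            return {"status": "ok", "result": future.result(timeout=timeout_s)}
        except FuturesTimeout:
            future.cancel()  # best effort; a running thread can't be killed
            return {"status": "error", "error_type": "timeout",
                    "message": f"tool exceeded {timeout_s}s"}

fast = run_with_timeout(lambda: "done", 1.0)
slow = run_with_timeout(lambda: time.sleep(0.5) or "late", 0.05)
```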
Production Patterns
Idempotency
Tool calls might be retried due to network issues or model re-generation. Ensure functions handle duplicate calls safely:
Idempotent operations: Multiple calls produce the same result (GET requests, queries).
Non-idempotent operations: Multiple calls produce different results (POST, creating resources).
For non-idempotent operations, implement idempotency keys or check for existing resources before creating.
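An idempotency-key wrapper for non-idempotent tools can be sketched as below (an in-memory store for illustration; production systems would persist keys in a database with a TTL):

```python
import uuid

class IdempotentExecutor:
    """Deduplicate non-idempotent tool calls using client-supplied keys."""
    def __init__(self):
        self._completed = {}  # idempotency_key -> prior result

    def execute(self, key, fn, *args, **kwargs):
        if key in self._completed:  # retry of an already-applied call
            return self._completed[key]
        result = fn(*args, **kwargs)
        self._completed[key] = result
        return result

created = []
def create_booking(flight_id):
    created.append(flight_id)  # side effect we must not duplicate
    return {"booking_id": f"bk-{flight_id}"}

executor = IdempotentExecutor()
key = str(uuid.uuid4())  # one key per logical operation, reused across retries
first = executor.execute(key, create_booking, "AF123")
second = executor.execute(key, create_booking, "AF123")  # retry: no new booking
```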
Rate Limiting
External APIs have rate limits. Your tool-using agent might trigger many calls quickly:
Track usage: Monitor API call rates across all tool invocations.
Implement queuing: Queue tool calls and execute at sustainable rates.
Graceful degradation: When rate limited, inform the model and adjust behavior.
User quotas: In multi-user applications, implement per-user rate limits.
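One common mechanism for sustainable call rates is a token bucket, sketched here in-process (distributed systems would back this with shared state such as Redis):

```python
import time

class TokenBucket:
    """Simple token-bucket limiter for outbound tool calls."""
    def __init__(self, rate_per_s, capacity):
        self.rate = rate_per_s        # refill rate
        self.capacity = capacity      # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def try_acquire(self):
        now = time.monotonic()
        # refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should queue, or tell the model it's rate limited

bucket = TokenBucket(rate_per_s=10, capacity=2)
allowed = [bucket.try_acquire() for _ in range(3)]  # burst of 3 vs capacity 2
```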
Authentication and Authorization
Tools often require credentials:
Credential management: Securely store and inject credentials at execution time.
Per-user authorization: Ensure tools respect user permissions. A tool shouldn't access data the user isn't authorized to see.
Token refresh: Handle expired tokens gracefully, refreshing as needed.
Audit logging: Log tool calls with user context for security and debugging.
Caching
Many tool results are cacheable:
Query caching: Cache identical function calls with the same arguments.
Semantic caching: Cache similar queries if appropriate (weather in Paris hasn't changed in 5 minutes).
Invalidation: Implement appropriate TTLs and invalidation logic.
User-specific caching: Some results are user-specific and shouldn't be shared across users.
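Query caching can be sketched by keying on the tool name plus canonicalized arguments, with a TTL check on read:

```python
import hashlib
import json
import time

class ToolCache:
    """Cache tool results keyed by (tool name, canonicalized arguments)."""
    def __init__(self, ttl_s):
        self.ttl_s = ttl_s
        self._store = {}  # key -> (timestamp, result)

    def _key(self, name, args):
        canonical = json.dumps(args, sort_keys=True)  # order-insensitive key
        return hashlib.sha256(f"{name}:{canonical}".encode()).hexdigest()

    def get_or_call(self, name, args, fn):
        key = self._key(name, args)
        hit = self._store.get(key)
        if hit and time.monotonic() - hit[0] < self.ttl_s:
            return hit[1]  # fresh cached result, no API call
        result = fn(**args)
        self._store[key] = (time.monotonic(), result)
        return result

calls = {"n": 0}
def get_weather(city):
    calls["n"] += 1  # counts real (non-cached) invocations
    return {"city": city, "temp_c": 18}

cache = ToolCache(ttl_s=300)
a = cache.get_or_call("get_weather", {"city": "Paris"}, get_weather)
b = cache.get_or_call("get_weather", {"city": "Paris"}, get_weather)  # hit
```

For user-specific results, fold a user identifier into the cache key so results are never shared across users.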
Observability
Tool-using agents are complex to debug. Implement comprehensive observability:
Trace tool calls: Log every tool invocation with inputs, outputs, and latency.
Decision tracking: Log why the model chose particular tools.
Error rates: Track failure rates by tool, error type, and time.
Latency distributions: Monitor P50, P95, P99 latencies for each tool.
Cost tracking: Many tools have associated costs (API fees, compute). Track usage.
Advanced Patterns
Tool Composition
Complex operations can be built from simpler tools:
Macro tools: High-level tools that internally orchestrate multiple lower-level calls.
Tool chaining: The output of one tool becomes input to another, with the model orchestrating the chain.
Conditional tools: Tools that behave differently based on context or previous results.
Human-in-the-Loop
Some operations require human approval before execution:
Confirmation patterns: Present the planned action to the user and wait for approval.
Sensitive operations: Flag high-risk operations (deletions, payments, external communications) for human review.
Confidence thresholds: If the model is uncertain about tool selection, ask for clarification instead of guessing.
Streaming with Tool Calls
When streaming responses that include tool calls:
Partial tool calls: Some providers stream tool call arguments progressively.
Interleaved streaming: Text and tool calls can be interleaved in the response.
Progress updates: For long-running tools, provide progress updates to maintain responsiveness.
Multi-Agent Tool Sharing
In multi-agent systems, tools may be shared across agents:
Tool discovery: Agents learn about available tools dynamically.
Permission scoping: Different agents have access to different tools.
Conflict resolution: Handle cases where multiple agents try to use the same tool simultaneously.
Model Context Protocol (MCP)
The Model Context Protocol (MCP) is an open standard introduced by Anthropic in November 2024 that standardizes how AI systems integrate with external tools and data sources. It addresses the "N×M integration problem" where every AI tool was building custom integrations with every service.
The Problem MCP Solves
Without MCP, if you had N AI applications and M services, you needed N×M separate integrations. GitHub needs different integrations for Claude, ChatGPT, and every other AI tool. Google Drive needs the same. The complexity scales multiplicatively.
With MCP, GitHub builds one MCP server. Google Drive builds one MCP server. Every AI client connects to any MCP server using the standard protocol. The equation changes from N×M to N+M integrations—a massive reduction in complexity.
MCP vs. Function Calling
MCP and function calling aren't mutually exclusive—they serve different purposes:
| Aspect | Function Calling | MCP |
|---|---|---|
| Scope | Single application | Cross-platform |
| Discovery | Static, defined at request time | Dynamic, runtime discovery |
| Standard | Vendor-specific | Open, transport-agnostic |
| Best for | Single-app, low-latency | Portable, multi-tool ecosystems |
MCP standardizes the discovery and invocation of tools across different hosts and runtimes, while function calling remains the mechanism by which models actually invoke those tools.
Key 2025 Developments
MCP adoption accelerated rapidly in 2025:
- March 2025: OpenAI officially adopted MCP across its products, including the ChatGPT desktop app
- December 2025: Anthropic donated MCP to the Agentic AI Foundation (AAIF), a directed fund under the Linux Foundation co-founded by Anthropic, Block, and OpenAI
November 2025 Spec Updates
The 2025-11-25 spec release introduced major improvements:
- Async Tasks: Any request can return a task handle, enabling "call-now, fetch-later" patterns
- Tool Calling in Sampling: Servers can include tool definitions and specify tool choice behavior
- Parallel Tool Calls: Support for concurrent tool execution
- Better OAuth: Improved authentication flows
Integration with Function Calling
MCP tools are discoverable in real time, self-documented, and compatible with function-calling mechanisms that LLMs already understand. The typical integration:
- Discovery: Client connects to MCP server, discovers available tools
- Schema mapping: MCP tool schemas are converted to the LLM's function calling format
- Invocation: LLM generates function call, client routes through MCP
- Response: MCP server executes, returns result through the protocol
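The schema-mapping step can be sketched as a small translation function. The MCP tool shape (name, description, inputSchema) follows the MCP specification's tool listing; the input dict below is a hand-written stand-in for an actual tools/list response:

```python
def mcp_tool_to_openai(mcp_tool):
    """Map an MCP tool description to the OpenAI-style function-calling format."""
    return {
        "type": "function",
        "function": {
            "name": mcp_tool["name"],
            "description": mcp_tool.get("description", ""),
            "parameters": mcp_tool.get(
                "inputSchema", {"type": "object", "properties": {}}),
        },
    }

# Illustrative tool as an MCP server might describe it via tools/list.
discovered = {
    "name": "search_repos",
    "description": "Search GitHub repositories.",
    "inputSchema": {"type": "object",
                    "properties": {"query": {"type": "string"}},
                    "required": ["query"]},
}
openai_tool = mcp_tool_to_openai(discovered)
```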
When to Use MCP
Use MCP when:
- Building tools that should work across multiple AI platforms
- Connecting to external services with existing MCP servers
- Building reusable tool ecosystems
Use direct function calling when:
- Building single-app integrations where latency is critical
- You control both the AI application and the tools
- MCP overhead isn't justified for simple use cases
Provider-Specific Patterns
OpenAI Function Calling
Key parameters:
- tools: Array of function definitions with JSON Schema for parameters
- tool_choice: "auto", "none", "required", or specific function
- parallel_tool_calls: Enable/disable parallel calls (default: true)
Structured outputs: Set "strict": true in function definitions for guaranteed schema compliance.
Response format: Tool calls appear in tool_calls array with id, function.name, and function.arguments.
Anthropic Claude Tool Use
Key parameters:
- tools: Array of tool definitions with name, description, and input_schema
- tool_choice: {"type": "auto"}, {"type": "any"}, or {"type": "tool", "name": "..."}
Response format: Tool use appears as content blocks with type: "tool_use", including id, name, and input.
Best practices:
- Claude benefits from detailed tool descriptions
- Use tool_choice.type = "any" to force tool use when appropriate
- Handle stop_reason: "tool_use" to continue the conversation after tool execution
Google Gemini Function Calling
Key parameters:
- tools: Array with function_declarations
- tool_config: Control how tools are used (AUTO, ANY, NONE)
Automatic execution: Gemini supports automatic_function_calling where the model can trigger tool execution directly without roundtrips.
Response format: Function calls in function_call parts with name and args.
Cross-Provider Abstraction
For applications supporting multiple providers, use abstraction libraries:
LiteLLM: Unified interface across 100+ providers, handles tool format translation.
LangChain: Tool abstraction that works across providers with consistent interfaces.
Custom abstraction: Define tools once in a canonical format, translate to provider-specific formats at runtime.
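The custom-abstraction approach can be sketched as a canonical tool record plus per-provider translators. The canonical field names here are an illustrative convention; the output shapes follow the OpenAI and Anthropic formats described above:

```python
# One canonical definition, translated at runtime per provider.
CANONICAL = {
    "name": "get_weather",
    "description": "Get current weather for a city.",
    "schema": {"type": "object",
               "properties": {"city": {"type": "string"}},
               "required": ["city"]},
}

def to_openai(tool):
    """OpenAI format: nested under 'function', parameters key."""
    return {"type": "function",
            "function": {"name": tool["name"],
                         "description": tool["description"],
                         "parameters": tool["schema"]}}

def to_anthropic(tool):
    """Anthropic format: flat, input_schema key."""
    return {"name": tool["name"],
            "description": tool["description"],
            "input_schema": tool["schema"]}

openai_fmt = to_openai(CANONICAL)
anthropic_fmt = to_anthropic(CANONICAL)
```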
Sources
- OpenAI Function Calling Documentation
- Anthropic Tool Use Documentation
- Building Agentic AI Systems - Agent architecture patterns
- Structured Outputs & Tool Use Patterns - Output validation
Related Articles
Building Agentic AI Systems: A Complete Implementation Guide
Hands-on guide to building AI agents—tool use, ReAct pattern, planning, memory, context management, MCP integration, and multi-agent orchestration. With full prompt examples and production patterns.
Structured Outputs and Tool Use: Patterns for Reliable AI Applications
Master structured output generation and tool use patterns—JSON mode, schema enforcement, Instructor library, function calling best practices, error handling, and production patterns for reliable AI applications.
Building MCP Servers: Custom Tool Integrations for AI Agents
Field guide to building Model Context Protocol (MCP) servers—from basic tool exposure to production-grade integrations with authentication, streaming, and error handling.
Error Handling & Resilience for LLM Applications: Production Patterns
Hands-on guide to building resilient LLM applications. Covers retry strategies with exponential backoff, circuit breakers, fallback patterns, rate limit handling, timeout management, and multi-provider failover for production systems.