
Function Calling & Tool Use Deep Dive: Building LLMs That Take Action

Production-focused guide to LLM function calling and tool use. Covers parallel function calls, complex orchestration patterns, structured outputs, error handling, and production best practices for agentic applications in 2025.


Function Calling & Tool Use Deep Dive

Function calling transforms LLMs from text generators into agents that take action. Instead of just describing how to book a flight, the model actually calls the booking API. Instead of explaining database queries, it executes them. This capability is foundational to every agentic application—from customer service bots to autonomous coding assistants.

This guide covers function calling comprehensively: how it works under the hood, parallel and sequential orchestration, structured output guarantees, error handling, and production patterns for building reliable tool-using agents.


How Function Calling Works

Function calling enables LLMs to generate structured API calls instead of (or alongside) natural language responses. The model doesn't execute functions directly—it generates the function name and arguments, which your application code then executes.

The Basic Flow

  1. Define tools: You provide the LLM with a schema describing available functions—their names, descriptions, and parameters.

  2. User query: The user asks something that might require a function call ("What's the weather in Paris?").

  3. Model decision: The LLM decides whether to call a function and, if so, which function with what arguments.

  4. Function execution: Your application executes the function with the provided arguments.

  5. Result integration: You send the function result back to the LLM, which incorporates it into its response.
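The five steps above can be sketched end to end in Python. The `tool_calls` shape follows OpenAI's format, and `get_weather` is a stub standing in for a real API; in a production loop, step 3 comes from the model and step 5 sends `result` back as a tool message.

```python
import json

# Step 1: the schema the model sees (OpenAI-style, JSON Schema parameters).
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# Step 4: your application owns the actual implementation.
def get_weather(city: str) -> dict:
    return {"city": city, "temp_c": 18, "conditions": "cloudy"}  # stub

REGISTRY = {"get_weather": get_weather}

def execute_tool_call(tool_call: dict) -> str:
    """Run one model-requested call; return a JSON string for step 5."""
    fn = REGISTRY[tool_call["function"]["name"]]
    args = json.loads(tool_call["function"]["arguments"])
    return json.dumps(fn(**args))

# Simulated model output (step 3) for "What's the weather in Paris?"
fake_call = {"id": "call_1", "function": {"name": "get_weather",
             "arguments": '{"city": "Paris"}'}}
result = execute_tool_call(fake_call)
```

The registry pattern keeps the model-facing schema and the executable code loosely coupled: adding a tool means adding one schema and one registry entry.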

Why Models Are Good at This

Modern LLMs are trained on vast amounts of code, API documentation, and structured data. They understand:

  • Function signatures and parameter types
  • When APIs are appropriate for a given task
  • How to extract relevant parameters from natural language
  • How to interpret and explain function results

This isn't magic—it's pattern recognition trained on millions of examples of API usage.

Provider Implementations

OpenAI: Tools are defined with JSON Schema for parameters. The tools parameter accepts function definitions. The model returns tool_calls in its response when it wants to invoke functions.

Anthropic: Similar structure with tools parameter. Claude supports function calling with structured schema definitions and returns tool use requests.

Google (Gemini): Function declarations with schema definitions. Supports automatic function calling where the model can trigger execution directly.

All major providers have converged on similar interfaces, making multi-provider applications feasible with abstraction layers.


Parallel Function Calls

Real-world queries often require multiple pieces of information that can be fetched independently. Parallel function calling enables the model to request multiple function calls simultaneously, dramatically reducing latency.

When Parallel Calls Make Sense

Independent information needs: "What's the weather in Paris and the current EUR/USD exchange rate?" requires two API calls with no dependencies between them.

Batch operations: "Send notifications to Alice, Bob, and Charlie" can parallelize three notification calls.

Multi-source queries: "Get my calendar for tomorrow and my unread emails" pulls from independent data sources.

Enabling Parallel Calls

Most providers support parallel function calling through a parameter:

OpenAI: Set parallel_tool_calls: true (enabled by default). The model may return multiple tool calls in a single response.

Anthropic: Claude automatically generates parallel tool calls when appropriate.

The model decides when parallelization makes sense based on the query structure. You can also influence this through prompting—explicitly asking for multiple pieces of information encourages parallel calls.

Handling Parallel Results

When multiple tool calls return, you need to:

  1. Execute in parallel: Don't wait for one call to complete before starting the next.
  2. Collect all results: Wait for all parallel calls to complete.
  3. Return results together: Send all results back to the model in a single message.
  4. Handle partial failures: If one call fails, decide whether to proceed with partial results or retry.
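The four steps above map directly onto `asyncio.gather`. The two tool stubs are illustrative; `return_exceptions=True` is what makes partial-failure handling possible, since a single failed call no longer discards the successful ones.

```python
import asyncio

async def get_weather(city: str) -> dict:
    await asyncio.sleep(0.01)  # stands in for network latency
    return {"city": city, "temp_c": 18}

async def get_fx_rate(pair: str) -> dict:
    await asyncio.sleep(0.01)
    return {"pair": pair, "rate": 1.09}

REGISTRY = {"get_weather": get_weather, "get_fx_rate": get_fx_rate}

async def run_parallel(tool_calls: list[dict]) -> list:
    async def run_one(call):
        return await REGISTRY[call["name"]](**call["args"])
    # return_exceptions=True: failures come back as exception objects
    # alongside the successful results, preserving order.
    return await asyncio.gather(*(run_one(c) for c in tool_calls),
                                return_exceptions=True)

calls = [{"name": "get_weather", "args": {"city": "Paris"}},
         {"name": "get_fx_rate", "args": {"pair": "EURUSD"}}]
results = asyncio.run(run_parallel(calls))
```

After gathering, check each element with `isinstance(r, Exception)` to decide between proceeding with partial results and retrying.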

Latency Benefits

Sequential execution: total_time = call_1_time + call_2_time + call_3_time

Parallel execution: total_time = max(call_1_time, call_2_time, call_3_time)

For three 500ms API calls, sequential takes 1.5 seconds while parallel takes 500ms—a 3x improvement.


Sequential Orchestration

Some workflows require sequential function calls where later calls depend on earlier results. The model must reason through the workflow step by step.

Dependency Chains

Example: "Book the cheapest flight to Paris and then reserve a hotel near the airport."

  1. Search flights → Get flight details including arrival airport
  2. Search hotels near arrival airport → Get hotel options
  3. Book hotel (depends on knowing which airport)

The model can't parallelize these—each step needs information from the previous step.

Multi-Turn Orchestration

Complex workflows span multiple model turns:

  • Turn 1: User asks to plan a trip. Model calls flight search.
  • Turn 2: Flight results provided. Model calls hotel search using the flight destination.
  • Turn 3: Hotel results provided. Model calls car rental search using the hotel dates.
  • Turn 4: All information gathered. Model presents the complete plan.

Each turn requires waiting for function execution before proceeding.

ReAct Pattern

The ReAct (Reasoning + Acting) pattern structures sequential tool use:

  1. Thought: Model reasons about what to do next
  2. Action: Model selects and calls a function
  3. Observation: Function result is returned
  4. Repeat: Until task is complete

This pattern makes the model's reasoning explicit and debuggable. It naturally handles multi-step workflows where each step informs the next.
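The Thought/Action/Observation cycle can be sketched as a loop. Here the model is replaced by a scripted stand-in so the control flow is visible; a real implementation would send `history` to an LLM on each iteration.

```python
# Scripted stand-in for the model: one tool call, then a final answer.
def scripted_model(history: list[dict]) -> dict:
    if not any(m["role"] == "observation" for m in history):
        return {"thought": "I need the weather first.",          # Thought
                "action": {"name": "get_weather",                # Action
                           "args": {"city": "Paris"}}}
    return {"thought": "I have what I need.",
            "final": "It is 18C in Paris."}

TOOLS = {"get_weather": lambda city: {"temp_c": 18}}  # stub tool

def react_loop(question: str, max_steps: int = 5) -> str:
    history = [{"role": "user", "content": question}]
    for _ in range(max_steps):
        step = scripted_model(history)
        if "final" in step:                              # task complete
            return step["final"]
        action = step["action"]
        obs = TOOLS[action["name"]](**action["args"])    # Observation
        history.append({"role": "observation", "content": obs})
    raise RuntimeError("step budget exhausted")

answer = react_loop("What's the weather in Paris?")
```

The `max_steps` budget is the important production detail: without it, a confused model can loop on Thought/Action indefinitely.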

Planning and Execution

For complex workflows, separate planning from execution:

Planning phase: Model analyzes the task and generates a plan (sequence of steps, dependencies, required functions).

Execution phase: Execute the plan step by step, handling errors and adjustments as needed.

This separation enables plan validation before execution and easier recovery from mid-execution failures.


Structured Outputs

Function calling requires structured outputs—the model must generate valid JSON that matches your function schema. Unstructured or malformed outputs cause execution failures.

The Structured Output Challenge

LLMs generate text probabilistically. They might:

  • Produce invalid JSON (missing quotes, trailing commas)
  • Include extra fields not in the schema
  • Use wrong types (string instead of number)
  • Omit required fields

Without guarantees, you need extensive validation and retry logic.

Guaranteed Structured Outputs

OpenAI Structured Outputs: Set "strict": true in your function definition. OpenAI guarantees the output will exactly match your JSON schema—no validation needed.

Constrained decoding: Some frameworks constrain the model's token generation to only produce valid JSON. Libraries like Outlines and Instructor implement this approach.

Post-processing: Parse output, validate against schema, and retry with error feedback if invalid. Works with any provider but adds latency.
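The post-processing approach can be sketched with the standard library alone. The schema and the scripted model responses are illustrative; the key idea is that the validation error itself becomes the feedback for the retry prompt.

```python
import json

# Minimal required-field/type schema for this sketch (a real system
# might use jsonschema or Pydantic instead).
SCHEMA = {"required": {"city": str, "days": int}}

def validate(raw: str):
    """Return (args, None) on success or (None, error message)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as e:
        return None, f"invalid JSON: {e}"
    for field, typ in SCHEMA["required"].items():
        if field not in data:
            return None, f"missing required field '{field}'"
        if not isinstance(data[field], typ):
            return None, f"field '{field}' must be {typ.__name__}"
    return data, None

def call_with_retry(model_fn, max_attempts: int = 3) -> dict:
    feedback = None
    for _ in range(max_attempts):
        raw = model_fn(feedback)   # real code would re-prompt the LLM
        args, err = validate(raw)
        if err is None:
            return args
        feedback = err             # feed the error back on retry
    raise ValueError(f"still invalid after {max_attempts} attempts: {feedback}")

# Scripted model: first attempt has a type error, second is corrected.
attempts = iter(['{"city": "Paris", "days": "3"}',
                 '{"city": "Paris", "days": 3}'])
args = call_with_retry(lambda fb: next(attempts))
```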

Schema Design Best Practices

Be specific: Narrow schemas are easier for models to satisfy than broad ones.

Use enums: Instead of free-form strings, constrain to specific allowed values.

Provide descriptions: Parameter descriptions help the model understand expected values.

Set reasonable defaults: For optional parameters, specify sensible defaults.

Validate at the boundary: Even with structured output guarantees, validate before executing sensitive operations.

Handling Complex Types

Nested objects: Models handle nested structures well. Define sub-schemas for complex parameters.

Arrays: Specify item schema for arrays. Models can generate variable-length arrays.

Union types: Use discriminated unions (a "type" field that determines the shape) for parameters that can have multiple forms.
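A discriminated union in JSON Schema terms might look like the following sketch; the `payment_method` example and its field names are illustrative, not from any particular API.

```python
# A parameter that can be a card or a bank transfer, discriminated
# by the required "type" field.
PAYMENT_METHOD_SCHEMA = {
    "oneOf": [
        {"type": "object",
         "properties": {"type": {"const": "card"},
                        "number": {"type": "string"},
                        "expiry": {"type": "string"}},
         "required": ["type", "number", "expiry"]},
        {"type": "object",
         "properties": {"type": {"const": "bank_transfer"},
                        "iban": {"type": "string"}},
         "required": ["type", "iban"]},
    ]
}

def dispatch(payment: dict) -> str:
    # The discriminator tells application code which shape it received.
    return {"card": "charge_card",
            "bank_transfer": "initiate_transfer"}[payment["type"]]
```

Note that strict structured-output modes restrict which JSON Schema keywords are allowed, so check your provider's documentation before relying on `oneOf` there.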


Tool Selection and Routing

When multiple tools are available, the model must choose the right one. This decision process can be controlled and optimized.

Natural Selection

By default, models select tools based on:

  • Tool descriptions matching the query intent
  • Parameter compatibility with available information
  • Prior training on similar tool selection scenarios

Well-written tool descriptions are crucial—they're the primary signal the model uses for selection.

Controlling Tool Selection

tool_choice parameter:

  • "auto": Model decides whether to call tools (default)
  • "none": Model won't call tools
  • "required": Model must call at least one tool
  • Specific function: Force a particular function to be called

Use "required" when you know the query needs a tool call but want the model to select which one. Use specific function forcing when the tool choice is determined by application logic.

Tool Description Optimization

Tool descriptions directly impact selection accuracy:

Good description: "Searches for flights between two airports on a specific date. Use for flight availability and pricing queries."

Bad description: "Flight search function."

Include:

  • What the tool does
  • When to use it
  • What inputs it expects
  • What outputs it returns

Handling Tool Proliferation

As you add more tools, selection becomes harder. Strategies:

Hierarchical tools: Group related tools under category functions. "travel_search" might internally dispatch to flights, hotels, or cars.

Dynamic tool sets: Only provide tools relevant to the current context. A cooking assistant doesn't need calendar tools.

Two-stage selection: First, classify the query type, then provide tools relevant to that type.


Error Handling

Functions fail. APIs timeout, return errors, or provide unexpected results. Robust tool-using agents handle failures gracefully.

Types of Failures

Execution failures: The function throws an error (network failure, invalid parameters, service unavailable).

Invalid results: The function returns but with unexpected or unusable data (empty results, error responses).

Timeout: The function takes too long to respond.

Schema violations: The model generates invalid arguments despite validation efforts.

Error Recovery Strategies

Retry with backoff: Transient errors often resolve on retry. Implement exponential backoff for rate limits and network issues.

Alternative tools: If one tool fails, try an alternative that provides similar information.

Partial results: If some parallel calls succeed and others fail, use what succeeded and inform the user about what's missing.

Graceful degradation: If critical tools fail, provide the best response possible without them, clearly indicating limitations.

User escalation: For unrecoverable errors, explain the issue and ask the user for guidance.
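Retry with backoff is simple enough to sketch directly. The jitter term is the detail worth copying: it prevents many concurrent tool calls from retrying in lockstep after a shared rate-limit event.

```python
import random
import time

def retry_with_backoff(fn, max_attempts=4, base=0.5,
                       retryable=(TimeoutError,)):
    """Retry transient failures with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error
            # base, 2*base, 4*base, ... plus jitter to avoid lockstep retries
            time.sleep(base * (2 ** attempt) + random.uniform(0, 0.1))

# Demo: a tool that fails twice with a transient error, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("transient")
    return "ok"

result = retry_with_backoff(flaky, base=0.01)
```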

Communicating Errors to Models

When a tool fails, you need to tell the model what happened. Options:

Structured error objects: Return a standardized error format with error type, message, and possible actions.

Natural language: Return a description like "The weather API is currently unavailable. Please try again later."

Retry instructions: Tell the model to try again with different parameters or an alternative approach.

The model should understand the error well enough to either recover or explain the issue to the user.

Timeout Handling

Set appropriate timeouts for each function:

  • Fast functions (cached lookups): 1-5 seconds
  • Standard API calls: 10-30 seconds
  • Long operations (complex queries, external systems): 30-120 seconds

When timeouts occur:

  1. Cancel the operation
  2. Return a timeout error to the model
  3. Consider whether retry makes sense
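With asyncio, all three steps fall out of `asyncio.wait_for`, which cancels the underlying task on timeout; the error dict returned here follows the same illustrative shape as above.

```python
import asyncio

async def call_with_timeout(coro, timeout_s: float):
    """Run a tool coroutine with a deadline; on timeout, cancel it
    and return an error object the model can reason about."""
    try:
        return await asyncio.wait_for(coro, timeout=timeout_s)
    except asyncio.TimeoutError:
        # wait_for has already cancelled the operation (step 1)
        return {"status": "error", "error_type": "timeout",
                "message": f"tool did not respond within {timeout_s}s"}

async def slow_tool():
    await asyncio.sleep(1.0)  # simulates a hung external call
    return {"status": "ok"}

result = asyncio.run(call_with_timeout(slow_tool(), timeout_s=0.05))
```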

Production Patterns

Idempotency

Tool calls might be retried due to network issues or model re-generation. Ensure functions handle duplicate calls safely:

Idempotent operations: Multiple calls produce the same result (GET requests, queries).

Non-idempotent operations: Multiple calls produce different results (POST, creating resources).

For non-idempotent operations, implement idempotency keys or check for existing resources before creating.
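An idempotency key makes a non-idempotent operation safe to retry: a repeated key returns the original result instead of creating a second resource. The in-memory dict here stands in for a persistent store (in production, a database table or Redis with a TTL).

```python
# In-memory stand-in for a persistent idempotency store.
_SEEN: dict[str, dict] = {}

def create_booking(request: dict, idempotency_key: str) -> dict:
    """Non-idempotent create, made retry-safe with an idempotency key."""
    if idempotency_key in _SEEN:
        return _SEEN[idempotency_key]  # duplicate call: replay the result
    result = {"booking_id": f"bk-{len(_SEEN) + 1}",
              "flight": request["flight"]}
    _SEEN[idempotency_key] = result
    return result

first = create_booking({"flight": "AF123"}, "user42-af123-2025-06-01")
retry = create_booking({"flight": "AF123"}, "user42-af123-2025-06-01")
```

Deriving the key from stable request attributes (user, resource, date) means even a model-regenerated tool call maps to the same key.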

Rate Limiting

External APIs have rate limits. Your tool-using agent might trigger many calls quickly:

Track usage: Monitor API call rates across all tool invocations.

Implement queuing: Queue tool calls and execute at sustainable rates.

Graceful degradation: When rate limited, inform the model and adjust behavior.

User quotas: In multi-user applications, implement per-user rate limits.
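A token bucket is a common way to implement the tracking and throttling described above; this minimal version admits a burst, then refills at a steady rate.

```python
import time

class TokenBucket:
    """Simple token-bucket limiter for outbound tool calls."""
    def __init__(self, rate_per_s: float, burst: int):
        self.rate, self.capacity = rate_per_s, burst
        self.tokens, self.last = float(burst), time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should queue, delay, or inform the model

bucket = TokenBucket(rate_per_s=5, burst=2)
decisions = [bucket.allow() for _ in range(3)]  # third call exceeds the burst
```

Per-user quotas fall out of the same structure: keep one bucket per user ID in a dict.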

Authentication and Authorization

Tools often require credentials:

Credential management: Securely store and inject credentials at execution time.

Per-user authorization: Ensure tools respect user permissions. A tool shouldn't access data the user isn't authorized to see.

Token refresh: Handle expired tokens gracefully, refreshing as needed.

Audit logging: Log tool calls with user context for security and debugging.

Caching

Many tool results are cacheable:

Query caching: Cache identical function calls with the same arguments.

Semantic caching: Cache similar queries if appropriate (weather in Paris hasn't changed in 5 minutes).

Invalidation: Implement appropriate TTLs and invalidation logic.

User-specific caching: Some results are user-specific and shouldn't be shared across users.
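Query caching with a TTL can be sketched as a wrapper around tool execution. Canonicalizing the arguments with `sort_keys=True` is the subtle part: it ensures `{"city": "Paris"}` and a differently ordered but equal dict hit the same cache entry.

```python
import json
import time

_CACHE: dict[str, tuple[float, object]] = {}

def cached_call(name: str, args: dict, fn, ttl_s: float = 300,
                now=time.monotonic):
    """Cache identical (name, args) calls; expire entries after ttl_s."""
    key = f"{name}:{json.dumps(args, sort_keys=True)}"  # canonical key
    hit = _CACHE.get(key)
    if hit and now() - hit[0] < ttl_s:
        return hit[1]                 # fresh cache hit
    result = fn(**args)
    _CACHE[key] = (now(), result)
    return result

counter = {"n": 0}
def get_weather(city: str) -> dict:
    counter["n"] += 1                 # counts real executions
    return {"city": city, "temp_c": 18}

a = cached_call("get_weather", {"city": "Paris"}, get_weather)
b = cached_call("get_weather", {"city": "Paris"}, get_weather)  # cache hit
```

For user-specific results, fold the user ID into the key so entries are never shared across users.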

Observability

Tool-using agents are complex to debug. Implement comprehensive observability:

Trace tool calls: Log every tool invocation with inputs, outputs, and latency.

Decision tracking: Log why the model chose particular tools.

Error rates: Track failure rates by tool, error type, and time.

Latency distributions: Monitor P50, P95, P99 latencies for each tool.

Cost tracking: Many tools have associated costs (API fees, compute). Track usage.


Advanced Patterns

Tool Composition

Complex operations can be built from simpler tools:

Macro tools: High-level tools that internally orchestrate multiple lower-level calls.

Tool chaining: The output of one tool becomes input to another, with the model orchestrating the chain.

Conditional tools: Tools that behave differently based on context or previous results.

Human-in-the-Loop

Some operations require human approval before execution:

Confirmation patterns: Present the planned action to the user and wait for approval.

Sensitive operations: Flag high-risk operations (deletions, payments, external communications) for human review.

Confidence thresholds: If the model is uncertain about tool selection, ask for clarification instead of guessing.

Streaming with Tool Calls

When streaming responses that include tool calls:

Partial tool calls: Some providers stream tool call arguments progressively.

Interleaved streaming: Text and tool calls can be interleaved in the response.

Progress updates: For long-running tools, provide progress updates to maintain responsiveness.

Multi-Agent Tool Sharing

In multi-agent systems, tools may be shared across agents:

Tool discovery: Agents learn about available tools dynamically.

Permission scoping: Different agents have access to different tools.

Conflict resolution: Handle cases where multiple agents try to use the same tool simultaneously.


Model Context Protocol (MCP)

The Model Context Protocol (MCP) is an open standard introduced by Anthropic in November 2024 that standardizes how AI systems integrate with external tools and data sources. It addresses the "N×M integration problem" where every AI tool was building custom integrations with every service.

The Problem MCP Solves

Without MCP, if you had N AI applications and M services, you needed N×M separate integrations. GitHub needs different integrations for Claude, ChatGPT, and every other AI tool. Google Drive needs the same. The complexity scales multiplicatively.

With MCP, GitHub builds one MCP server. Google Drive builds one MCP server. Every AI client connects to any MCP server using the standard protocol. The equation changes from N×M to N+M integrations—a massive reduction in complexity.

MCP vs. Function Calling

MCP and function calling aren't mutually exclusive—they serve different purposes:

| Aspect | Function Calling | MCP |
| --- | --- | --- |
| Scope | Single application | Cross-platform |
| Discovery | Static, defined at request time | Dynamic, runtime discovery |
| Standard | Vendor-specific | Open, transport-agnostic |
| Best for | Single-app, low-latency | Portable, multi-tool ecosystems |

MCP standardizes the discovery and invocation of tools across different hosts and runtimes, while function calling remains the mechanism by which models actually invoke those tools.

Key 2025 Developments

MCP adoption accelerated rapidly in 2025:

  • March 2025: OpenAI officially adopted MCP across its products, including the ChatGPT desktop app
  • December 2025: Anthropic donated MCP to the Agentic AI Foundation (AAIF), a directed fund under the Linux Foundation co-founded by Anthropic, Block, and OpenAI

November 2025 Spec Updates

The 2025-11-25 spec release introduced major improvements:

  • Async Tasks: Any request can return a task handle, enabling "call-now, fetch-later" patterns
  • Tool Calling in Sampling: Servers can include tool definitions and specify tool choice behavior
  • Parallel Tool Calls: Support for concurrent tool execution
  • Better OAuth: Improved authentication flows

Integration with Function Calling

MCP tools are discoverable in real time, self-documented, and compatible with function-calling mechanisms that LLMs already understand. The typical integration:

  1. Discovery: Client connects to MCP server, discovers available tools
  2. Schema mapping: MCP tool schemas are converted to the LLM's function calling format
  3. Invocation: LLM generates function call, client routes through MCP
  4. Response: MCP server executes, returns result through the protocol
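Step 2, the schema mapping, is mostly mechanical because MCP tools already carry a name, description, and JSON Schema input definition. A sketch of the MCP-to-OpenAI direction (the `search_files` tool is a hypothetical example):

```python
# Hypothetical tool as returned by an MCP server's tools/list:
# name, description, and a JSON Schema under inputSchema.
mcp_tool = {
    "name": "search_files",
    "description": "Search files in the connected workspace.",
    "inputSchema": {"type": "object",
                    "properties": {"query": {"type": "string"}},
                    "required": ["query"]},
}

def to_openai_tool(tool: dict) -> dict:
    """Map an MCP tool description to OpenAI's function-calling format."""
    return {"type": "function",
            "function": {"name": tool["name"],
                         "description": tool["description"],
                         "parameters": tool["inputSchema"]}}

openai_tool = to_openai_tool(mcp_tool)
```

The reverse path at invocation time is symmetric: take the model's generated arguments and send them to the MCP server as the tool-call input.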

When to Use MCP

Use MCP when:

  • Building tools that should work across multiple AI platforms
  • Connecting to external services with existing MCP servers
  • Building reusable tool ecosystems

Use direct function calling when:

  • Building single-app integrations where latency is critical
  • You control both the AI application and the tools
  • MCP overhead isn't justified for simple use cases

Provider-Specific Patterns

OpenAI Function Calling

Key parameters:

  • tools: Array of function definitions with JSON Schema for parameters
  • tool_choice: "auto", "none", "required", or specific function
  • parallel_tool_calls: Enable/disable parallel calls (default: true)

Structured outputs: Set "strict": true in function definitions for guaranteed schema compliance.

Response format: Tool calls appear in tool_calls array with id, function.name, and function.arguments.

Anthropic Claude Tool Use

Key parameters:

  • tools: Array of tool definitions with name, description, and input_schema
  • tool_choice: {"type": "auto"}, {"type": "any"}, or {"type": "tool", "name": "..."}

Response format: Tool use appears as content blocks with type: "tool_use", including id, name, and input.

Best practices:

  • Claude benefits from detailed tool descriptions
  • Use tool_choice.type = "any" to force tool use when appropriate
  • Handle stop_reason: "tool_use" to continue conversation after tool execution

Google Gemini Function Calling

Key parameters:

  • tools: Array with function_declarations
  • tool_config: Control how tools are used (AUTO, ANY, NONE)

Automatic execution: Gemini supports automatic_function_calling where the model can trigger tool execution directly without roundtrips.

Response format: Function calls in function_call parts with name and args.

Cross-Provider Abstraction

For applications supporting multiple providers, use abstraction libraries:

LiteLLM: Unified interface across 100+ providers, handles tool format translation.

LangChain: Tool abstraction that works across providers with consistent interfaces.

Custom abstraction: Define tools once in a canonical format, translate to provider-specific formats at runtime.




Enrico Piovano, PhD

Co-founder & CTO at Goji AI. Former Applied Scientist at Amazon (Alexa & AGI), focused on Agentic AI and LLMs. PhD in Electrical Engineering from Imperial College London. Gold Medalist at the National Mathematical Olympiad.
