Function Calling & Tool Use Deep Dive: Building LLMs That Take Action
Production-focused guide to LLM function calling and tool use. Covers parallel function calls, complex orchestration patterns, structured outputs, error handling, and production best practices for agentic applications in 2025.
Function calling transforms LLMs from text generators into agents that take action. Instead of just describing how to book a flight, the model actually calls the booking API. Instead of explaining database queries, it executes them. This capability is foundational to every agentic application—from customer service bots to autonomous coding assistants.
This guide covers function calling comprehensively: how it works under the hood, parallel and sequential orchestration, structured output guarantees, error handling, and production patterns for building reliable tool-using agents.
How Function Calling Works
Function calling enables LLMs to generate structured API calls instead of (or alongside) natural language responses. The model doesn't execute functions directly—it generates the function name and arguments, which your application code then executes.
The Basic Flow
1. Define tools: You provide the LLM with a schema describing available functions—their names, descriptions, and parameters.
2. User query: The user asks something that might require a function call ("What's the weather in Paris?").
3. Model decision: The LLM decides whether to call a function and, if so, which function with what arguments.
4. Function execution: Your application executes the function with the provided arguments.
5. Result integration: You send the function result back to the LLM, which incorporates it into its response.
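The flow above can be sketched end to end. This is a minimal illustration, not any provider's SDK: `call_model` is a stub standing in for the real API call, and the tool schema follows the JSON-Schema style the major providers share.

```python
import json

# Tool schema in the JSON-Schema style used by major providers.
TOOLS = [{
    "name": "get_weather",
    "description": "Get current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}]

# Your application's actual implementations, keyed by tool name.
IMPLEMENTATIONS = {
    "get_weather": lambda city: {"city": city, "temp_c": 18, "conditions": "cloudy"},
}

def call_model(messages, tools):
    """Stub for the provider API call. A real model decides whether to emit
    a tool call; here we fake one for illustration."""
    return {"tool_call": {"name": "get_weather",
                          "arguments": json.dumps({"city": "Paris"})}}

def run_turn(user_query):
    messages = [{"role": "user", "content": user_query}]
    response = call_model(messages, TOOLS)
    call = response.get("tool_call")
    if call:  # the model chose to invoke a tool
        args = json.loads(call["arguments"])  # arguments arrive as a JSON string
        result = IMPLEMENTATIONS[call["name"]](**args)  # your code executes it
        messages.append({"role": "tool", "name": call["name"],
                         "content": json.dumps(result)})
    return messages  # in practice you'd send this back to the model

history = run_turn("What's the weather in Paris?")
```

The key point the sketch captures: the model only produces the name and arguments; execution and result routing are entirely your application's responsibility.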
Why Models Are Good at This
Modern LLMs are trained on vast amounts of code, API documentation, and structured data. They understand:
- Function signatures and parameter types
- When APIs are appropriate for a given task
- How to extract relevant parameters from natural language
- How to interpret and explain function results
This isn't magic—it's pattern recognition trained on millions of examples of API usage.
Provider Implementations
OpenAI: Tools are defined with JSON Schema for parameters. The tools parameter accepts function definitions. The model returns tool_calls in its response when it wants to invoke functions.
Anthropic: Similar structure with tools parameter. Claude supports function calling with structured schema definitions and returns tool use requests.
Google (Gemini): Function declarations with schema definitions. Supports automatic function calling where the model can trigger execution directly.
All major providers have converged on similar interfaces, making multi-provider applications feasible with abstraction layers.
Parallel Function Calls
Real-world queries often require multiple pieces of information that can be fetched independently. Parallel function calling enables the model to request multiple function calls simultaneously, dramatically reducing latency.
When Parallel Calls Make Sense
Independent information needs: "What's the weather in Paris and the current EUR/USD exchange rate?" requires two API calls with no dependencies between them.
Batch operations: "Send notifications to Alice, Bob, and Charlie" can parallelize three notification calls.
Multi-source queries: "Get my calendar for tomorrow and my unread emails" pulls from independent data sources.
Enabling Parallel Calls
Most providers support parallel function calling through a parameter:
OpenAI: Set parallel_tool_calls: true (enabled by default). The model may return multiple tool calls in a single response.
Anthropic: Claude automatically generates parallel tool calls when appropriate.
The model decides when parallelization makes sense based on the query structure. You can also influence this through prompting—explicitly asking for multiple pieces of information encourages parallel calls.
Handling Parallel Results
When multiple tool calls return, you need to:
- Execute in parallel: Don't wait for one call to complete before starting the next.
- Collect all results: Wait for all parallel calls to complete.
- Return results together: Send all results back to the model in a single message.
- Handle partial failures: If one call fails, decide whether to proceed with partial results or retry.
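A concurrent executor covering these four points might look like the following sketch, with `asyncio`-based stub tools standing in for real API calls:

```python
import asyncio

async def get_weather(city):
    await asyncio.sleep(0.01)  # stands in for network latency
    return {"city": city, "temp_c": 18}

async def get_fx_rate(pair):
    await asyncio.sleep(0.01)
    return {"pair": pair, "rate": 1.09}

TOOLS = {"get_weather": get_weather, "get_fx_rate": get_fx_rate}

async def execute_parallel(tool_calls):
    """Run all tool calls concurrently; collect results and partial failures."""
    async def run(call):
        return await TOOLS[call["name"]](**call["arguments"])

    # return_exceptions=True keeps one failure from sinking the whole batch
    results = await asyncio.gather(*(run(c) for c in tool_calls),
                                   return_exceptions=True)
    out = []
    for call, result in zip(tool_calls, results):
        if isinstance(result, Exception):  # partial failure: report, don't crash
            out.append({"name": call["name"], "error": str(result)})
        else:
            out.append({"name": call["name"], "result": result})
    return out  # send all results back to the model in one message

calls = [{"name": "get_weather", "arguments": {"city": "Paris"}},
         {"name": "get_fx_rate", "arguments": {"pair": "EUR/USD"}}]
results = asyncio.run(execute_parallel(calls))
```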
Latency Benefits
Sequential execution: total_time = call_1_time + call_2_time + call_3_time
Parallel execution: total_time = max(call_1_time, call_2_time, call_3_time)
For three 500ms API calls, sequential takes 1.5 seconds while parallel takes 500ms—a 3x improvement.
Sequential Orchestration
Some workflows require sequential function calls where later calls depend on earlier results. The model must reason through the workflow step by step.
Dependency Chains
Example: "Book the cheapest flight to Paris and then reserve a hotel near the airport."
- Search flights → Get flight details including arrival airport
- Search hotels near arrival airport → Get hotel options
- Book hotel (depends on knowing which airport)
The model can't parallelize these—each step needs information from the previous step.
Multi-Turn Orchestration
Complex workflows span multiple model turns:
- Turn 1: User asks to plan a trip. Model calls flight search.
- Turn 2: Flight results provided. Model calls hotel search using flight destination.
- Turn 3: Hotel results provided. Model calls car rental search using hotel dates.
- Turn 4: All information gathered. Model presents complete plan.
Each turn requires waiting for function execution before proceeding.
ReAct Pattern
The ReAct (Reasoning + Acting) pattern structures sequential tool use:
- Thought: Model reasons about what to do next
- Action: Model selects and calls a function
- Observation: Function result is returned
- Repeat: Until task is complete
This pattern makes the model's reasoning explicit and debuggable. It naturally handles multi-step workflows where each step informs the next.
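A minimal ReAct loop can be sketched as below. The model is replaced by a scripted stub so the example runs standalone; the loop structure (thought, action, observation, repeat, with a step budget) is the part that carries over to real agents.

```python
import json

def fake_model(history):
    """Stub for the LLM: a scripted thought/action sequence for illustration."""
    step = sum(1 for m in history if m["role"] == "observation")
    if step == 0:
        return {"thought": "I need the arrival airport first.",
                "action": {"name": "search_flights", "arguments": {"dest": "Paris"}}}
    return {"thought": "I have everything I need.",
            "final": "Cheapest flight lands at CDG for $120."}

TOOLS = {"search_flights": lambda dest: {"airport": "CDG", "price": 120}}

def react_loop(task, max_steps=5):
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        decision = fake_model(history)       # Thought (+ Action or final answer)
        if "final" in decision:
            return decision["final"], history
        action = decision["action"]          # Action: selected tool + arguments
        observation = TOOLS[action["name"]](**action["arguments"])  # Observation
        history.append({"role": "observation",
                        "content": json.dumps(observation)})
    raise RuntimeError("step budget exhausted")  # always bound the loop

answer, trace = react_loop("Find the cheapest flight to Paris")
```

The `max_steps` budget matters in production: a confused model can otherwise loop indefinitely, burning tokens and API quota.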
Planning and Execution
For complex workflows, separate planning from execution:
Planning phase: Model analyzes the task and generates a plan (sequence of steps, dependencies, required functions).
Execution phase: Execute the plan step by step, handling errors and adjustments as needed.
This separation enables plan validation before execution and easier recovery from mid-execution failures.
Structured Outputs
Function calling requires structured outputs—the model must generate valid JSON that matches your function schema. Unstructured or malformed outputs cause execution failures.
The Structured Output Challenge
LLMs generate text probabilistically. They might:
- Produce invalid JSON (missing quotes, trailing commas)
- Include extra fields not in the schema
- Use wrong types (string instead of number)
- Omit required fields
Without guarantees, you need extensive validation and retry logic.
Guaranteed Structured Outputs
OpenAI Structured Outputs: Set "strict": true in your function definition. OpenAI guarantees the output will exactly match your JSON schema—no validation needed.
Constrained decoding: Some frameworks constrain the model's token generation to only produce valid JSON. Libraries like Outlines and Instructor implement this approach.
Post-processing: Parse output, validate against schema, and retry with error feedback if invalid. Works with any provider but adds latency.
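The post-processing approach can be sketched with a hand-rolled validator (a real implementation would likely use a library like jsonschema or Pydantic). The `generate` callable stands in for a model call; the feedback it receives carries the previous attempt's errors so the model can correct itself.

```python
import json

# A simplified schema description (illustrative, not full JSON Schema).
SCHEMA = {
    "required": ["city", "units"],
    "types": {"city": str, "units": str},
    "enums": {"units": {"celsius", "fahrenheit"}},
}

def validate(args, schema):
    """Return a list of problems; an empty list means the arguments are valid."""
    errors = []
    for field in schema["required"]:
        if field not in args:
            errors.append(f"missing required field '{field}'")
    for field, expected in schema["types"].items():
        if field in args and not isinstance(args[field], expected):
            errors.append(f"'{field}' should be {expected.__name__}")
    for field, allowed in schema.get("enums", {}).items():
        if field in args and args[field] not in allowed:
            errors.append(f"'{field}' must be one of {sorted(allowed)}")
    return errors

def parse_with_retry(generate, schema, max_attempts=3):
    """Parse, validate, and retry with error feedback until valid."""
    feedback = None
    for _ in range(max_attempts):
        raw = generate(feedback)  # feedback tells the model what was wrong
        try:
            args = json.loads(raw)
        except json.JSONDecodeError as exc:
            feedback = f"invalid JSON: {exc}"
            continue
        errors = validate(args, schema)
        if not errors:
            return args
        feedback = "; ".join(errors)
    raise ValueError(f"could not get valid arguments: {feedback}")

# Simulated model: first attempt omits a field, second is corrected.
attempts = iter(['{"city": "Paris"}',
                 '{"city": "Paris", "units": "celsius"}'])
args = parse_with_retry(lambda fb: next(attempts), SCHEMA)
```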
Schema Design Best Practices
Be specific: Narrow schemas are easier for models to satisfy than broad ones.
Use enums: Instead of free-form strings, constrain to specific allowed values.
Provide descriptions: Parameter descriptions help the model understand expected values.
Set reasonable defaults: For optional parameters, specify sensible defaults.
Validate at the boundary: Even with structured output guarantees, validate before executing sensitive operations.
Handling Complex Types
Nested objects: Models handle nested structures well. Define sub-schemas for complex parameters.
Arrays: Specify item schema for arrays. Models can generate variable-length arrays.
Union types: Use discriminated unions (a "type" field that determines the shape) for parameters that can have multiple forms.
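As one concrete illustration, a discriminated union can be expressed in standard JSON Schema with a "type" discriminator and oneOf branches (the tool and field names here are hypothetical):

```python
# JSON Schema for a discriminated union: the "type" field selects the shape.
NOTIFICATION_SCHEMA = {
    "type": "object",
    "properties": {"type": {"type": "string", "enum": ["email", "sms"]}},
    "required": ["type"],
    "oneOf": [
        {   # shape when type == "email"
            "properties": {"type": {"const": "email"},
                           "address": {"type": "string"}},
            "required": ["type", "address"],
        },
        {   # shape when type == "sms"
            "properties": {"type": {"const": "sms"},
                           "phone": {"type": "string"}},
            "required": ["type", "phone"],
        },
    ],
}
```

Note that some providers' strict-schema modes support only a subset of JSON Schema, so check which keywords (such as oneOf and const) your provider honors.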
Tool Selection and Routing
When multiple tools are available, the model must choose the right one. This decision process can be controlled and optimized.
Natural Selection
By default, models select tools based on:
- Tool descriptions matching the query intent
- Parameter compatibility with available information
- Prior training on similar tool selection scenarios
Well-written tool descriptions are crucial—they're the primary signal the model uses for selection.
Controlling Tool Selection
tool_choice parameter:
- "auto": Model decides whether to call tools (default)
- "none": Model won't call tools
- "required": Model must call at least one tool
- Specific function: Force a particular function to be called
Use "required" when you know the query needs a tool call but want the model to select which one. Use specific function forcing when the tool choice is determined by application logic.
Tool Description Optimization
Tool descriptions directly impact selection accuracy:
Good description: "Searches for flights between two airports on a specific date. Use for flight availability and pricing queries."
Bad description: "Flight search function."
Include:
- What the tool does
- When to use it
- What inputs it expects
- What outputs it returns
Handling Tool Proliferation
As you add more tools, selection becomes harder. Strategies:
Hierarchical tools: Group related tools under category functions. "travel_search" might internally dispatch to flights, hotels, or cars.
Dynamic tool sets: Only provide tools relevant to the current context. A cooking assistant doesn't need calendar tools.
Two-stage selection: First, classify the query type, then provide tools relevant to that type.
Error Handling
Functions fail. APIs timeout, return errors, or provide unexpected results. Robust tool-using agents handle failures gracefully.
Types of Failures
Execution failures: The function throws an error (network failure, invalid parameters, service unavailable).
Invalid results: The function returns but with unexpected or unusable data (empty results, error responses).
Timeout: The function takes too long to respond.
Schema violations: The model generates invalid arguments despite validation efforts.
Error Recovery Strategies
Retry with backoff: Transient errors often resolve on retry. Implement exponential backoff for rate limits and network issues.
Alternative tools: If one tool fails, try an alternative that provides similar information.
Partial results: If some parallel calls succeed and others fail, use what succeeded and inform the user about what's missing.
Graceful degradation: If critical tools fail, provide the best response possible without them, clearly indicating limitations.
User escalation: For unrecoverable errors, explain the issue and ask the user for guidance.
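The retry-with-backoff strategy can be sketched as a small wrapper around any tool call. The flaky tool below is a stand-in that fails twice before succeeding:

```python
import time
import random

def with_retries(fn, max_attempts=4, base_delay=0.05,
                 retryable=(TimeoutError, ConnectionError)):
    """Retry transient failures with exponential backoff plus jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retryable:
            if attempt == max_attempts:
                raise  # exhausted: surface the error to the caller
            # double the delay each attempt; jitter avoids thundering herds
            delay = base_delay * (2 ** (attempt - 1)) * (1 + random.random() * 0.1)
            time.sleep(delay)

# Simulated flaky tool: fails twice with a transient error, then succeeds.
state = {"calls": 0}
def flaky_tool():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("transient network error")
    return {"ok": True}

result = with_retries(flaky_tool)
```

Keep the `retryable` set narrow: retrying a validation error or a 4xx response just repeats the same failure.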
Communicating Errors to Models
When a tool fails, you need to tell the model what happened. Options:
Structured error objects: Return a standardized error format with error type, message, and possible actions.
Natural language: Return a description like "The weather API is currently unavailable. Please try again later."
Retry instructions: Tell the model to try again with different parameters or an alternative approach.
The model should understand the error well enough to either recover or explain the issue to the user.
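A structured error object for the model might look like this sketch (the field names are an illustrative convention, not a standard):

```python
def tool_error(error_type, message, retryable, suggestion=None):
    """Standardized error payload returned to the model in place of a result."""
    return {
        "status": "error",
        "error_type": error_type,   # machine-readable category
        "message": message,         # human/model-readable explanation
        "retryable": retryable,     # can the model sensibly try again?
        "suggestion": suggestion,   # optional recovery hint for the model
    }

err = tool_error(
    "rate_limited",
    "Weather API quota exceeded",
    retryable=True,
    suggestion="Retry after 60 seconds or use the cached forecast tool.",
)
```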
Timeout Handling
Set appropriate timeouts for each function:
- Fast functions (cached lookups): 1-5 seconds
- Standard API calls: 10-30 seconds
- Long operations (complex queries, external systems): 30-120 seconds
When timeouts occur:
- Cancel the operation
- Return a timeout error to the model
- Consider whether retry makes sense
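For synchronous tool functions, a hard timeout can be sketched with a thread pool (note that Python threads can't be forcibly killed, so `cancel` is best effort and a timed-out call may still run to completion in the background):

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FuturesTimeout

def run_with_timeout(fn, timeout_s, *args):
    """Execute a tool call with a hard timeout; surface a structured error."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fn, *args)
        try:
            return {"status": "ok", "result": future.result(timeout=timeout_s)}
        except FuturesTimeout:
            future.cancel()  # best effort; a running thread can't be killed
            return {"status": "error", "error_type": "timeout",
                    "message": f"tool exceeded {timeout_s}s"}

fast = run_with_timeout(lambda: "done", 1.0)
slow = run_with_timeout(lambda: time.sleep(0.5) or "late", 0.05)
```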
Production Patterns
Idempotency
Tool calls might be retried due to network issues or model re-generation. Ensure functions handle duplicate calls safely:
Idempotent operations: Multiple calls produce the same result (GET requests, queries).
Non-idempotent operations: Multiple calls produce different results (POST, creating resources).
For non-idempotent operations, implement idempotency keys or check for existing resources before creating.
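An idempotency-key wrapper for non-idempotent tools can be sketched as below (an in-memory store for illustration; production systems would persist keys in a database with a TTL):

```python
import uuid

class IdempotentExecutor:
    """Deduplicate non-idempotent tool calls using client-supplied keys."""
    def __init__(self):
        self._completed = {}  # idempotency_key -> prior result

    def execute(self, key, fn, *args, **kwargs):
        if key in self._completed:  # retry of an already-applied call
            return self._completed[key]
        result = fn(*args, **kwargs)
        self._completed[key] = result
        return result

created = []
def create_booking(flight_id):
    created.append(flight_id)  # side effect we must not duplicate
    return {"booking_id": f"bk-{flight_id}"}

executor = IdempotentExecutor()
key = str(uuid.uuid4())  # one key per logical operation, reused across retries
first = executor.execute(key, create_booking, "AF123")
second = executor.execute(key, create_booking, "AF123")  # retry: no new booking
```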
Rate Limiting
External APIs have rate limits. Your tool-using agent might trigger many calls quickly:
Track usage: Monitor API call rates across all tool invocations.
Implement queuing: Queue tool calls and execute at sustainable rates.
Graceful degradation: When rate limited, inform the model and adjust behavior.
User quotas: In multi-user applications, implement per-user rate limits.
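One common mechanism for sustainable call rates is a token bucket, sketched here in-process (distributed systems would back this with shared state such as Redis):

```python
import time

class TokenBucket:
    """Simple token-bucket limiter for outbound tool calls."""
    def __init__(self, rate_per_s, capacity):
        self.rate = rate_per_s        # refill rate
        self.capacity = capacity      # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def try_acquire(self):
        now = time.monotonic()
        # refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should queue, or tell the model it's rate limited

bucket = TokenBucket(rate_per_s=10, capacity=2)
allowed = [bucket.try_acquire() for _ in range(3)]  # burst of 3 vs capacity 2
```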
Authentication and Authorization
Tools often require credentials:
Credential management: Securely store and inject credentials at execution time.
Per-user authorization: Ensure tools respect user permissions. A tool shouldn't access data the user isn't authorized to see.
Token refresh: Handle expired tokens gracefully, refreshing as needed.
Audit logging: Log tool calls with user context for security and debugging.
Caching
Many tool results are cacheable:
Query caching: Cache identical function calls with the same arguments.
Semantic caching: Cache similar queries if appropriate (weather in Paris hasn't changed in 5 minutes).
Invalidation: Implement appropriate TTLs and invalidation logic.
User-specific caching: Some results are user-specific and shouldn't be shared across users.
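Query caching can be sketched by keying on the tool name plus canonicalized arguments, with a TTL check on read:

```python
import hashlib
import json
import time

class ToolCache:
    """Cache tool results keyed by (tool name, canonicalized arguments)."""
    def __init__(self, ttl_s):
        self.ttl_s = ttl_s
        self._store = {}  # key -> (timestamp, result)

    def _key(self, name, args):
        canonical = json.dumps(args, sort_keys=True)  # order-insensitive key
        return hashlib.sha256(f"{name}:{canonical}".encode()).hexdigest()

    def get_or_call(self, name, args, fn):
        key = self._key(name, args)
        hit = self._store.get(key)
        if hit and time.monotonic() - hit[0] < self.ttl_s:
            return hit[1]  # fresh cached result, no API call
        result = fn(**args)
        self._store[key] = (time.monotonic(), result)
        return result

calls = {"n": 0}
def get_weather(city):
    calls["n"] += 1  # counts real (non-cached) invocations
    return {"city": city, "temp_c": 18}

cache = ToolCache(ttl_s=300)
a = cache.get_or_call("get_weather", {"city": "Paris"}, get_weather)
b = cache.get_or_call("get_weather", {"city": "Paris"}, get_weather)  # hit
```

For user-specific results, fold a user identifier into the cache key so results are never shared across users.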
Observability
Tool-using agents are complex to debug. Implement comprehensive observability:
Trace tool calls: Log every tool invocation with inputs, outputs, and latency.
Decision tracking: Log why the model chose particular tools.
Error rates: Track failure rates by tool, error type, and time.
Latency distributions: Monitor P50, P95, P99 latencies for each tool.
Cost tracking: Many tools have associated costs (API fees, compute). Track usage.
Advanced Patterns
Tool Composition
Complex operations can be built from simpler tools:
Macro tools: High-level tools that internally orchestrate multiple lower-level calls.
Tool chaining: The output of one tool becomes input to another, with the model orchestrating the chain.
Conditional tools: Tools that behave differently based on context or previous results.
Human-in-the-Loop
Some operations require human approval before execution:
Confirmation patterns: Present the planned action to the user and wait for approval.
Sensitive operations: Flag high-risk operations (deletions, payments, external communications) for human review.
Confidence thresholds: If the model is uncertain about tool selection, ask for clarification instead of guessing.
Streaming with Tool Calls
When streaming responses that include tool calls:
Partial tool calls: Some providers stream tool call arguments progressively.
Interleaved streaming: Text and tool calls can be interleaved in the response.
Progress updates: For long-running tools, provide progress updates to maintain responsiveness.
Multi-Agent Tool Sharing
In multi-agent systems, tools may be shared across agents:
Tool discovery: Agents learn about available tools dynamically.
Permission scoping: Different agents have access to different tools.
Conflict resolution: Handle cases where multiple agents try to use the same tool simultaneously.
Model Context Protocol (MCP)
The Model Context Protocol (MCP) is an open standard introduced by Anthropic in November 2024 that standardizes how AI systems integrate with external tools and data sources. It addresses the "N×M integration problem" where every AI tool was building custom integrations with every service.
The Problem MCP Solves
Without MCP, if you had N AI applications and M services, you needed N×M separate integrations. GitHub needs different integrations for Claude, ChatGPT, and every other AI tool. Google Drive needs the same. The complexity scales multiplicatively.
With MCP, GitHub builds one MCP server. Google Drive builds one MCP server. Every AI client connects to any MCP server using the standard protocol. The equation changes from N×M to N+M integrations—a massive reduction in complexity.
MCP vs. Function Calling
MCP and function calling aren't mutually exclusive—they serve different purposes:
| Aspect | Function Calling | MCP |
|---|---|---|
| Scope | Single application | Cross-platform |
| Discovery | Static, defined at request time | Dynamic, runtime discovery |
| Standard | Vendor-specific | Open, transport-agnostic |
| Best for | Single-app, low-latency | Portable, multi-tool ecosystems |
MCP standardizes the discovery and invocation of tools across different hosts and runtimes, while function calling remains the mechanism by which models actually invoke those tools.
Key 2025 Developments
MCP adoption accelerated rapidly in 2025:
- March 2025: OpenAI officially adopted MCP across its products, including the ChatGPT desktop app
- December 2025: Anthropic donated MCP to the Agentic AI Foundation (AAIF), a directed fund under the Linux Foundation co-founded by Anthropic, Block, and OpenAI
November 2025 Spec Updates
The 2025-11-25 spec release introduced major improvements:
- Async Tasks: Any request can return a task handle, enabling "call-now, fetch-later" patterns
- Tool Calling in Sampling: Servers can include tool definitions and specify tool choice behavior
- Parallel Tool Calls: Support for concurrent tool execution
- Better OAuth: Improved authentication flows
Integration with Function Calling
MCP tools are discoverable in real time, self-documented, and compatible with function-calling mechanisms that LLMs already understand. The typical integration:
- Discovery: Client connects to MCP server, discovers available tools
- Schema mapping: MCP tool schemas are converted to the LLM's function calling format
- Invocation: LLM generates function call, client routes through MCP
- Response: MCP server executes, returns result through the protocol
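The schema-mapping step can be sketched as a small translation function. The MCP tool shape (name, description, inputSchema) follows the MCP specification's tool listing; the input dict below is a hand-written stand-in for an actual tools/list response:

```python
def mcp_tool_to_openai(mcp_tool):
    """Map an MCP tool description to the OpenAI-style function-calling format."""
    return {
        "type": "function",
        "function": {
            "name": mcp_tool["name"],
            "description": mcp_tool.get("description", ""),
            "parameters": mcp_tool.get(
                "inputSchema", {"type": "object", "properties": {}}),
        },
    }

# Illustrative tool as an MCP server might describe it via tools/list.
discovered = {
    "name": "search_repos",
    "description": "Search GitHub repositories.",
    "inputSchema": {"type": "object",
                    "properties": {"query": {"type": "string"}},
                    "required": ["query"]},
}
openai_tool = mcp_tool_to_openai(discovered)
```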
When to Use MCP
Use MCP when:
- Building tools that should work across multiple AI platforms
- Connecting to external services with existing MCP servers
- Building reusable tool ecosystems
Use direct function calling when:
- Building single-app integrations where latency is critical
- You control both the AI application and the tools
- MCP overhead isn't justified for simple use cases
Provider-Specific Patterns
OpenAI Function Calling
Key parameters:
- tools: Array of function definitions with JSON Schema for parameters
- tool_choice: "auto", "none", "required", or specific function
- parallel_tool_calls: Enable/disable parallel calls (default: true)
Structured outputs: Set "strict": true in function definitions for guaranteed schema compliance.
Response format: Tool calls appear in tool_calls array with id, function.name, and function.arguments.
Anthropic Claude Tool Use
Key parameters:
- tools: Array of tool definitions with name, description, and input_schema
- tool_choice: {"type": "auto"}, {"type": "any"}, or {"type": "tool", "name": "..."}
Response format: Tool use appears as content blocks with type: "tool_use", including id, name, and input.
Best practices:
- Claude benefits from detailed tool descriptions
- Use tool_choice.type = "any" to force tool use when appropriate
- Handle stop_reason: "tool_use" to continue the conversation after tool execution
Google Gemini Function Calling
Key parameters:
- tools: Array with function_declarations
- tool_config: Control how tools are used (AUTO, ANY, NONE)
Automatic execution: Gemini supports automatic_function_calling where the model can trigger tool execution directly without roundtrips.
Response format: Function calls in function_call parts with name and args.
Cross-Provider Abstraction
For applications supporting multiple providers, use abstraction libraries:
LiteLLM: Unified interface across 100+ providers, handles tool format translation.
LangChain: Tool abstraction that works across providers with consistent interfaces.
Custom abstraction: Define tools once in a canonical format, translate to provider-specific formats at runtime.
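The custom-abstraction approach can be sketched as a canonical tool record plus per-provider translators. The canonical field names here are an illustrative convention; the output shapes follow the OpenAI and Anthropic formats described above:

```python
# One canonical definition, translated at runtime per provider.
CANONICAL = {
    "name": "get_weather",
    "description": "Get current weather for a city.",
    "schema": {"type": "object",
               "properties": {"city": {"type": "string"}},
               "required": ["city"]},
}

def to_openai(tool):
    """OpenAI format: nested under 'function', parameters key."""
    return {"type": "function",
            "function": {"name": tool["name"],
                         "description": tool["description"],
                         "parameters": tool["schema"]}}

def to_anthropic(tool):
    """Anthropic format: flat, input_schema key."""
    return {"name": tool["name"],
            "description": tool["description"],
            "input_schema": tool["schema"]}

openai_fmt = to_openai(CANONICAL)
anthropic_fmt = to_anthropic(CANONICAL)
```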
Sources
- OpenAI Function Calling Documentation
- Anthropic Tool Use Documentation
- Building Agentic AI Systems - Agent architecture patterns
- Structured Outputs & Tool Use Patterns - Output validation
Related Articles
Building Agentic AI Systems: A Complete Implementation Guide
Hands-on guide to building AI agents—tool use, ReAct pattern, planning, memory, context management, MCP integration, and multi-agent orchestration. With full prompt examples and production patterns.
Structured Outputs and Tool Use: Patterns for Reliable AI Applications
Master structured output generation and tool use patterns—JSON mode, schema enforcement, Instructor library, function calling best practices, error handling, and production patterns for reliable AI applications.
Building MCP Servers: Custom Tool Integrations for AI Agents
Field guide to building Model Context Protocol (MCP) servers—from basic tool exposure to production-grade integrations with authentication, streaming, and error handling.
Error Handling & Resilience for LLM Applications: Production Patterns
Hands-on guide to building resilient LLM applications. Covers retry strategies with exponential backoff, circuit breakers, fallback patterns, rate limit handling, timeout management, and multi-provider failover for production systems.