AI Function Calling & Tool Use Guide 2026
Master LLM tool integration with OpenAI, Anthropic, and Google Gemini function calling APIs. Learn structured outputs, parallel tool calls, error handling, and production patterns.
Function calling — also known as tool use — is the mechanism that transforms large language models from text generators into action-taking systems. Instead of just producing words, a model with function calling can request that your application execute specific operations: look up data, call an API, query a database, or run calculations. This guide covers how function calling works across all major LLM providers in 2026, with practical code examples and production patterns you can deploy today.
What Is Function Calling?
At its core, function calling is a structured way for an LLM to communicate intent to your application. You define a set of tools (functions) with their names, descriptions, and parameter schemas. When the model decides it needs to call a tool, it outputs a structured JSON object specifying which function to call and with what arguments. Your application executes the function and returns the result, which the model uses to continue the conversation.
The flow always involves these steps:
- You send a message to the model along with available tool definitions
- The model responds with a tool call (or a regular text response)
- Your application executes the tool and returns the output
- The model processes the tool output and generates a final response
This is not the model executing code itself. The model only decides what to call and with what arguments. Your application is always in control of execution.
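The four steps above can be sketched without any provider SDK. In this minimal illustration, fake_model is a hypothetical stand-in for an LLM, and get_weather and the message shapes are assumptions made for the example rather than any provider's actual wire format:

```python
import json

# Hypothetical tool; a real implementation would call a weather service.
def get_weather(location):
    return {"location": location, "temp_c": 21}

TOOLS = {"get_weather": get_weather}

def fake_model(messages):
    """Stand-in for an LLM: requests a tool first, then answers in text."""
    last = messages[-1]
    if last["role"] == "tool":
        data = json.loads(last["content"])
        text = f"It is {data['temp_c']} degrees Celsius in {data['location']}."
        return {"type": "text", "text": text}
    return {
        "type": "tool_call",
        "name": "get_weather",
        "arguments": json.dumps({"location": "Tokyo"}),
    }

def run_turn(user_message):
    # Step 1: send the user message (tool definitions are implied here)
    messages = [{"role": "user", "content": user_message}]
    # Step 2: the model replies with a tool call or with plain text
    reply = fake_model(messages)
    while reply["type"] == "tool_call":
        # Step 3: the application, not the model, executes the tool
        args = json.loads(reply["arguments"])
        result = TOOLS[reply["name"]](**args)
        messages.append({"role": "tool", "content": json.dumps(result)})
        # Step 4: the model reads the tool output and continues
        reply = fake_model(messages)
    return reply["text"]

answer = run_turn("What's the weather in Tokyo?")
print(answer)
```

The loop runs until the model stops asking for tools, which is the same shape the provider-specific examples in this guide follow.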
OpenAI Function Calling
OpenAI's function calling API has evolved significantly. As of 2026, the Responses API (client.responses.create in the Python SDK) is the recommended interface, with support for parallel tool calls, structured outputs, and the new tool search feature for managing large tool sets.
Basic Tool Definition
Tools are defined using JSON Schema for their parameters. Here's a simple example:
from openai import OpenAI
import json

client = OpenAI()

tools = [
    {
        "type": "function",
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City name, e.g. 'San Francisco'"
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature unit"
                }
            },
            "required": ["location"]
        }
    }
]

response = client.responses.create(
    model="gpt-5",
    tools=tools,
    input="What's the weather in Tokyo?"
)
Handling Tool Calls
When the model returns tool calls, you execute them and provide the results back:
input_list = [{"role": "user", "content": "What's the weather in Tokyo?"}]

response = client.responses.create(
    model="gpt-5",
    tools=tools,
    input=input_list
)

# Keep the model's output (including its function_call items) in the history
input_list += response.output

for item in response.output:
    if item.type == "function_call":
        # Execute your function logic (get_weather is your own implementation)
        args = json.loads(item.arguments)
        result = get_weather(args["location"], args.get("unit", "celsius"))
        # Return the result to the model, serialized to a string
        input_list.append({
            "type": "function_call_output",
            "call_id": item.call_id,
            "output": json.dumps(result)
        })

# Get final response
final_response = client.responses.create(
    model="gpt-5",
    tools=tools,
    input=input_list
)
print(final_response.output_text)
Structured Outputs with Strict Mode
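For reliable function calling, enable strict mode, which constrains the model's arguments to match your schema. Strict mode places two requirements on the schema itself: every property must be listed in required, and additionalProperties must be false. Optional fields are therefore declared as nullable:
tools = [
    {
        "type": "function",
        "name": "search_products",
        "description": "Search product catalog",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string"},
                # Optional fields are nullable; the model passes null to omit them
                "category": {"type": ["string", "null"]},
                "max_price": {"type": ["number", "null"]},
                "in_stock_only": {"type": ["boolean", "null"]}
            },
            # Strict mode requires every property to be listed here
            "required": ["query", "category", "max_price", "in_stock_only"],
            "additionalProperties": False
        },
        "strict": True  # Constrains arguments to this schema
    }
]
With strict mode, the arguments always parse as valid JSON matching your schema. This removes most defensive parsing, but the schema cannot check semantics (a price range that makes sense, an ID that exists), so value-level validation on your side is still worthwhile.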
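Existing schemas often mark only some properties as required, while OpenAI's strict mode expects every property in required and additionalProperties set to false. The helper below is a minimal sketch of one way to adapt a schema: to_strict_schema is a hypothetical name, and it handles only flat object schemas (no nested objects or arrays).

```python
import copy

def to_strict_schema(schema):
    """Return a copy of a flat object schema adapted to strict-mode rules."""
    strict = copy.deepcopy(schema)
    properties = strict.get("properties", {})
    originally_required = set(strict.get("required", []))

    for name, prop in properties.items():
        if name in originally_required:
            continue
        # Formerly optional fields become nullable so the model can pass null
        if isinstance(prop.get("type"), str):
            prop["type"] = [prop["type"], "null"]
        # A nullable enum must also allow null as a value
        if "enum" in prop and None not in prop["enum"]:
            prop["enum"] = prop["enum"] + [None]

    # Strict mode: every property is required, nothing extra is allowed
    strict["required"] = list(properties)
    strict["additionalProperties"] = False
    return strict

loose = {
    "type": "object",
    "properties": {
        "query": {"type": "string"},
        "max_price": {"type": "number"},
        "sort": {"type": "string", "enum": ["price", "rating"]},
    },
    "required": ["query"],
}
strict_schema = to_strict_schema(loose)
print(strict_schema["required"])
```

Your handler then treats a null argument the same way it would treat a missing one.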
Tool Search for Large Tool Sets
GPT-5.4 and later models support tool search, which lets you defer rarely-used tools and load them only when needed. This is essential when your application has dozens or hundreds of tools:
tools = [
    # Frequently used tools defined directly
    {
        "type": "function",
        "name": "search_knowledge_base",
        "description": "Search internal knowledge base",
        "parameters": { ... }
    }
]

# Use tool_search for deferred loading
response = client.responses.create(
    model="gpt-5.4",  # Tool search needs GPT-5.4 or later
    tools=tools,
    tool_search={
        "enabled": True,
        "max_results": 5
    },
    input="Find our refund policy"
)
Anthropic Tool Use (Claude)
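Anthropic's Claude models support tool use through the tools parameter in the Messages API. The approach is similar, but each tool declares its parameters under input_schema, and results go back to the model as tool_result content blocks inside a user message: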
import anthropic

client = anthropic.Anthropic()

# Define the tools once so both requests send the same list
tools = [
    {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "input_schema": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City name"
                }
            },
            "required": ["location"]
        }
    }
]
messages = [
    {"role": "user", "content": "What's the weather in Paris?"}
]

response = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    tools=tools,
    messages=messages
)

# Check for tool use in response and collect one result per tool_use block
tool_results = []
for block in response.content:
    if block.type == "tool_use":
        print(f"Tool: {block.name}")
        print(f"Input: {block.input}")
        # Execute the tool (get_weather is your own implementation)
        tool_result = get_weather(block.input["location"])
        tool_results.append({
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": str(tool_result)
        })

# Continue conversation with all tool results in a single user message
if tool_results:
    messages.append({"role": "assistant", "content": response.content})
    messages.append({"role": "user", "content": tool_results})
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,
        tools=tools,
        messages=messages
    )
Claude's Computer Use
Claude also supports a computer use capability that goes beyond traditional function calling. With computer use, Claude can interact with a computer interface (clicking, typing, and scrolling) through structured tool calls. This is available on Claude 3.5 Sonnet and later models and is particularly useful for UI automation and testing.
Google Gemini Function Calling
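Gemini's function declarations take an OpenAPI-style schema that closely resembles the JSON Schema in OpenAI's tool definitions, so definitions port between providers with few changes. Google also offers an OpenAI-compatible endpoint for Gemini. The example here uses the native google-generativeai Python SDK: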
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Define functions
get_weather_func = genai.protos.FunctionDeclaration(
    name="get_weather",
    description="Get current weather for a location",
    parameters={
        "type": "object",
        "properties": {
            "location": {"type": "string", "description": "City name"}
        },
        "required": ["location"]
    }
)

model = genai.GenerativeModel(
    model_name="gemini-2.5-pro",
    tools=[{"function_declarations": [get_weather_func]}]
)

chat = model.start_chat()
response = chat.send_message("What's the weather in Berlin?")

# Check for function call
if response.candidates[0].content.parts[0].function_call:
    fc = response.candidates[0].content.parts[0].function_call
    print(f"Function: {fc.name}")
    print(f"Args: {dict(fc.args)}")
    # Execute and return
    result = get_weather(dict(fc.args)["location"])
    response = chat.send_message(
        genai.protos.Content(
            parts=[genai.protos.Part(
                function_response=genai.protos.FunctionResponse(
                    name="get_weather",
                    response={"result": result}
                )
            )]
        )
    )
Provider Comparison
| Feature | OpenAI (GPT-5) | Anthropic (Claude) | Google (Gemini 2.5) |
|---|---|---|---|
| Parallel tool calls | Yes | Yes | Yes |
| Structured outputs | Strict mode | JSON mode | JSON mode |
| Tool search / deferred | Yes (GPT-5.4+) | No | No |
| Max tools per request | 128+ | 128 | 64 |
| Computer use | No | Yes | No |
| Forced tool choice | Yes | Yes | Yes |
| Streaming tool calls | Yes | Yes | Yes |
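Because the OpenAI and Anthropic definitions shown earlier wrap the same JSON Schema under different keys, a small adapter can keep one source of truth for tool definitions. This is a sketch under that assumption: openai_to_anthropic and anthropic_to_openai are hypothetical helper names, and provider-specific fields such as strict are deliberately dropped.

```python
def openai_to_anthropic(tool):
    """Convert a Responses-style function tool to an Anthropic tool dict."""
    return {
        "name": tool["name"],
        "description": tool.get("description", ""),
        "input_schema": tool["parameters"],
    }

def anthropic_to_openai(tool):
    """Convert an Anthropic tool dict to a Responses-style function tool."""
    return {
        "type": "function",
        "name": tool["name"],
        "description": tool.get("description", ""),
        "parameters": tool["input_schema"],
    }

openai_tool = {
    "type": "function",
    "name": "get_weather",
    "description": "Get current weather for a location",
    "parameters": {
        "type": "object",
        "properties": {"location": {"type": "string"}},
        "required": ["location"],
    },
}

anthropic_tool = openai_to_anthropic(openai_tool)
round_trip = anthropic_to_openai(anthropic_tool)
print(sorted(anthropic_tool))
```

Keeping definitions in one format and converting at the edge makes it easier to test the same tool set against more than one provider.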
Parallel Tool Calls
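One of the most useful features in 2026 is parallel tool calling. When a user query requires multiple independent tool calls, the model can request them all in a single response instead of one per turn. This saves model round trips, and if your application also executes the calls concurrently, tool latency approaches that of the slowest call rather than the sum of all calls.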
# Example: User asks "Compare weather in NYC and London"
# The model may return TWO tool calls in a single response
input_list = [{"role": "user", "content": "Compare weather in NYC and London"}]

response = client.responses.create(
    model="gpt-5",
    tools=tools,
    input=input_list
)
input_list += response.output

# Process all tool calls
results = {}
for item in response.output:
    if item.type == "function_call":
        args = json.loads(item.arguments)
        results[item.call_id] = get_weather(args["location"])
        input_list.append({
            "type": "function_call_output",
            "call_id": item.call_id,
            "output": json.dumps(results[item.call_id])
        })
Key point: when processing parallel tool calls, you must return all tool results before making the next API call. Don't make separate follow-up calls for each tool result.
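If the tools are independent and I/O-bound, the application can also run them concurrently before returning the whole batch. The sketch below uses a thread pool; run_parallel_calls and get_weather are hypothetical names, and the SimpleNamespace objects only imitate the name, call_id, and arguments fields of function_call items so the example runs without an API key.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from types import SimpleNamespace

# Hypothetical tool; a real one would perform a network request.
def get_weather(location):
    return {"location": location, "temp_c": 18}

def run_parallel_calls(calls, handlers):
    """Execute every requested call concurrently; return outputs in request order."""
    def run_one(call):
        args = json.loads(call.arguments)
        result = handlers[call.name](**args)
        return {
            "type": "function_call_output",
            "call_id": call.call_id,
            "output": json.dumps(result),
        }

    # pool.map preserves the order of the input calls
    with ThreadPoolExecutor(max_workers=max(1, len(calls))) as pool:
        return list(pool.map(run_one, calls))

# Simulated function_call items for "Compare weather in NYC and London"
calls = [
    SimpleNamespace(name="get_weather", call_id="call_1",
                    arguments='{"location": "NYC"}'),
    SimpleNamespace(name="get_weather", call_id="call_2",
                    arguments='{"location": "London"}'),
]

outputs = run_parallel_calls(calls, {"get_weather": get_weather})
print([o["call_id"] for o in outputs])
```

All entries in outputs would be appended to the input list together, followed by a single follow-up request.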
Error Handling Patterns
Function calling in production requires robust error handling. Here are the key patterns:
1. Tool Execution Errors
When a tool execution fails, don't crash — report the error back to the model so it can recover:
def execute_tool_safely(tool_name, args):
    # tool_registry maps tool names to handler functions
    try:
        result = tool_registry[tool_name](**args)
        return {"status": "success", "data": result}
    except Exception as e:
        return {"status": "error", "message": str(e)}

# Inside the tool-call loop: return the outcome (success or error) to the
# model so it can try an alternative approach
input_list.append({
    "type": "function_call_output",
    "call_id": item.call_id,
    "output": json.dumps(execute_tool_safely(item.name, args))
})
2. Schema Validation
Even with strict mode, validate tool arguments on your side:
from jsonschema import validate, ValidationError

def validate_tool_args(tool_name, args, schema):
    try:
        validate(instance=args, schema=schema)
        return True
    except ValidationError as e:
        print(f"Invalid args for {tool_name}: {e.message}")
        return False
3. Timeout Protection
Tool executions should always have timeouts to prevent hanging:
import asyncio

async def execute_with_timeout(tool_func, args, timeout=10):
    try:
        result = await asyncio.wait_for(
            tool_func(**args),
            timeout=timeout
        )
        return result
    except asyncio.TimeoutError:
        return {"error": f"Tool execution timed out after {timeout}s"}
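The timeout pattern above expects an async tool function. For ordinary blocking handlers, one option is to run them in a worker thread and apply the same timeout. This is a sketch assuming Python 3.9+ (for asyncio.to_thread); call_sync_tool and slow_lookup are hypothetical names.

```python
import asyncio
import time

async def call_sync_tool(func, args, timeout=10.0):
    """Run a blocking tool in a worker thread and stop waiting after `timeout` seconds."""
    try:
        return await asyncio.wait_for(
            asyncio.to_thread(func, **args),
            timeout=timeout,
        )
    except asyncio.TimeoutError:
        # The worker thread itself cannot be interrupted; we only stop waiting
        return {"error": f"Tool execution timed out after {timeout}s"}

# Hypothetical blocking tool that simulates a slow backend
def slow_lookup(seconds):
    time.sleep(seconds)
    return {"ok": True}

fast = asyncio.run(call_sync_tool(slow_lookup, {"seconds": 0.0}, timeout=1.0))
slow = asyncio.run(call_sync_tool(slow_lookup, {"seconds": 0.3}, timeout=0.05))
print(fast, slow)
```

Because the thread keeps running after the timeout, tools with side effects still need their own cancellation or cleanup logic.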
Production Patterns
Tool Registry Pattern
Centralize tool definitions and handlers in a registry for clean architecture:
class ToolRegistry:
    def __init__(self):
        self.tools = {}
        self.handlers = {}

    def register(self, name, description, parameters, handler):
        self.tools[name] = {
            "type": "function",
            "name": name,
            "description": description,
            "parameters": parameters
        }
        self.handlers[name] = handler

    def get_tool_definitions(self):
        return list(self.tools.values())

    def execute(self, name, args):
        return self.handlers[name](**args)

# Usage
registry = ToolRegistry()
registry.register(
    name="search_docs",
    description="Search internal documentation",
    parameters={
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "limit": {"type": "integer", "default": 5}
        },
        "required": ["query"]
    },
    handler=search_documentation
)
Conversation State Management
In a multi-turn tool use conversation, maintain state carefully:
class ToolConversation:
    def __init__(self, client, model, registry):
        self.client = client
        self.model = model
        self.registry = registry
        self.input_list = []
        self.max_tool_rounds = 5  # Prevent infinite loops

    # Synchronous, matching the OpenAI client and ToolRegistry.execute used here
    def run(self, user_message):
        self.input_list.append({"role": "user", "content": user_message})
        for round_num in range(self.max_tool_rounds):
            response = self.client.responses.create(
                model=self.model,
                tools=self.registry.get_tool_definitions(),
                input=self.input_list
            )
            self.input_list += response.output

            # Check if model wants to call tools
            tool_calls = [item for item in response.output
                          if item.type == "function_call"]
            if not tool_calls:
                return response.output_text

            # Execute all tool calls
            for item in tool_calls:
                args = json.loads(item.arguments)
                result = self.registry.execute(item.name, args)
                self.input_list.append({
                    "type": "function_call_output",
                    "call_id": item.call_id,
                    "output": json.dumps(result)
                })
        return "Maximum tool call rounds exceeded"
Structured Outputs Beyond Tool Calling
Function calling isn't just for executing actions — it's also a powerful pattern for extracting structured data from unstructured text. You can define "tools" that the model fills in with extracted information:
extraction_tools = [
    {
        "type": "function",
        "name": "extract_customer_info",
        "description": "Extract customer information from text",
        "parameters": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                # Nullable: the text may not mention an email or phone number
                "email": {"type": ["string", "null"]},
                "phone": {"type": ["string", "null"]},
                "intent": {"type": "string", "enum": ["purchase", "support", "complaint", "inquiry"]}
            },
            # Strict mode requires all properties to be listed as required
            "required": ["name", "email", "phone", "intent"],
            "additionalProperties": False
        },
        "strict": True
    }
]

response = client.responses.create(
    model="gpt-5",
    tools=extraction_tools,
    # Force the model to call the extraction tool
    tool_choice={"type": "function", "name": "extract_customer_info"},
    input="Hi, I'm Sarah Johnson. My email is sarah@example.com. I need help with my recent order."
)
This pattern is extremely useful for form auto-filling, data extraction, and intent classification in production applications.
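With extraction, the "result" is the arguments of the forced tool call; nothing needs to be executed. The sketch below shows one way to read them. read_forced_call is a hypothetical helper, and the SimpleNamespace item only imitates the type, name, and arguments fields of a function_call output item so the example runs offline.

```python
import json
from types import SimpleNamespace

def read_forced_call(output_items, expected_name):
    """Return the parsed arguments of the first matching function_call item."""
    for item in output_items:
        if getattr(item, "type", None) == "function_call" and item.name == expected_name:
            return json.loads(item.arguments)
    raise ValueError(f"No {expected_name} call found in the response")

# Simulated response.output for the customer message in the example
simulated_output = [
    SimpleNamespace(
        type="function_call",
        name="extract_customer_info",
        call_id="call_1",
        arguments=json.dumps({
            "name": "Sarah Johnson",
            "email": "sarah@example.com",
            "phone": None,
            "intent": "support",
        }),
    )
]

customer = read_forced_call(simulated_output, "extract_customer_info")
print(customer["name"], customer["intent"])
```

In a live application the same helper would be given response.output from the request above.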
MCP vs Function Calling
The Model Context Protocol (MCP) is an emerging standard that builds on the function calling concept but adds a standardized protocol layer. Here's how they relate:
- Function calling is an API-level feature — you define tools per request
- MCP is a protocol standard — tools are hosted as MCP servers and discovered dynamically
- MCP servers expose tools that any MCP-compatible client can use
- Think of function calling as the mechanism, MCP as the interoperability standard
For most applications in 2026, direct function calling is sufficient and simpler. MCP becomes valuable when you need tool interoperability across different applications and platforms.
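To make the relationship concrete: MCP messages are JSON-RPC 2.0, with tools/list for discovery and tools/call for invocation, and each listed tool carries a name, a description, and an inputSchema. The sketch below builds those messages and maps a discovered tool onto the function-tool format used in this guide. The helper names and the sample listing are hypothetical, and transport and the initialization handshake are omitted.

```python
import json

def build_tools_list_request(request_id):
    """JSON-RPC request asking an MCP server which tools it offers."""
    return {"jsonrpc": "2.0", "id": request_id, "method": "tools/list"}

def build_tools_call_request(request_id, name, arguments):
    """JSON-RPC request invoking one tool on an MCP server."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }

def mcp_tool_to_function_tool(mcp_tool):
    """Map an MCP tool descriptor onto a function-calling tool definition."""
    return {
        "type": "function",
        "name": mcp_tool["name"],
        "description": mcp_tool.get("description", ""),
        "parameters": mcp_tool["inputSchema"],
    }

# Hypothetical result of a tools/list call
listed = {
    "tools": [
        {
            "name": "search_docs",
            "description": "Search internal documentation",
            "inputSchema": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        }
    ]
}

function_tools = [mcp_tool_to_function_tool(t) for t in listed["tools"]]
call_request = build_tools_call_request(2, "search_docs", {"query": "refund policy"})
print(json.dumps(call_request))
```

The model still emits an ordinary tool call; the client forwards it to the MCP server instead of to a local handler.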
Best Practices
- Write clear tool descriptions: The model relies on descriptions to decide when and how to use a tool. Be specific about what the tool does and when it should be used.
- Use strict mode when available: It constrains arguments to your schema and removes most parsing errors.
- Keep tool sets focused: Don't overload the model with too many tools. If you have more than 20, use tool search or group them by context.
- Always set max rounds: Prevent infinite tool call loops by capping the number of rounds.
- Validate all inputs: Never trust model-generated arguments blindly. Validate before execution.
- Log tool calls: Track which tools are called, how often, and with what arguments. This data is invaluable for debugging and optimization.
- Return errors to the model: When a tool fails, inform the model so it can try a different approach rather than leaving the user hanging.
- Design idempotent tools: Tool execution may be retried. Design tools so that calling them twice with the same arguments has the same effect as calling them once.
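The logging and idempotency advice can be combined in a thin wrapper around tool handlers. This is a minimal illustration rather than a production design: IdempotentExecutor and create_ticket are hypothetical names, and the in-memory cache assumes a single process and a single conversation.

```python
import json

class IdempotentExecutor:
    """Runs each (tool, arguments) pair at most once and logs every request."""

    def __init__(self, handlers):
        self.handlers = handlers  # tool name -> callable
        self.call_log = []        # every requested call, including retries
        self._results = {}        # results keyed by tool name + canonical args

    def execute(self, name, args):
        # sort_keys makes argument order irrelevant when building the key
        key = (name, json.dumps(args, sort_keys=True))
        self.call_log.append(key)
        if key not in self._results:
            self._results[key] = self.handlers[name](**args)
        return self._results[key]

tickets = []

# Hypothetical side-effecting tool
def create_ticket(subject):
    tickets.append(subject)
    return {"ticket_id": len(tickets)}

executor = IdempotentExecutor({"create_ticket": create_ticket})
first = executor.execute("create_ticket", {"subject": "Refund request"})
retry = executor.execute("create_ticket", {"subject": "Refund request"})
print(first, retry, len(tickets), len(executor.call_log))
```

Caching like this suits side-effecting tools; read-only lookups whose answers change over time (such as weather) would need an expiry.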
Common Pitfalls
| Pitfall | Solution |
|---|---|
| Vague tool descriptions leading to wrong tool selection | Write detailed descriptions with examples of when to use the tool |
| Tool calling infinite loop | Set max rounds and detect repeated identical calls |
| Missing required parameters in tool call arguments | Use strict mode and validate server-side |
| Slow tool execution blocking the conversation | Add timeouts and async execution |
| Model hallucinating tool calls that don't exist | Constrain with tool_choice and reject call names that are not in your registry |
| Tool results too large for context window | Summarize or truncate results before returning |
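Three of the solutions in the table (detecting repeated identical calls, rejecting unknown tool names, and truncating large results) fit in a few lines each. The sketch below is one possible shape; CallGuard, truncate_output, and the thresholds are hypothetical choices rather than provider features.

```python
import json
from collections import Counter
from typing import Optional

class CallGuard:
    """Flags unknown tool names and repeated identical calls."""

    def __init__(self, known_tools, max_repeats=2):
        self.known_tools = set(known_tools)
        self.max_repeats = max_repeats
        self.seen = Counter()

    def check(self, name: str, arguments_json: str) -> Optional[str]:
        """Return an error message for the model, or None if the call may run."""
        if name not in self.known_tools:
            return f"Unknown tool: {name}"
        # Canonicalize arguments so key order does not hide a repeat
        signature = (name, json.dumps(json.loads(arguments_json), sort_keys=True))
        self.seen[signature] += 1
        if self.seen[signature] > self.max_repeats:
            return f"Repeated identical call to {name}; try a different approach"
        return None

def truncate_output(text: str, max_chars: int = 4000) -> str:
    """Cut oversized tool output and tell the model how much was dropped."""
    if len(text) <= max_chars:
        return text
    omitted = len(text) - max_chars
    return text[:max_chars] + f"... [truncated {omitted} characters]"

guard = CallGuard(["get_weather"], max_repeats=2)
checks = [
    guard.check("get_weather", '{"location": "Tokyo", "unit": "celsius"}'),
    guard.check("get_weather", '{"unit": "celsius", "location": "Tokyo"}'),
    guard.check("get_weather", '{"location": "Tokyo", "unit": "celsius"}'),
]
unknown = guard.check("delete_database", "{}")
short = truncate_output("abcdefghij", max_chars=4)
print(checks, unknown, short)
```

A non-None message from check would be returned to the model as the tool output instead of executing the call.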
Conclusion
Function calling is the bridge between LLMs and the real world. In 2026, all major providers — OpenAI, Anthropic, and Google — support robust tool use with parallel calls, structured outputs, and streaming. The key to success is treating function calling as a system design problem, not just an API call. Invest in clear tool descriptions, robust error handling, and a clean architecture like the tool registry pattern. Your users will benefit from more capable, reliable AI applications.