A language model by itself produces words. It can't send an email, check your database, or process a payment — it can only describe wanting to do those things. Tool calling is the mechanism that bridges language and action, turning descriptions of intent into real-world outcomes.
What's Actually Happening
When we say an agent "calls a tool", we need to be precise about what that means — because it's not what it sounds like. The LLM doesn't execute code. It doesn't make API calls. What it does is output a specially structured JSON payload that signals to your framework: "I want to call this function with these parameters."
Your code sees that JSON, executes the actual function, and feeds the result back into the conversation. The LLM then reads that result and continues reasoning. Three steps, every time.
- Step 1 (Request): The LLM outputs a tool_calls array in its response. message.content will usually be None. Always check tool_calls first.
- Step 2 (Execute): Your code reads the tool name and arguments from the JSON, runs the actual function, and captures the result.
- Step 3 (Return): You append the result as a role: "tool" message with a tool_call_id matching the original request. The loop continues.
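The three steps can be traced without calling any API. The sketch below uses a hand-written dictionary in place of a real model response; the canned get_order_status function, the order ID, and the call ID call_abc123 are all made up for illustration.

```python
import json

# Hypothetical stand-in for a real tool; returns canned data.
def get_order_status(order_id: str) -> dict:
    return {"order_id": order_id, "status": "shipped"}

TOOLS = {"get_order_status": get_order_status}

# Step 1 (Request): the shape of an assistant message that asks for a tool.
# Hand-written here; a real one comes back from the chat completions API.
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_abc123",  # made-up ID
        "type": "function",
        "function": {
            "name": "get_order_status",
            # Arguments arrive as a JSON string, not a dict.
            "arguments": '{"order_id": "ORD-4892"}',
        },
    }],
}

messages = [assistant_message]

for call in assistant_message["tool_calls"]:
    # Step 2 (Execute): parse the arguments and run the real function.
    args = json.loads(call["function"]["arguments"])
    result = TOOLS[call["function"]["name"]](**args)

    # Step 3 (Return): send the result back, tagged with the same ID.
    messages.append({
        "role": "tool",
        "tool_call_id": call["id"],
        "content": json.dumps(result),
    })

print(messages[-1])
```

The model never touches get_order_status directly. It only names the function and supplies arguments as text; everything between parsing and appending is ordinary application code.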
Defining Tools: The Description Is for the LLM
Before the LLM can call a tool, you have to teach it the tool exists — its name, what it does, and what parameters it accepts. You pass this as a JSON schema in the tools array on every API call.
The description field isn't documentation. It's a runtime instruction that the model reads to decide when and how to use the tool. Vague descriptions produce bad tool use decisions.
# Vague: don't do this
{"name": "get_order", "description": "Gets order info"}

# Specific: do this
ORDER_STATUS_TOOL_DEF = {
    "type": "function",
    "function": {
        "name": "get_order_status",
        "description": (
            "Look up the current shipping status, estimated delivery date, "
            "and tracking number for a customer order. Use when the customer "
            "asks about where their order is or when it will arrive."
        ),
        "parameters": {
            "type": "object",
            "properties": {
                "order_id": {
                    "type": "string",
                    "description": "The order ID in format ORD-XXXX, e.g. ORD-4892"
                }
            },
            "required": ["order_id"]
        }
    }
}
The Complete Execution Loop
import json, os, requests
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_key=os.getenv("AZURE_OPENAI_KEY"),
    api_version="2024-02-01"
)

def get_order_status(order_id: str) -> dict:
    try:
        resp = requests.get(
            f"https://orders.internal/api/{order_id}",
            headers={"Authorization": f"Bearer {os.getenv('ORDERS_KEY')}"},
            timeout=10  # don't let a hung request stall the loop
        )
    except requests.RequestException as exc:
        # Network failures become an error result instead of a crash
        return {"error": str(exc)}
    return resp.json() if resp.ok else {"error": f"HTTP {resp.status_code}"}

TOOLS = {"get_order_status": get_order_status}
TOOL_DEFS = [ORDER_STATUS_TOOL_DEF]  # the definition dict from above

def run(user_message: str) -> str:
    messages = [
        {"role": "system", "content": "You are a customer support assistant."},
        {"role": "user", "content": user_message}
    ]
    while True:
        response = client.chat.completions.create(
            model="gpt-4o",  # on Azure, this is your deployment name
            messages=messages,
            tools=TOOL_DEFS,
            tool_choice="auto"
        )
        choice = response.choices[0]
        # No tool calls means a final answer. Checking tool_calls rather than
        # finish_reason == "stop" also covers truncated or filtered responses.
        if not choice.message.tool_calls:
            return choice.message.content or ""

        # LLM requested tool calls; content is usually None here
        messages.append(choice.message)  # Append the tool-call request
        for tc in choice.message.tool_calls:
            name = tc.function.name
            args = json.loads(tc.function.arguments)
            # Execute; validate parameters before this in production
            result = TOOLS.get(name, lambda **k: {"error": f"Unknown tool {name}"})(**args)
            # Return result; tool_call_id must match the request
            messages.append({
                "role": "tool",
                "tool_call_id": tc.id,  # MUST match tc.id exactly
                "content": json.dumps(result)
            })
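The loop's comment about validating parameters before execution can be partly handled from the tool definition itself. The following is a minimal sketch, assuming a hypothetical helper named check_args that only checks required keys and a few basic JSON types; a full JSON Schema validator would cover far more.

```python
# Maps JSON Schema type names to Python types (a small subset only).
JSON_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool}

def check_args(tool_def: dict, args: dict) -> list[str]:
    """Return a list of problems; an empty list means the args look fine."""
    schema = tool_def["function"]["parameters"]
    props = schema.get("properties", {})
    problems = []
    for name in schema.get("required", []):
        if name not in args:
            problems.append(f"missing required parameter: {name}")
    for name, value in args.items():
        if name not in props:
            problems.append(f"unexpected parameter: {name}")
            continue
        expected = JSON_TYPES.get(props[name].get("type"))
        if expected and not isinstance(value, expected):
            problems.append(f"wrong type for {name}")
    return problems

# Trimmed copy of the order-status definition, for illustration.
tool_def = {
    "type": "function",
    "function": {
        "name": "get_order_status",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
}

print(check_args(tool_def, {"order_id": "ORD-4892"}))
print(check_args(tool_def, {}))
print(check_args(tool_def, {"order_id": 4892}))
```

If the list is non-empty, one option is to join the problems into a single string and return it as the tool result, so the model can correct its arguments on the next turn.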
tool_choice: Controlling When Tools Run
| Mode | Behaviour | When to use |
|---|---|---|
| "auto" | LLM decides whether to call a tool or respond directly | Most agents; best balance of flexibility and efficiency |
| "required" | LLM must call at least one tool before responding | When a step must always run first (e.g., eligibility check before refund) |
| "none" | LLM cannot use any tools this turn | Summarisation, classification, or formatting steps |
Running Independent Tools in Parallel
import asyncio

async def execute_parallel(tool_calls: list) -> list:
    async def run_one(tc):
        args = json.loads(tc.function.arguments)
        # to_thread keeps blocking tool functions off the event loop
        result = await asyncio.to_thread(TOOLS[tc.function.name], **args)
        return {"role": "tool", "tool_call_id": tc.id, "content": json.dumps(result)}

    # Results arrive in INPUT ORDER, not completion order, always
    return await asyncio.gather(*[run_one(tc) for tc in tool_calls])

# 3 independent queries at 400ms each:
# Sequential: 1,200ms | Parallel: 400ms
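Both the ordering guarantee and the timing benefit can be observed in isolation. This is a small self-contained sketch in which slow_lookup is a hypothetical blocking function standing in for a tool, and the delays are arbitrary.

```python
import asyncio
import time

def slow_lookup(name: str, delay: float) -> str:
    """Blocking stand-in for a tool call (hypothetical)."""
    time.sleep(delay)
    return name

async def main() -> tuple[list[str], float]:
    start = time.perf_counter()
    # The first call is the slowest, so it finishes last,
    # yet gather still reports it first.
    results = await asyncio.gather(
        asyncio.to_thread(slow_lookup, "a", 0.3),
        asyncio.to_thread(slow_lookup, "b", 0.1),
        asyncio.to_thread(slow_lookup, "c", 0.2),
    )
    return results, time.perf_counter() - start

results, elapsed = asyncio.run(main())
print(results)            # ['a', 'b', 'c'] (input order)
print(f"{elapsed:.2f}s")  # close to the slowest call, not the sum of all three
```

Because results keep input order, each one can be paired with its tool_call_id by position even though the calls finish at different times.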
Idempotency: Protecting Non-Reversible Actions
Some tools are safe to call twice — checking an order status a second time does no harm. Others are not — sending a confirmation email twice means the customer gets it twice. For non-reversible tools, implement request-ID deduplication:
_processed = {}  # Use Redis for distributed agent deployments

def send_email_safe(to: str, subject: str, body: str, request_id: str) -> dict:
    if request_id in _processed:
        return _processed[request_id]  # Already done: return cached result
    result = _send_email_impl(to, subject, body)
    _processed[request_id] = result
    return result
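The deduplication function above takes a request_id but leaves open where it comes from. One option, sketched here as an assumption rather than a prescribed method, is to derive it deterministically from the conversation and the call's arguments, so that a retried call maps to the same key. The function name make_request_id and the conversation_id parameter are hypothetical.

```python
import hashlib
import json

def make_request_id(conversation_id: str, tool_name: str, args: dict) -> str:
    """Same inputs always give the same ID, so retries deduplicate."""
    # sort_keys makes the JSON stable regardless of argument order.
    canonical = json.dumps(args, sort_keys=True, separators=(",", ":"))
    raw = f"{conversation_id}:{tool_name}:{canonical}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()

a = make_request_id("conv-1", "send_email", {"to": "x@example.com", "subject": "Hi"})
b = make_request_id("conv-1", "send_email", {"subject": "Hi", "to": "x@example.com"})
c = make_request_id("conv-1", "send_email", {"to": "y@example.com", "subject": "Hi"})

print(a == b)  # True: argument order does not matter
print(a == c)  # False: different recipient, different ID
```

The trade-off is that a deliberately repeated, identical email within the same conversation would also be suppressed. If that case matters, a per-turn counter or the tool call's own ID could be mixed into the key.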
Security and Error Handling
The LLM generates tool parameters based on user input. User input can be malicious. Always validate before executing:
import re

def validate(tool_name: str, params: dict) -> tuple[bool, str]:
    if tool_name == "send_email":
        if not re.match(r'^[\w.+-]+@[\w.-]+\.[a-zA-Z]{2,}$', params.get("to", "")):
            return False, "Invalid email address"
    if tool_name == "query_db":
        # Coarse blocklist: substring matching also rejects harmless names
        # such as last_updated. Treat it as a backstop, not the only defence.
        forbidden = [";", "--", "DROP", "DELETE", "INSERT", "UPDATE"]
        q = params.get("query", "").upper()
        if any(kw in q for kw in forbidden):
            return False, "Disallowed SQL keyword in query"
    return True, ""

# Inside the tool-call loop, before executing the tool:
# when validation fails, return the error as string content, NOT an exception.
# Exceptions crash the whole loop. Error strings let the LLM respond gracefully.
ok, error_message = validate(name, args)
if not ok:
    messages.append({
        "role": "tool",
        "tool_call_id": tc.id,
        "content": f"Error: {error_message}"
        # The LLM reads this and says something like:
        # "I was unable to complete that action due to a validation error."
    })
Return validation and execution errors as a string in the content field. The LLM will read it and respond to the user appropriately. An uncaught exception crashes the entire conversation loop and leaves the user with a blank screen.
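The same rule can be extended to failures inside the tool itself. A minimal sketch, assuming a hypothetical wrapper named safe_execute that always returns a string suitable for the content field; the divide tool exists only to trigger an error.

```python
import json

def safe_execute(tools: dict, name: str, raw_args: str) -> str:
    """Run a tool and always return a string, never raise."""
    try:
        args = json.loads(raw_args)
        if name not in tools:
            return f"Error: unknown tool {name}"
        return json.dumps(tools[name](**args))
    except json.JSONDecodeError:
        return "Error: arguments were not valid JSON"
    except TypeError as exc:
        # Typically wrong or missing parameters for the function
        # (also raised if the result is not JSON-serialisable).
        return f"Error: bad arguments or result ({exc})"
    except Exception as exc:
        # Last resort: report the failure instead of crashing the loop.
        return f"Error: {type(exc).__name__}: {exc}"

# Hypothetical tool used only to demonstrate a runtime failure.
def divide(a: float, b: float) -> dict:
    return {"result": a / b}

tools = {"divide": divide}
print(safe_execute(tools, "divide", '{"a": 6, "b": 3}'))
print(safe_execute(tools, "divide", '{"a": 6, "b": 0}'))
print(safe_execute(tools, "divide", "not json"))
print(safe_execute(tools, "missing", "{}"))
```

Catching a bare Exception is usually discouraged, but at this boundary it is a deliberate choice: the goal is that every tool call yields some message for the model, while the real error can still be logged separately.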