
Tool Calling: How AI Agents Act in the Real World

10 min read  ·  MindTechLabs

A language model by itself produces words. It can't send an email, check your database, or process a payment — it can only describe wanting to do those things. Tool calling is the mechanism that bridges language and action, turning descriptions of intent into real-world outcomes.

What's Actually Happening

When we say an agent "calls a tool", we need to be precise about what that means — because it's not what it sounds like. The LLM doesn't execute code. It doesn't make API calls. What it does is output a specially structured JSON payload that signals to your framework: "I want to call this function with these parameters."

Your code sees that JSON, executes the actual function, and feeds the result back into the conversation. The LLM then reads that result and continues reasoning. Three steps, every time.

Manager and assistant: A manager tells their assistant "please check the status of order 4892 in the system." The assistant looks it up, comes back with the answer, and the manager uses that answer in their response to the client. The manager never touches the system directly. That's exactly how tool calling works — the LLM is the manager, your code is the assistant.
  • Step 1 — Request: The LLM returns a tool_calls array in its response message. message.content is usually None in this case, so always check tool_calls before reading content.
  • Step 2 — Execute: Your code reads the tool name and arguments from the JSON, runs the actual function, captures the result.
  • Step 3 — Return: You append the result as a role: "tool" message with a tool_call_id matching the original request. The loop continues.
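To make the three steps concrete without calling a model, here is a minimal sketch in which the model's reply is a hand-written dict shaped like a Chat Completions assistant message. The lookup_order tool, its return value, and the call_1 ID are hypothetical stand-ins.

```python
import json

# Hypothetical tool; a real one would query your order system.
def lookup_order(order_id: str) -> dict:
    return {"order_id": order_id, "status": "shipped"}

TOOLS = {"lookup_order": lookup_order}

# Step 1 (Request): the model's reply, written out by hand here.
# content is None, and the arguments arrive as a JSON *string*, not a dict.
assistant_msg = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "lookup_order", "arguments": '{"order_id": "ORD-4892"}'},
    }],
}

def handle_tool_calls(msg: dict) -> list[dict]:
    """Steps 2 and 3: run each requested tool and build the matching tool messages."""
    replies = []
    for tc in msg.get("tool_calls") or []:
        fn = TOOLS[tc["function"]["name"]]
        args = json.loads(tc["function"]["arguments"])  # Step 2 (Execute)
        result = fn(**args)
        replies.append({                                # Step 3 (Return)
            "role": "tool",
            "tool_call_id": tc["id"],                   # ties the result to the request
            "content": json.dumps(result),
        })
    return replies

messages = [{"role": "user", "content": "Where is order ORD-4892?"}]
messages.append(assistant_msg)               # the request stays in the history
messages.extend(handle_tool_calls(assistant_msg))
print(messages[-1])
```

The next model call would receive all three messages, read the tool result, and write its answer.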

Defining Tools: The Description Is for the LLM

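Before the LLM can call a tool, you have to tell it the tool exists: its name, what it does, and what parameters it accepts. Each definition describes its parameters with JSON Schema, and you pass the definitions in the tools array on every API call, because the model keeps no memory of them between requests.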

The description field isn't documentation. It's a runtime instruction that the model reads to decide when and how to use the tool. Vague descriptions lead to bad tool-use decisions: the model calls the wrong tool, skips the right one, or guesses at argument formats.

# Vague — don't do this
{"name": "get_order", "description": "Gets order info"}

# Specific: do this (assigned to a name so the loop below can reference it)
ORDER_STATUS_TOOL_DEF = {
  "type": "function",
  "function": {
    "name": "get_order_status",
    "description": (
        "Look up the current shipping status, estimated delivery date, "
        "and tracking number for a customer order. Use when the customer "
        "asks about where their order is or when it will arrive."
    ),
    "parameters": {
      "type": "object",
      "properties": {
        "order_id": {
          "type": "string",
          "description": "The order ID in format ORD-XXXX, e.g. ORD-4892"
        }
      },
      "required": ["order_id"]
    }
  }
}
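The schema tells the model what to send, but nothing forces its output to conform. Here is a minimal sketch, assuming you only need to enforce required keys and basic JSON types, of checking the model's raw arguments against the parameters block before executing. check_args is a hypothetical helper; a full validator such as the jsonschema package covers far more of the spec.

```python
import json

# The "parameters" block from the get_order_status definition above.
PARAMS = {
    "type": "object",
    "properties": {
        "order_id": {
            "type": "string",
            "description": "The order ID in format ORD-XXXX, e.g. ORD-4892",
        }
    },
    "required": ["order_id"],
}

# Simplified mapping from JSON Schema type names to Python types.
_TYPES = {"string": str, "integer": int, "number": (int, float), "boolean": bool,
          "object": dict, "array": list, "null": type(None)}

def check_args(schema: dict, raw_arguments: str) -> list[str]:
    """Return a list of problems; an empty list means the arguments look valid."""
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        return [f"arguments are not valid JSON: {exc}"]
    if not isinstance(args, dict):
        return ["arguments must be a JSON object"]
    problems = [f"missing required field '{k}'"
                for k in schema.get("required", []) if k not in args]
    for key, value in args.items():
        spec = schema.get("properties", {}).get(key)
        if spec is None:
            problems.append(f"unexpected field '{key}'")
        elif "type" in spec and not isinstance(value, _TYPES[spec["type"]]):
            problems.append(f"field '{key}' should be {spec['type']}")
    return problems

print(check_args(PARAMS, '{"order_id": "ORD-4892"}'))  # []
print(check_args(PARAMS, '{"orderId": 4892}'))
```

Returning a list of problems, rather than raising, makes it easy to send them back to the model as a tool error so it can retry with corrected arguments.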

The Complete Execution Loop

import json, os, requests
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_key=os.getenv("AZURE_OPENAI_KEY"),
    api_version="2024-02-01"
)

def get_order_status(order_id: str) -> dict:
    resp = requests.get(
        f"https://orders.internal/api/{order_id}",
        headers={"Authorization": f"Bearer {os.getenv('ORDERS_KEY')}"},
        timeout=10,  # a hung backend must not stall the agent loop
    )
    return resp.json() if resp.ok else {"error": f"HTTP {resp.status_code}"}

TOOLS = {"get_order_status": get_order_status}
TOOL_DEFS = [ORDER_STATUS_TOOL_DEF]  # the definition dict from above

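def run(user_message: str, max_turns: int = 10) -> str:
    messages = [
        {"role": "system", "content": "You are a customer support assistant."},
        {"role": "user",   "content": user_message}
    ]

    for _ in range(max_turns):  # cap rounds so a confused model can't loop forever
        response = client.chat.completions.create(
            model="gpt-4o",  # on Azure, this is your deployment name
            messages=messages,
            tools=TOOL_DEFS,
            tool_choice="auto"
        )
        choice = response.choices[0]

        # No tool calls means the model answered directly. Checking tool_calls
        # (rather than finish_reason == "stop") also handles "length" and other
        # endings, which would otherwise crash the for-loop below on None.
        if not choice.message.tool_calls:
            return choice.message.content or ""  # Final answer

        # LLM requested tool calls; content is usually None here
        messages.append(choice.message)  # Append the tool-call request

        for tc in choice.message.tool_calls:
            name = tc.function.name
            try:
                args = json.loads(tc.function.arguments)
                # Execute: validate parameters before this in production
                result = TOOLS.get(name, lambda **k: {"error": f"Unknown tool {name}"})(**args)
            except Exception as exc:  # malformed JSON, bad arguments, or a failing tool
                result = {"error": str(exc)}

            # Return result: tool_call_id must match the request
            messages.append({
                "role":         "tool",
                "tool_call_id": tc.id,       # MUST match tc.id exactly
                "content":      json.dumps(result)
            })

    return "Sorry, I couldn't complete that request."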
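Because run() talks to Azure directly, it is awkward to unit-test. One option, sketched here as an assumption rather than a prescribed pattern, is to pass the model call in as a parameter and script a fake in tests. run_loop, make_fake_create, and the max_turns cap are hypothetical; the fake only mimics the attributes the loop reads (choices[0].message.content and .tool_calls).

```python
import json
from types import SimpleNamespace as NS

def run_loop(create, tools: dict, tool_defs: list, user_message: str, max_turns: int = 10) -> str:
    """Same request/execute/return loop, with the model call passed in as `create`."""
    messages = [{"role": "user", "content": user_message}]
    for _ in range(max_turns):
        choice = create(messages=messages, tools=tool_defs, tool_choice="auto").choices[0]
        if not choice.message.tool_calls:   # no tool request: this is the final answer
            return choice.message.content
        messages.append(choice.message)     # keep the tool-call request in the history
        for tc in choice.message.tool_calls:
            fn = tools.get(tc.function.name)
            result = fn(**json.loads(tc.function.arguments)) if fn else {"error": "unknown tool"}
            messages.append({"role": "tool", "tool_call_id": tc.id, "content": json.dumps(result)})
    return "Stopped: too many tool-call rounds."


def make_fake_create():
    """Hypothetical test double: requests one tool call, then answers from its result."""
    turns = {"n": 0}

    def create(messages, tools, tool_choice):
        turns["n"] += 1
        if turns["n"] == 1:
            tc = NS(id="call_1", function=NS(name="get_order_status",
                                             arguments='{"order_id": "ORD-4892"}'))
            msg = NS(content=None, tool_calls=[tc])
        else:
            status = json.loads(messages[-1]["content"])["status"]  # read the tool result
            msg = NS(content=f"Your order is {status}.", tool_calls=None)
        return NS(choices=[NS(message=msg)])

    return create


fake_tools = {"get_order_status": lambda order_id: {"order_id": order_id, "status": "shipped"}}
answer = run_loop(make_fake_create(), fake_tools, [], "Where is ORD-4892?")
print(answer)  # Your order is shipped.
```

In production you would pass client.chat.completions.create (with model bound, for example via functools.partial) as `create`.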

tool_choice: Controlling When Tools Run

Mode       | Behaviour                                              | When to use
"auto"     | LLM decides whether to call a tool or respond directly | Most agents; best balance of flexibility and efficiency
"required" | LLM must call at least one tool before responding      | When a step must always run first (e.g., an eligibility check before a refund)
"none"     | LLM cannot use any tools this turn                     | Summarisation, classification, or formatting steps

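Running Independent Tools in Parallel

When the model returns several tool calls in one response and none depends on another's result, you can execute them concurrently and append every result before the next model call.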

import asyncio, json

async def execute_parallel(tool_calls: list) -> list:
    async def run_one(tc):
        args   = json.loads(tc.function.arguments)
        # Run the blocking tool function in a worker thread (Python 3.9+)
        result = await asyncio.to_thread(TOOLS[tc.function.name], **args)
        return {"role": "tool", "tool_call_id": tc.id, "content": json.dumps(result)}

    # gather returns results in INPUT ORDER, not completion order
    # 3 independent queries at 400ms each:
    #   Sequential: ~1,200ms | Parallel: ~400ms
    return await asyncio.gather(*[run_one(tc) for tc in tool_calls])

# In the synchronous loop, replace the per-call for-loop with:
# messages.extend(asyncio.run(execute_parallel(choice.message.tool_calls)))
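To check the ordering and timing claims yourself, here is a self-contained harness that repeats execute_parallel with a hypothetical slow_lookup tool (a 0.2 s sleep standing in for a network call) and fake tool-call objects built with SimpleNamespace. It needs Python 3.9+ for asyncio.to_thread.

```python
import asyncio, json, time
from types import SimpleNamespace as NS

# Hypothetical slow tool: the sleep stands in for network latency.
def slow_lookup(order_id: str) -> dict:
    time.sleep(0.2)
    return {"order_id": order_id}

TOOLS = {"slow_lookup": slow_lookup}

async def execute_parallel(tool_calls: list) -> list:
    async def run_one(tc):
        args = json.loads(tc.function.arguments)
        # Each blocking call runs in its own worker thread, so the sleeps overlap
        result = await asyncio.to_thread(TOOLS[tc.function.name], **args)
        return {"role": "tool", "tool_call_id": tc.id, "content": json.dumps(result)}
    return await asyncio.gather(*[run_one(tc) for tc in tool_calls])

# Three fake tool calls shaped like the SDK objects (id, function.name, function.arguments)
calls = [NS(id=f"call_{i}",
            function=NS(name="slow_lookup", arguments=json.dumps({"order_id": f"ORD-{i}"})))
         for i in range(3)]

start = time.perf_counter()
results = asyncio.run(execute_parallel(calls))
elapsed = time.perf_counter() - start
print([r["tool_call_id"] for r in results], f"{elapsed:.2f}s")
```

Expect the IDs back in input order and an elapsed time close to 0.2 s rather than 0.6 s. Because to_thread uses a thread pool, this pays off for I/O-bound tools; CPU-heavy tools would still contend for the GIL.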

Idempotency: Protecting Non-Reversible Actions

Some tools are safe to call twice: checking an order status a second time does no harm. Others are not: sending a confirmation email twice means the customer gets it twice. Repeats happen in practice when the model re-issues a call or your code retries after a timeout, so for non-reversible tools, implement request-ID deduplication:

_processed = {}  # request_id -> result; use Redis for distributed agent deployments

def send_email_safe(to: str, subject: str, body: str, request_id: str) -> dict:
    if request_id in _processed:
        return _processed[request_id]  # Already done: return cached result
    result = _send_email_impl(to, subject, body)  # your real email-sending function
    _processed[request_id] = result
    return result
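Deduplication only works if a retry arrives with the same request_id. If the model invents the ID, or you use the tool_call_id, a regenerated response will carry a new one. One option, sketched here under the assumption that you have a stable conversation ID, is to derive the key from the conversation ID, tool name, and arguments. make_request_id, the in-memory sent list, and the stub _send_email_impl are hypothetical.

```python
import hashlib, json

def make_request_id(conversation_id: str, tool_name: str, args: dict) -> str:
    """Same conversation + same tool + same arguments -> same ID, so retries dedupe."""
    payload = json.dumps({"conv": conversation_id, "tool": tool_name, "args": args},
                         sort_keys=True)  # sort_keys makes the hash independent of key order
    return hashlib.sha256(payload.encode()).hexdigest()

_processed: dict[str, dict] = {}  # in-memory; use Redis (e.g. SET with NX) across workers
sent: list[tuple] = []            # stub "outbox" standing in for a mail provider

def _send_email_impl(to: str, subject: str, body: str) -> dict:
    sent.append((to, subject, body))
    return {"status": "sent", "to": to}

def send_email_safe(to: str, subject: str, body: str, request_id: str) -> dict:
    if request_id in _processed:
        return _processed[request_id]  # already done: return cached result
    result = _send_email_impl(to, subject, body)
    _processed[request_id] = result
    return result

args = {"to": "ana@example.com", "subject": "Order ORD-4892", "body": "Your order has shipped."}
rid = make_request_id("conv-123", "send_email", args)
send_email_safe(**args, request_id=rid)
send_email_safe(**args, request_id=rid)  # retry: cached result, no second email
print(len(sent))  # 1
```

The trade-off: an identical email deliberately sent twice in the same conversation is also suppressed, so include something like a turn number in the key if that case matters.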

Security and Error Handling

The LLM generates tool parameters based on user input. User input can be malicious. Always validate before executing:

import re

def validate(tool_name: str, params: dict) -> tuple[bool, str]:
    """Return (ok, error_message) for a proposed tool call."""
    if tool_name == "send_email":
        if not re.match(r'^[\w.+-]+@[\w.-]+\.[a-zA-Z]{2,}$', params.get("to", "")):
            return False, "Invalid email address"
    if tool_name == "query_db":
        # Coarse blocklist for illustration only: easy to bypass, and it also
        # rejects harmless text such as an UPDATED_AT column name.
        forbidden = [";", "--", "DROP", "DELETE", "INSERT", "UPDATE"]
        q = params.get("query", "").upper()
        if any(kw in q for kw in forbidden):
            return False, "Disallowed SQL keyword in query"
    return True, ""
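Keyword blocklists are easy to get around. A sturdier pattern, sketched here with the standard-library sqlite3 module and a hypothetical orders table, is not to accept SQL from the model at all: expose a narrow tool whose arguments are bound as query parameters, so they are always treated as values, never as SQL.

```python
import sqlite3

# Hypothetical in-memory orders table for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, customer TEXT, status TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [("ORD-4892", "ana@example.com", "shipped"),
                  ("ORD-4893", "bo@example.com", "pending")])

def orders_for_customer(customer_email: str) -> dict:
    """Narrow tool: the model supplies a value, never SQL; the ? placeholder binds it."""
    rows = conn.execute(
        "SELECT id, status FROM orders WHERE customer = ? ORDER BY id",
        (customer_email,),
    ).fetchall()
    return {"orders": [{"id": r[0], "status": r[1]} for r in rows]}

print(orders_for_customer("ana@example.com"))
# An injection attempt is just a string that matches no customer:
print(orders_for_customer("x' OR '1'='1"))  # {'orders': []}
```

Pairing this with a database account that has only the permissions the tool needs limits the damage if a tool is ever written carelessly.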

# When validation fails, return the error as string content, NOT an exception.
# Exceptions crash the whole loop. Error strings let the LLM respond gracefully.
# (Runs inside the tool loop, where name, args, and tc are defined.)
ok, error_message = validate(name, args)
if not ok:
    messages.append({
        "role": "tool",
        "tool_call_id": tc.id,
        "content": f"Error: {error_message}"
        # The LLM reads this and says something like:
        # "I was unable to complete that action due to a validation error."
    })
    continue  # skip executing this tool call

Never raise unhandled exceptions in tool execution. If a tool fails, return the error as a string in the content field. The LLM will read it and respond to the user appropriately. An uncaught exception crashes the entire conversation loop and leaves the user with a blank screen.
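One way to enforce this rule is to route every call through a single wrapper, so no individual tool has to remember it. Here is a minimal sketch, assuming tools are plain Python functions in a name-to-function dict; safe_execute and the always-failing charge_card tool are hypothetical.

```python
import json
import logging

logger = logging.getLogger("agent.tools")

def charge_card(amount: float) -> dict:
    # Hypothetical tool that always fails, to show the wrapper in action.
    raise TimeoutError("payment gateway timed out")

TOOLS = {"charge_card": charge_card}

def safe_execute(tool_call_id: str, name: str, raw_arguments: str) -> dict:
    """Run one tool call and always return a tool message, never an exception."""
    fn = TOOLS.get(name)
    if fn is None:
        content = f"Error: unknown tool '{name}'"
    else:
        try:
            content = json.dumps(fn(**json.loads(raw_arguments)))
        except Exception as exc:
            logger.exception("tool %s failed", name)          # full traceback for you
            content = f"Error: {type(exc).__name__}: {exc}"   # short message for the model
    return {"role": "tool", "tool_call_id": tool_call_id, "content": content}

print(safe_execute("call_9", "charge_card", '{"amount": 42.0}')["content"])
# Error: TimeoutError: payment gateway timed out
```

Logging the traceback while sending the model only a one-line error keeps internal details (hostnames, stack frames) out of the conversation.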

Test What You've Learned

Together with multi-agent systems, tool calling makes up 35% of the exam. Practice the scenarios now.
