What Is an AI Agent? Complete Beginner's Guide to Azure AI Agents

There's a lot of hype around AI agents right now, and with it, a lot of confusion. People use "agent" to mean everything from a simple chatbot to a full autonomous system. Let's cut through that and build a clear, practical understanding — because once you get this, the rest of Azure AI starts making a lot more sense.

Start With What You Already Know

Most developers have called an LLM API at some point. You send a message, you get a response. It's like a very smart autocomplete — you ask it something, it produces text. Powerful, but limited. That interaction ends the moment you get your response. The model doesn't remember you called it. It doesn't take any action in the world. It just talks.

An AI agent is something bigger. It's a software system that uses an LLM as its reasoning engine, but wraps around it a set of capabilities the LLM alone doesn't have: the ability to call real APIs, the ability to remember context, and the ability to work through multi-step problems without constant human guidance.

The way we think about it: A calculator is a tool — give it an input, get an output. An accountant is an agent — give them a goal ("close the books for Q2") and they figure out every step needed, make calls, gather data, and deliver a result. An LLM API call is the calculator. An AI agent is the accountant.

The Three Things That Make an Agent an Agent

Strip away the marketing and most modern AI agents share three core capabilities. Miss any one of these and you have something less than a true agent.

Reasoning: The agent can break a goal into steps, evaluate what it knows, decide what it needs to find out, and adapt its plan when things don't go as expected. This isn't hardcoded logic — it emerges from the LLM's ability to plan and reflect.
Tool use: The agent can reach beyond its own knowledge. It can call a weather API, query your company's database, send an email, or search the web. Without tools, the agent can only produce words — with tools, it can produce outcomes.
Memory: The agent remembers. Not just within a single conversation, but potentially across sessions. It can recall that a user prefers metric units, that a previous attempt at a task failed, or what the user said three messages ago.

Key insight: The LLM is the brain, but the agent is the whole system. The LLM can't execute code, call an API, or remember anything — the agent framework around it provides all of that. This separation matters for understanding how things can go wrong and how to fix them.

Stateful vs Stateless: The Practical Difference

Here's a concept that trips people up early on. A plain API call to an LLM is stateless — each call is completely independent. The service has no memory of any previous call you've made. That's fine for one-off questions, but it completely breaks down when you need a conversation that develops over time.

Agents are designed to be stateful. They carry context forward. Consider this exchange:

User:  I need to book travel to Melbourne next Wednesday.
Agent: Got it. Would you prefer morning or afternoon departure?
User:  Morning, please.
Agent: I found a 7:40am flight. Shall I book it?

That last message from the agent is only possible because it remembered what "morning" referred to, what city was involved, and what day was requested. A stateless system would treat "Morning, please" as a complete nonsense message with no context. The stateful agent holds all of that naturally.

Tokens: The Currency You're Spending

Before you build anything serious with agents, you need to understand tokens — because they directly affect your costs, your model's limitations, and how you architect memory.

A token is roughly four characters of English text. "Hello world" is about three tokens. Every LLM call is priced per token — both what you send in and what you get back. Each model also has a maximum number of tokens it can process in a single call (its context window). Exceed that and the call fails, or the model starts dropping the oldest context.

Why does this matter for agents specifically? Because agents build up context over time. Every tool result, every prior message, every memory lookup gets added to the next call. A long conversation with a few tool calls can easily use tens of thousands of tokens. Plan for this upfront.

System Instructions: How You Shape Agent Behaviour

One of the most important tools you have is the system instruction — a persistent message that frames everything the agent does. Unlike a user message (which changes each turn), the system instruction stays constant throughout the conversation.

Think of it as your agent's job description and code of conduct combined:

You are a customer support agent for Contoso Electronics.
Only answer questions about Contoso products and services.
Before discussing any refund, confirm the order number first.
Never discuss competitor products or reveal internal pricing.

With that system instruction set, the agent refuses to answer questions about the weather, won't discuss pricing unless you unlock it, and always starts refund conversations the right way. System instructions are the first line of defence in keeping your agent on-topic and secure.

How Tool Calling Actually Works

This is the bit that confuses most people: the LLM doesn't call your API. It asks for it to be called.

When the LLM decides it needs to call a tool, it generates a special structured JSON message in its response — not prose text, but a machine-readable request. Your agent framework intercepts that JSON, executes the actual function (your real code, your real API), and feeds the result back into the conversation. The LLM then reasons over that result to produce the next step or final answer.

# What the LLM actually outputs (simplified):
{
  "tool_call": {
    "name": "get_order_status",
    "parameters": { "order_id": "ORD-4892" }
  }
}

# Your framework sees this, calls your function:
result = get_order_status(order_id="ORD-4892")
# Returns: {"status": "shipped", "eta": "2026-06-09"}

# That result goes back to the LLM, which then generates:
# "Your order ORD-4892 has shipped and is expected to arrive June 9th."

Why this matters for security: Because the LLM never directly executes code, you have full control over what tools exist, what they can do, and how their output is validated before going back to the model. You're the gatekeeper at every step.

Two Kinds of Memory (and Why You Need Both)

Most agents need at least two layers of memory working together:

Memory Type	Lifespan	What It's For	Typical Storage
Short-term (session)	Current conversation only	What was said in this chat, current task state	Redis cache, in-memory
Long-term (persistent)	Across all sessions	User preferences, past decisions, account details	Cosmos DB, database

One thing worth spelling out: maintaining conversation history is not the same as having memory. If you just re-send the entire conversation every time, that's reprompting — it works for short conversations but burns tokens fast and eventually hits the model's context limit. Real memory means intelligently storing and retrieving what's relevant, when it's relevant.

Where Agents Live on Azure

On Azure, agents are built, deployed, and managed through Microsoft Foundry — a single platform that brings together model deployment, agent hosting, identity management, safety tooling, and observability. You don't have to stitch together five separate services; Foundry handles the integration layer for you.

That's where we go next. Once you've got the mental model of what an agent is, understanding Foundry is the natural next step — and it's where most of the AI-103 exam lives.

What Is an AI Agent? A Practical Explanation

Start With What You Already Know

The Three Things That Make an Agent an Agent

Stateful vs Stateless: The Practical Difference

Tokens: The Currency You're Spending

System Instructions: How You Shape Agent Behaviour

How Tool Calling Actually Works

Two Kinds of Memory (and Why You Need Both)

Where Agents Live on Azure