A single agent can only be so good. Cram too many responsibilities into one system prompt and you get a generalist that does everything at a mediocre level. Multi-agent systems let you build genuine specialists that collaborate, but only if you pick the right coordination pattern. This topic covers roughly 35% of the AI-103 exam, so it's worth getting comfortable with all three approaches.
Why One Agent Isn't Enough
Imagine building a customer support agent that handles billing questions, technical troubleshooting, refund processing, and account management — all in one. The system prompt alone becomes an unwieldy document. The tool list exposes every capability to every request. The model struggles to stay in character when switching contexts mid-conversation.
Multi-agent design solves this by giving each agent a narrow, well-defined responsibility with exactly the tools it needs for that responsibility. An agent that only processes refunds carries only refund tools. An agent that only does technical support carries only diagnostic tools. Each is better at its job than any generalist would be.
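As a rough illustration of that scoping, here is a minimal sketch in which each agent carries its own tool allowlist. The Agent class, the can_use helper, and the tool names are all hypothetical and not taken from any particular SDK.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical agent record: a name, a system prompt, and a tool allowlist."""
    name: str
    system_prompt: str
    tools: list[str] = field(default_factory=list)

    def can_use(self, tool_name: str) -> bool:
        # An agent may only call tools that appear on its own allowlist.
        return tool_name in self.tools


# Illustrative tool names only; a real system would register actual functions.
refund_agent = Agent(
    name="refund_agent",
    system_prompt="You process refunds only.",
    tools=["lookup_order", "issue_refund"],
)
tech_agent = Agent(
    name="tech_agent",
    system_prompt="You handle technical troubleshooting only.",
    tools=["run_diagnostics", "fetch_error_logs"],
)
```

Because the refund agent never sees the diagnostic tools, it cannot pick one by mistake, and its prompt stays short.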
The Three Patterns at a Glance
| Pattern | Who controls the flow | Conversation history transferred? | Best used for |
|---|---|---|---|
| Magentic (Manager) | Central manager agent | No — sub-agents get only the current request | Independent, routable tasks |
| Handoff | Sending agent explicitly transfers | Yes — full context package | Conversations that span multiple domains |
| Group Chat | Speaker selection algorithm | All agents see everything | Collaborative analysis or multi-perspective problems |
Pattern 1: Magentic (The Manager)
One central manager agent receives every incoming request, decides which specialist should handle it, and delegates. The manager never answers questions itself — its only job is routing. Sub-agents have no awareness of each other; from their perspective, the manager is the only entity they interact with.
manager_prompt = """You coordinate specialist agents. Never answer user questions directly.
Route each request to the appropriate agent:
- Billing questions, invoices, payments → billing_agent
- Technical errors, setup issues → tech_agent
- Returns, refunds, order changes → returns_agent
Respond only with the agent name and a brief routing reason."""
# Sub-agents have NO awareness of each other
billing_prompt = """You handle billing and payment questions only.
If asked about anything outside billing, say you can only assist with billing matters."""
Use Magentic when requests are self-contained — a user asks one question, gets one answer from one specialist, and the conversation either ends or the next question routes independently. No session state needs to cross agent boundaries.
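The routing step itself might look something like the sketch below. It assumes the manager replies in the form "agent_name: reason"; fake_manager is a keyword-based stub standing in for the real manager LLM call, and the three specialist functions are placeholders for LLM-backed agents.

```python
from typing import Callable


# Stand-ins for LLM-backed specialists (hypothetical). Each one receives
# only the current request, never the conversation history.
def billing_agent(request: str) -> str:
    return f"[billing] handling: {request}"


def tech_agent(request: str) -> str:
    return f"[tech] handling: {request}"


def returns_agent(request: str) -> str:
    return f"[returns] handling: {request}"


SPECIALISTS: dict[str, Callable[[str], str]] = {
    "billing_agent": billing_agent,
    "tech_agent": tech_agent,
    "returns_agent": returns_agent,
}


def fake_manager(request: str) -> str:
    """Stub for the manager LLM call; returns 'agent_name: reason'."""
    text = request.lower()
    if any(word in text for word in ("invoice", "payment", "billing")):
        return "billing_agent: billing-related request"
    if any(word in text for word in ("refund", "return", "order")):
        return "returns_agent: returns-related request"
    return "tech_agent: defaulting to technical support"


def route(request: str) -> str:
    decision = fake_manager(request)
    agent_name = decision.split(":", 1)[0].strip()
    specialist = SPECIALISTS.get(agent_name)
    if specialist is None:
        raise ValueError(f"Manager chose unknown agent: {agent_name!r}")
    # Only the current request is forwarded; no history crosses the boundary.
    return specialist(request)
```

Validating the manager's choice against a fixed dictionary keeps a malformed or hallucinated agent name from silently reaching the dispatch step.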
Pattern 2: Handoff (Full Context Transfer)
Handoff is for conversations where context must follow the user across domain boundaries. The sending agent explicitly passes its entire conversation state — full history, session memory, user preferences, pending tool results — to the receiving agent. From the user's perspective, there's no disruption.
Here's the crucial implementation detail: handoff is triggered by a tool call. The LLM emits a handoff_to_billing tool call in its response, and your router code treats that as a transfer signal rather than as a function to execute. When the model makes a tool call, message.content is typically None, so always check message.tool_calls first.
import json

# Handoff tool definition: the LLM "calls" this to signal a transfer
HANDOFF_TOOL = {
    "type": "function",
    "function": {
        "name": "handoff_to_billing",
        "description": "Transfer to billing agent when user asks about invoices or payments.",
        "parameters": {
            "type": "object",
            "properties": {
                "reason": {
                    "type": "string",
                    "description": "Why billing agent is needed"
                }
            },
            "required": ["reason"]
        }
    }
}


def process_message(agent, user_msg: str, history: list):
    response = agent.call_llm(
        messages=[{"role": "system", "content": agent.system_prompt}]
        + history
        + [{"role": "user", "content": user_msg}],
        tools=[HANDOFF_TOOL]
    )
    message = response.choices[0].message

    # CRITICAL: message.content is typically None when a tool call is made,
    # so always check tool_calls FIRST
    if message.tool_calls:
        args = json.loads(message.tool_calls[0].function.arguments)
        # Build updated history for the receiving agent
        updated_history = history + [
            {"role": "user", "content": user_msg},
            {"role": "assistant", "content": f"Transferring: {args['reason']}"}
        ]
        return None, {"target": "billing", "history": updated_history}

    # Normal text response
    return message.content, None
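On the router side, something has to consume that handoff signal and pass the history along. The sketch below is one possible shape, assuming each agent returns a (reply, handoff) pair like the one above. StubAgent, its keyword-based handoff rule, run_turn, and the max_hops limit are all hypothetical stand-ins for LLM-backed agents and real router code.

```python
from typing import Optional


class StubAgent:
    """Hypothetical stand-in for an LLM-backed agent."""

    def __init__(self, name: str, handoff_keywords: dict[str, str]):
        self.name = name
        self.handoff_keywords = handoff_keywords  # keyword -> target agent name

    def handle(self, user_msg: str, history: list) -> tuple[Optional[str], Optional[dict]]:
        for keyword, target in self.handoff_keywords.items():
            if keyword in user_msg.lower():
                updated = history + [
                    {"role": "user", "content": user_msg},
                    {"role": "assistant", "content": f"Transferring to {target}"},
                ]
                return None, {"target": target, "history": updated}
        return f"{self.name} answering with {len(history)} prior messages", None


def run_turn(agents: dict, active: str, user_msg: str, history: list, max_hops: int = 3):
    """Follow handoff signals until some agent produces a text reply."""
    recorded = False  # True once user_msg has been written into history
    for _ in range(max_hops):
        reply, handoff = agents[active].handle(user_msg, history)
        if handoff is None:
            if not recorded:
                history = history + [{"role": "user", "content": user_msg}]
            history = history + [{"role": "assistant", "content": reply}]
            return active, reply, history
        # The receiving agent inherits the full history built by the sender.
        active, history = handoff["target"], handoff["history"]
        recorded = True
    raise RuntimeError("Too many consecutive handoffs")
```

The max_hops cap is a defensive choice: without it, two agents that keep handing off to each other would bounce the user back and forth indefinitely.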
Pattern 3: Group Chat
Multiple agents share a single conversation. Every agent sees every message. A speaker selection algorithm evaluates the current state of the conversation and each agent's system instruction to decide whose turn it is next. There's no manager — the algorithm orchestrates participation.
This works well for problems that benefit from multiple perspectives — a product planning session where a designer, an engineer, and a business analyst all contribute in turn, or a code review where separate agents handle security, performance, and correctness independently.
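To make speaker selection concrete, here is a deliberately simple sketch. It scores each participant by keyword overlap with the latest message and skips whoever spoke last. Real implementations often use an LLM or a configurable strategy for this step; the Participant class, the expertise keywords, and the scoring rule are assumptions made purely for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Participant:
    name: str
    expertise: set[str]  # keywords standing in for the agent's system instruction

    def speak(self, transcript: list[dict]) -> str:
        # A real agent would call an LLM with the full shared transcript here.
        return f"{self.name} responding to {len(transcript)} messages"


def select_speaker(participants: list[Participant], transcript: list[dict]) -> Participant:
    """Pick the participant whose expertise best matches the latest message.

    The previous speaker is skipped so one agent cannot monopolise the chat.
    """
    last = transcript[-1]
    words = set(re.findall(r"[a-z]+", last["content"].lower()))
    candidates = [p for p in participants if p.name != last["speaker"]]
    # max() returns the first best match, so ties fall back to list order.
    return max(candidates, key=lambda p: len(p.expertise & words))


def run_group_chat(participants: list[Participant], opening: str, rounds: int) -> list[dict]:
    transcript = [{"speaker": "user", "content": opening}]
    for _ in range(rounds):
        speaker = select_speaker(participants, transcript)
        # Every participant reads the same transcript; nothing is filtered.
        transcript.append({"speaker": speaker.name, "content": speaker.speak(transcript)})
    return transcript
```

Note that no participant routes to another. The selection function alone decides who speaks, which is what distinguishes this pattern from the manager approach.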
Sequential vs Concurrent Execution
Regardless of which coordination pattern you choose, individual sub-tasks can run one at a time or in parallel.
import asyncio


# Sequential: step B needs step A's result
async def pipeline(query: str) -> str:
    summary = await summarise_agent.run(query)
    return await translate_agent.run(summary)  # Needs summary to exist first


# Concurrent: independent tasks run simultaneously
async def parallel_analysis(query: str) -> dict:
    sentiment, entities, topics = await asyncio.gather(
        sentiment_agent.run_async(query),
        entity_agent.run_async(query),
        topic_agent.run_async(query)
    )
    # Sequential: 3 agents x 600ms = 1,800ms total
    # Concurrent: max(600ms, 600ms, 600ms) = 600ms total
    return {"sentiment": sentiment, "entities": entities, "topics": topics}
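The latency difference can be checked with a small self-contained experiment. In this sketch, fake_agent is a hypothetical stand-in that uses asyncio.sleep to simulate the network delay of an LLM call; the 50 ms delay is arbitrary and chosen only to keep the run short.

```python
import asyncio
import time

AGENT_NAMES = ("sentiment", "entities", "topics")


async def fake_agent(name: str, query: str, delay: float = 0.05) -> str:
    # asyncio.sleep stands in for the latency of a real LLM call.
    await asyncio.sleep(delay)
    return f"{name}({query})"


async def run_sequential(query: str) -> tuple[list[str], float]:
    start = time.perf_counter()
    # Each call waits for the previous one to finish.
    results = [await fake_agent(name, query) for name in AGENT_NAMES]
    return results, time.perf_counter() - start


async def run_concurrent(query: str) -> tuple[list[str], float]:
    start = time.perf_counter()
    # gather schedules all three calls at once and preserves argument order.
    results = await asyncio.gather(*(fake_agent(name, query) for name in AGENT_NAMES))
    return list(results), time.perf_counter() - start


if __name__ == "__main__":
    _, seq_time = asyncio.run(run_sequential("great product"))
    _, con_time = asyncio.run(run_concurrent("great product"))
    print(f"sequential: {seq_time:.3f}s, concurrent: {con_time:.3f}s")
```

Both versions return the same results in the same order; only the elapsed time differs, with the concurrent run taking roughly as long as the slowest single call.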
Reasoning Loops: What They Are and How to Fix Them
A reasoning loop is when an agent calls the same tool repeatedly with the same parameters, never making progress. It looks like an infinite loop from the outside — the agent keeps asking for the same data because it never gets what it was expecting.
In Foundry Trace, a reasoning loop appears as a series of identical spans back-to-back — same tool name, same parameters, over and over. The root cause is almost always a missing field in the tool's response that the LLM was counting on to move forward.
Fix: update your system instruction to handle the case where that field is absent — "if order_status is not returned, tell the user you couldn't retrieve the status and offer alternatives." Give the LLM a path forward when the tool doesn't return what it expected.
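Alongside the prompt fix, the same "identical spans back-to-back" signature can be checked at runtime. The sketch below is one way to do that, assuming your agent loop keeps a list of tool calls as dictionaries with name and arguments keys. The is_reasoning_loop helper, that record shape, and the threshold of three are assumptions for illustration, not part of any tracing product.

```python
import json


def is_reasoning_loop(tool_calls: list[dict], threshold: int = 3) -> bool:
    """Return True if the last `threshold` calls share a tool name and arguments."""
    if len(tool_calls) < threshold:
        return False

    def fingerprint(call: dict) -> tuple[str, str]:
        # sort_keys makes argument order irrelevant when comparing calls.
        return call["name"], json.dumps(call["arguments"], sort_keys=True)

    recent = {fingerprint(call) for call in tool_calls[-threshold:]}
    return len(recent) == 1
```

When the check fires, the agent loop can stop calling the tool and return a fallback message to the user instead of burning further iterations.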