Multi-Agent Orchestration: Handoff, Magentic & Group Chat Patterns

A single agent can only be so good. Cram too many responsibilities into one system prompt and you get a generalist that does everything mediocrely. Multi-agent systems let you build genuine specialists that collaborate — but only if you pick the right coordination pattern. This topic covers roughly 35% of the AI-103 exam, so it's worth getting deeply comfortable with all three approaches.

Why One Agent Isn't Enough

Imagine building a customer support agent that handles billing questions, technical troubleshooting, refund processing, and account management — all in one. The system prompt alone becomes an unwieldy document. The tool list exposes every capability to every request. The model struggles to stay in character when switching contexts mid-conversation.

Multi-agent design solves this by giving each agent a narrow, well-defined responsibility with exactly the tools it needs for that responsibility. An agent that only processes refunds carries only refund tools. An agent that only does technical support carries only diagnostic tools. Each is better at its job than any generalist would be.
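To make the scoping concrete, here is a minimal Python sketch. It assumes a hypothetical SpecialistAgent class and stand-in tool functions; issue_refund, lookup_refund_status and run_diagnostics are illustrative names, not part of any real SDK. Each agent holds its own tool registry, and calling a tool outside that registry fails loudly instead of silently succeeding.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical tool functions standing in for real refund/diagnostic systems.
def issue_refund(order_id: str) -> str:
    return f"refund queued for {order_id}"

def lookup_refund_status(order_id: str) -> str:
    return f"refund for {order_id}: pending"

def run_diagnostics(device_id: str) -> str:
    return f"diagnostics passed for {device_id}"

@dataclass
class SpecialistAgent:
    name: str
    system_prompt: str
    # The ONLY tools this agent may use (in a real system, the only schemas sent to its model).
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def call_tool(self, tool_name: str, arg: str) -> str:
        if tool_name not in self.tools:
            raise PermissionError(f"{self.name} has no tool named {tool_name!r}")
        return self.tools[tool_name](arg)

refund_agent = SpecialistAgent(
    name="refund_agent",
    system_prompt="You process refunds only.",
    tools={"issue_refund": issue_refund, "lookup_refund_status": lookup_refund_status},
)
tech_agent = SpecialistAgent(
    name="tech_agent",
    system_prompt="You diagnose technical problems only.",
    tools={"run_diagnostics": run_diagnostics},
)

print(refund_agent.call_tool("issue_refund", "A-1001"))  # refund queued for A-1001
```

In a real deployment the same registry would also decide which tool schemas are sent with each request, so the refund agent's model never even sees the diagnostic tools.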

The specialist clinic analogy: A GP does many things adequately. A cardiologist does one thing exceptionally. When you need a heart procedure, you don't want the GP "giving it a go." Multi-agent systems are built on the same principle — specialists for specialist work, with a coordination layer routing between them.

The Three Patterns at a Glance

Pattern	Who controls the flow	Conversation history transferred?	Best used for
Magentic (Manager)	Central manager agent	No — sub-agents get only the current request	Independent, routable tasks
Handoff	Sending agent explicitly transfers	Yes — full context package	Conversations that span multiple domains
Group Chat	Speaker selection algorithm	All agents see everything	Collaborative analysis or multi-perspective problems

Pattern 1: Magentic (The Manager)

One central manager agent receives every incoming request, decides which specialist should handle it, and delegates. The manager never answers questions itself — its only job is routing. Sub-agents have no awareness of each other; from their perspective, the manager is the only entity they interact with.

manager_prompt = """You coordinate specialist agents. Never answer user questions directly.
Route each request to the appropriate agent:
- Billing questions, invoices, payments  → billing_agent
- Technical errors, setup issues         → tech_agent
- Returns, refunds, order changes        → returns_agent

Respond only with the agent name and a brief routing reason."""

# Sub-agents have NO awareness of each other
billing_prompt = """You handle billing and payment questions only.
If asked about anything outside billing, say you can only assist with billing matters."""
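The prompts above only describe the routing decision; the code around the manager still has to act on it. A minimal sketch, assuming the manager replies in an "agent_name: reason" shape (the prompt only asks for "the agent name and a brief routing reason", so this exact format is an assumption) and using plain callables as hypothetical stand-ins for the LLM-backed specialists:

```python
from typing import Callable, Dict

SubAgent = Callable[[str], str]  # receives ONLY the current request, no history

def parse_routing(manager_reply: str) -> str:
    """Pull the agent name out of a reply assumed to look like 'billing_agent: reason'."""
    name, _sep, _reason = manager_reply.partition(":")
    return name.strip()

def dispatch(manager_reply: str, request: str, agents: Dict[str, SubAgent]) -> str:
    name = parse_routing(manager_reply)
    if name not in agents:
        # Refuse to guess: an unknown name usually means the manager prompt needs tightening.
        raise ValueError(f"Manager routed to unknown agent {name!r}")
    # Delegation: the sub-agent gets the request and nothing else.
    return agents[name](request)

# Hypothetical stand-ins for the specialists.
agents: Dict[str, SubAgent] = {
    "billing_agent": lambda req: f"[billing] {req}",
    "tech_agent": lambda req: f"[tech] {req}",
    "returns_agent": lambda req: f"[returns] {req}",
}

print(dispatch("billing_agent: invoice question", "Why was I charged twice?", agents))
# [billing] Why was I charged twice?
```

Note that dispatch never passes conversation history. That is the defining property of this pattern, and the reason it suits self-contained requests.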

Use Magentic when requests are self-contained — a user asks one question, gets one answer from one specialist, and the conversation either ends or the next question routes independently. No session state needs to cross agent boundaries.

Pattern 2: Handoff (Full Context Transfer)

Handoff is for conversations where context must follow the user across domain boundaries. The sending agent explicitly passes its entire conversation state — full history, session memory, user preferences, pending tool results — to the receiving agent. From the user's perspective, there's no disruption.

Here's the crucial implementation detail: handoff is triggered by a tool call. The LLM emits a handoff_to_billing tool call in its response, and your router code intercepts it as a transfer signal rather than as a function to execute. When the model returns a tool call, message.content is usually None, so always check message.tool_calls first.

import json

# Handoff tool definition — the LLM "calls" this to signal a transfer
HANDOFF_TOOL = {
    "type": "function",
    "function": {
        "name": "handoff_to_billing",
        "description": "Transfer to billing agent when user asks about invoices or payments.",
        "parameters": {
            "type": "object",
            "properties": {
                "reason": {
                    "type": "string",
                    "description": "Why billing agent is needed"
                }
            },
            "required": ["reason"]
        }
    }
}

def process_message(agent, user_msg: str, history: list):
    # agent.call_llm is assumed to wrap an OpenAI-style chat completions call
    response = agent.call_llm(
        messages=[{"role": "system", "content": agent.system_prompt}]
                 + history
                 + [{"role": "user", "content": user_msg}],
        tools=[HANDOFF_TOOL]
    )
    message = response.choices[0].message

    # CRITICAL: message.content is usually None when a tool call is made,
    # so check tool_calls FIRST
    if message.tool_calls:
        tool_call = message.tool_calls[0]
        args = json.loads(tool_call.function.arguments)
        # "handoff_to_billing" -> "billing" (str.removeprefix needs Python 3.9+)
        target = tool_call.function.name.removeprefix("handoff_to_")
        # Build updated history for the receiving agent
        updated_history = history + [
            {"role": "user",      "content": user_msg},
            {"role": "assistant", "content": f"Transferring: {args['reason']}"}
        ]
        return None, {"target": target, "history": updated_history}

    # Normal text response
    return message.content, None
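process_message only produces the transfer signal; something still has to act on it. The sketch below shows one way a router loop might follow handoffs within a single user turn. To stay runnable without an API key, it builds SimpleNamespace objects shaped like an OpenAI-style chat completion and uses a hypothetical ScriptedAgent that replays canned responses. run_turn, max_handoffs and the handoff_to_<agent> naming convention are assumptions for illustration.

```python
import json
from types import SimpleNamespace
from typing import List, Tuple

def tool_call_response(name: str, args: dict) -> SimpleNamespace:
    """Fake OpenAI-style completion whose message carries a tool call (content is None)."""
    call = SimpleNamespace(function=SimpleNamespace(name=name, arguments=json.dumps(args)))
    message = SimpleNamespace(content=None, tool_calls=[call])
    return SimpleNamespace(choices=[SimpleNamespace(message=message)])

def text_response(text: str) -> SimpleNamespace:
    """Fake OpenAI-style completion with a plain text answer and no tool calls."""
    message = SimpleNamespace(content=text, tool_calls=None)
    return SimpleNamespace(choices=[SimpleNamespace(message=message)])

class ScriptedAgent:
    """Hypothetical stand-in for an LLM-backed agent: replays canned responses in order."""

    def __init__(self, system_prompt: str, responses: list):
        self.system_prompt = system_prompt
        self._responses = iter(responses)
        self.received: List[list] = []  # every message list this agent was sent

    def call_llm(self, messages: list, tools: list) -> SimpleNamespace:
        self.received.append(messages)
        return next(self._responses)

def run_turn(agents: dict, current: str, user_msg: str, history: list,
             tools: list, max_handoffs: int = 3) -> Tuple[str, str, list]:
    """Run one user turn, following handoffs until some agent answers in text."""
    messages = history + [{"role": "user", "content": user_msg}]
    for _ in range(max_handoffs + 1):
        agent = agents[current]
        response = agent.call_llm(
            messages=[{"role": "system", "content": agent.system_prompt}] + messages,
            tools=tools,
        )
        message = response.choices[0].message
        if message.tool_calls:  # transfer signal: intercept it, never execute it
            call = message.tool_calls[0]
            reason = json.loads(call.function.arguments)["reason"]
            messages = messages + [{"role": "assistant", "content": f"Transferring: {reason}"}]
            current = call.function.name.removeprefix("handoff_to_")  # Python 3.9+
            continue
        messages = messages + [{"role": "assistant", "content": message.content}]
        return current, message.content, messages
    raise RuntimeError("Too many consecutive handoffs in one turn")

agents = {
    "support": ScriptedAgent("You handle general support questions.",
                             [tool_call_response("handoff_to_billing", {"reason": "invoice question"})]),
    "billing": ScriptedAgent("You handle billing and payment questions only.",
                             [text_response("The extra charge is last month's late fee.")]),
}
history = [
    {"role": "user", "content": "Hi, my account number is 4471."},
    {"role": "assistant", "content": "Thanks, how can I help?"},
]
owner, reply, history = run_turn(agents, "support", "Why is my invoice higher?", history, tools=[])
print(owner, "->", reply)  # billing -> The extra charge is last month's late fee.
```

Because the billing agent receives the earlier account-number exchange, it can answer without asking the user to repeat themselves. The max_handoffs cap is a design choice that stops two agents from passing a user back and forth indefinitely.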

Magentic vs Handoff — the decision: Does the receiving agent need the conversation history to do its job? If yes, use Handoff. If the request stands alone, use Magentic. A user saying "now fix my billing issue" mid-conversation needs Handoff. A user asking an isolated billing question can go through Magentic.

Pattern 3: Group Chat

Multiple agents share a single conversation. Every agent sees every message. A speaker selection algorithm evaluates the current state of the conversation and each agent's system instruction to decide whose turn it is next. There's no manager — the algorithm orchestrates participation.

This works well for problems that benefit from multiple perspectives — a product planning session where a designer, an engineer, and a business analyst all contribute in turn, or a code review where separate agents handle security, performance, and correctness independently.
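As a rough illustration of the shared-transcript mechanics, here is a Python sketch of a group chat for the code-review example. The speaker selector is deliberately simplified: it picks the participant whose system instruction shares the most words with the latest message, and it never lets the same agent speak twice in a row. This keyword heuristic is a stand-in chosen for readability, not the selection algorithm of any particular framework, and the respond callables are hypothetical placeholders for LLM calls.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple

# Shared history: every participant sees every (speaker, text) entry.
Transcript = List[Tuple[str, str]]

@dataclass
class Participant:
    name: str
    system_instruction: str
    # Hypothetical: a real agent would call an LLM with the full transcript.
    respond: Callable[[Transcript], str]

def words(text: str) -> Set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))

def select_speaker(transcript: Transcript, participants: List[Participant]) -> Participant:
    """Toy selector: best word overlap between an agent's instruction and the
    latest message, excluding whoever spoke last. Ties go to the earlier participant."""
    last_speaker, last_text = transcript[-1]
    candidates = [p for p in participants if p.name != last_speaker]
    return max(candidates, key=lambda p: len(words(p.system_instruction) & words(last_text)))

def run_group_chat(task: str, participants: List[Participant], max_turns: int) -> Transcript:
    transcript: Transcript = [("user", task)]
    for _ in range(max_turns):
        speaker = select_speaker(transcript, participants)
        transcript.append((speaker.name, speaker.respond(transcript)))
    return transcript

reviewers = [
    Participant("security", "Review code for security issues such as injection and secrets.",
                lambda t: "Check the SQL string for injection; it may also be slow."),
    Participant("performance", "Review code for performance issues such as slow loops and queries.",
                lambda t: "Batch the queries in the loop, then check edge cases."),
    Participant("correctness", "Review code for correctness bugs and edge cases.",
                lambda t: "Handle the empty-list case before the loop."),
]

transcript = run_group_chat("Please review this code for injection and security problems.",
                            reviewers, max_turns=3)
print([speaker for speaker, _ in transcript])
# ['user', 'security', 'performance', 'correctness']
```

Every participant's respond function receives the full transcript, which is what separates Group Chat from the other two patterns.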

Sequential vs Concurrent Execution

Regardless of which coordination pattern you choose, individual sub-tasks can run one at a time or in parallel.

import asyncio

# summarise_agent, translate_agent, sentiment_agent, entity_agent and topic_agent
# are hypothetical agent objects whose run_async() coroutine returns text output.

# Sequential: step B needs step A's result
async def pipeline(query: str) -> str:
    summary = await summarise_agent.run_async(query)
    return await translate_agent.run_async(summary)  # Needs summary to exist first

# Concurrent: independent tasks run concurrently (their network waits overlap)
async def parallel_analysis(query: str) -> dict:
    # Assuming ~600 ms per agent call:
    #   sequential: 3 agents x 600 ms = ~1,800 ms total
    #   concurrent: max(600, 600, 600) ms = ~600 ms total
    sentiment, entities, topics = await asyncio.gather(
        sentiment_agent.run_async(query),
        entity_agent.run_async(query),
        topic_agent.run_async(query),
    )
    return {"sentiment": sentiment, "entities": entities, "topics": topics}

asyncio.gather result order: Results always come back in the same order as the input arguments — not the order tasks finished. This is consistent and predictable, and it's a detail the exam tests.
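A quick way to see this guarantee is to give three agents different simulated latencies with asyncio.sleep and record the order in which they finish. This is a self-contained sketch; fake_agent is a hypothetical stand-in for a real agent call.

```python
import asyncio
from typing import List

finished: List[str] = []  # records completion order

async def fake_agent(name: str, delay: float) -> str:
    # Hypothetical agent call; the sleep simulates network latency.
    await asyncio.sleep(delay)
    finished.append(name)
    return f"{name} done"

async def main() -> List[str]:
    return await asyncio.gather(
        fake_agent("sentiment", 0.03),  # slowest: finishes last
        fake_agent("entities", 0.01),   # fastest: finishes first
        fake_agent("topics", 0.02),
    )

results = asyncio.run(main())
print(finished)  # ['entities', 'topics', 'sentiment']  (completion order)
print(results)   # ['sentiment done', 'entities done', 'topics done']  (argument order)
```

The finished list reflects completion order, while the list returned by gather follows the order of the arguments, which is why unpacking it into sentiment, entities, topics is safe.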

Reasoning Loops: What They Are and How to Fix Them

A reasoning loop is when an agent calls the same tool repeatedly with the same parameters, never making progress. It looks like an infinite loop from the outside — the agent keeps asking for the same data because it never gets what it was expecting.

In Foundry Trace, a reasoning loop appears as a series of identical spans back-to-back — same tool name, same parameters, over and over. The root cause is almost always a missing field in the tool's response that the LLM was counting on to move forward.
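If you export trace data, the "identical spans back-to-back" signature can also be checked in code. The sketch below assumes a trace already simplified to a list of (tool_name, arguments) pairs, an assumption for illustration rather than the actual Foundry trace schema, and flags any run of at least min_repeats identical consecutive calls. The default threshold of 3 is arbitrary.

```python
import json
from itertools import groupby
from typing import List, Tuple

# Assumed, simplified span shape: (tool_name, arguments).
Span = Tuple[str, dict]

def span_key(span: Span) -> Tuple[str, str]:
    name, args = span
    # sort_keys so {"a": 1, "b": 2} and {"b": 2, "a": 1} count as the same call
    return name, json.dumps(args, sort_keys=True)

def find_reasoning_loops(spans: List[Span], min_repeats: int = 3) -> List[Tuple[int, str, int]]:
    """Return (start_index, tool_name, run_length) for each run of at least
    min_repeats identical, back-to-back tool calls."""
    loops = []
    index = 0
    for (name, _args), group in groupby(spans, key=span_key):
        run_length = len(list(group))
        if run_length >= min_repeats:
            loops.append((index, name, run_length))
        index += run_length
    return loops

trace = [
    ("search_orders", {"customer_id": "C9"}),
    ("get_order", {"order_id": "A-1"}),
    ("get_order", {"order_id": "A-1"}),
    ("get_order", {"order_id": "A-1"}),
    ("get_order", {"order_id": "A-1"}),
]
print(find_reasoning_loops(trace))  # [(1, 'get_order', 4)]
```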

Fix: update your system instruction to handle the case where that field is absent — "if order_status is not returned, tell the user you couldn't retrieve the status and offer alternatives." Give the LLM a path forward when the tool doesn't return what it expected.
