How to Organize Agents — Hierarchy, Graph, Swarm, Routing, and Skepticism


MJ · 8 min read

How do you organize the sliced agents? Four structures, paired with one skeptical paper arguing that 'LLM swarms aren't really swarms.' The conclusion lands where it usually does: structure choice gets dragged along by pricing. Part 2 of the series.

Organization follows decomposition

In part 1 the question was what to slice: Role, Skill, Time, Judge, Planner-Executor. Part 2 asks the next question: how do you organize the sliced units?

If decomposition runs along five axes, organization runs along four. Hierarchy, Graph, Swarm, Routing. The Q2 2026 papers refine, criticize, or release new versions of each.

On top of these four structures sit three observations. More capable models drive more complex systems. The LLM swarm framing is not actually swarm. And structure choice is not a free variable but a dependent variable of pricing and capability. The last observation points in the same direction as the part 3 anchor: Advisor is a price tag, not architecture.


Four structures at a glance

| Structure | Organizing principle | Representative work | Precondition |
| --- | --- | --- | --- |
| Hierarchy | Higher supervises and delegates to lower | A Taxonomy of Hierarchical MAS (arxiv 2508.12683) | Clear supervisory chain |
| Graph (DAG) | Predefined nodes and edges | From Agent Loops to Structured Graphs (arxiv 2604.11378) | Decomposable task |
| Swarm | Distribution, emergence, scale | LLM-Powered Swarms: A New Frontier or a Conceptual Stretch? (arxiv 2506.14496) | Three principles, contested |
| Routing / MoE | Per-query model selection | xRouter (arxiv 2510.08439) | Wide cost-performance gap across the model pool |

The four are not exclusive. A single production system often uses Hierarchy at the top, Graph in the middle, and Routing at the leaves — a hybrid setup.


Hierarchy — supervision and delegation

A Taxonomy of Hierarchical Multi-Agent Systems (arxiv 2508.12683) is a taxonomy paper for hierarchical MAS. It systematically classifies who supervises whom, the direction of delegation, and how escalation is handled.

The core of Hierarchy is a structure where a higher agent assigns tasks to lower agents and reviews the results. Familiar from AutoGen and MetaGPT. The organization most aligned with role-based decomposition.

The strength is interpretability. Draw the org chart and responsibility is visible. This makes it easier to meet audit and regulatory requirements during enterprise rollouts. The weakness is a bottleneck at the top. If the top stalls, everything below waits. Parallelism is also limited because Hierarchy is fundamentally a synchronous supervisory structure.
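The supervisory shape can be sketched in a few lines. This is a minimal illustration of the pattern only: the agent names and the call_model helper are hypothetical stand-ins, not from the taxonomy paper.

```python
# Minimal sketch of a hierarchical supervisor. Illustrative only: the roles
# and the call_model helper are hypothetical stand-ins for real LLM calls.
from dataclasses import dataclass

@dataclass
class Task:
    description: str

def call_model(role: str, prompt: str) -> str:
    """Stand-in for an LLM call; returns a placeholder string."""
    return f"[{role}] handled: {prompt}"

def supervisor(task: Task, workers: list[str]) -> list[str]:
    # The supervisor splits the work, delegates, then reviews -- synchronously.
    # Every result flows back through this one function, which is exactly the
    # bottleneck the text describes: if the top stalls, everything below waits.
    subtasks = [f"{task.description} (part {i})" for i, _ in enumerate(workers)]
    results = [call_model(w, st) for w, st in zip(workers, subtasks)]
    review = call_model("supervisor", "review: " + "; ".join(results))
    return results + [review]

outputs = supervisor(Task("summarize report"), ["worker-a", "worker-b"])
```

The upside is also visible in the sketch: delegation and review pass through named roles, so the org chart is recoverable from the call trace.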

The next structure tries to escape this limitation.


Graph — organized as a DAG

From Agent Loops to Structured Graphs (arxiv 2604.11378), from April 2026, is the most recent work in this set. It reframes agent execution from a repeating loop to scheduling over a structured graph.

The shift is concrete. Instead of an agent prompting itself for “what next?” at every step (loop), it follows predefined nodes and edges (DAG). Effectively a theoretical write-up of what LangGraph has done in production since 2024.

Why Graph went mainstream in 2026 is simple. Loops are hard to debug. The agent can take different paths each run, so reproducibility on the same input is poor. DAGs have finite paths, reproducible from logs.
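The loop-to-DAG shift can be sketched with the standard library's topological sorter. The node names and the step function are illustrative, not from the paper; the point is that the path set is fixed before execution.

```python
# Sketch of DAG-style agent execution: predefined nodes and edges instead of a
# self-prompting loop. Node names and the step function are illustrative only.
from graphlib import TopologicalSorter

def step(name: str, inputs: dict) -> str:
    """Stand-in for running one agent node over its dependencies' outputs."""
    return f"{name}({','.join(sorted(inputs))})"

# Each node lists its dependencies. The set of paths is finite and fixed in
# advance, so the same input always yields the same execution order and the
# run is reproducible from logs.
edges = {
    "parse": set(),
    "retrieve": {"parse"},
    "draft": {"parse", "retrieve"},
    "review": {"draft"},
}

results: dict[str, str] = {}
for node in TopologicalSorter(edges).static_order():
    results[node] = step(node, {d: results[d] for d in edges[node]})
```

Contrast this with a loop, where the next node is chosen by the model at runtime and two runs on the same input can diverge.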

For enterprise rollouts the difference is decisive. In regulated domains (finance, healthcare), where “why was this decision made?” must be explainable, DAGs allow it. Loops mostly do not.

The weakness is having to define the Graph in advance. New tasks need new graphs. Graph design cost accumulates.


Swarm — distribution, emergence, and skepticism

LLM-Powered Swarms: A New Frontier or a Conceptual Stretch? (arxiv 2506.14496) is the most provocative paper in this series. The title is itself a question. Are LLM swarms a new frontier, or a stretch of the term?

The authors propose three principles from classical swarm intelligence as the test.

| Principle | Definition | LLM swarm satisfaction |
| --- | --- | --- |
| Decentralization | Agents decide locally without central control | Mostly fails (a central coordinator is required) |
| Emergence | System-level behavior not programmed at the individual level | Mostly fails (outputs are deterministic) |
| Scalability | Natural extension to hundreds or thousands of agents | Mostly fails (communication and token cost grows multiplicatively) |

The conclusion is direct. Most systems currently called “LLM swarms” satisfy none of the three. They are systems that run multiple agents in parallel with the swarm label attached.

The value of this paper is more in the methodology than the conclusion. “Swarm” in the AI field has mostly been consumed in marketing contexts. When concepts loosen, criteria for judgment disappear. This paper rebuilds the criteria. It provides three questions for testing whether something is a swarm. Does it operate without a central coordinator? Does it produce system-level behavior not programmed at the individual level? Does it scale to hundreds?

From 2026 onward, when an “AI swarm” appears in a proposal or product description, these three questions can be applied. Most fail.
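The three questions reduce to a checklist. The field names below are mine; the criteria follow the paper's decentralization, emergence, and scalability framing.

```python
# The paper's three-principle swarm test, rendered as a checklist. Field names
# are mine; the criteria follow the decentralization/emergence/scalability framing.
from dataclasses import dataclass

@dataclass
class SwarmClaim:
    has_central_coordinator: bool
    behavior_emerges_beyond_individual_programming: bool
    scales_to_hundreds: bool

def is_swarm(claim: SwarmClaim) -> bool:
    # All three principles must hold; a central coordinator alone disqualifies.
    return (not claim.has_central_coordinator
            and claim.behavior_emerges_beyond_individual_programming
            and claim.scales_to_hundreds)

# A typical "LLM swarm": one coordinator fanning out to parallel agents.
typical_llm_swarm = SwarmClaim(True, False, False)
# Physically distributed robot swarms are the rare case where all three hold.
robot_swarm = SwarmClaim(False, True, True)

print(is_swarm(typical_llm_swarm))  # False
print(is_swarm(robot_swarm))        # True
```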

The rare environment where swarm actually applies is embedded. Online Automatic Code Generation for Robot Swarms (arxiv 2510.04774, 2025-10) documents tens of robots autonomously executing LLM-generated code with swarm behavior. With physically distributed hardware, the principles hold. LLM text agents alone have a hard time meeting the three, which reinforces the point.


Routing — pick a model per query

xRouter (arxiv 2510.08439) trains cost-aware routing through reinforcement learning. Based on the characteristics of an incoming query, it picks the optimal model from the pool (small to large).

The organizing principle borrows from MoE (Mixture of Experts). Instead of one large model, several specialized models, with one chosen for the situation. The difference here is that “specialization” comes from model size, price, and domain.

Routing rose in 2025 to 2026 because the price gap across models grew. Opus and Haiku now sit roughly 19x apart. When the gap is small, routing is not a strong lever. When the gap is large, routing becomes the strongest cost-optimization tool.

In 2022 to 2023, models were close in price. There was little reason to route. From 2025, model lineups segmented by price tier and the economics of routing held up.

The limitation of routing is meta-decision cost. The router that decides which model to use is itself a model. If the router is too smart, the cost lever disappears. Too simple, and misjudgments mount. Finding that balance is what xRouter-style research is about.
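A sketch makes the economics concrete. xRouter learns its policy with reinforcement learning; here a cheap keyword-and-length heuristic stands in for the learned router, precisely because the meta-decision must cost less than what it saves. The model names, prices, and thresholds are illustrative, not the paper's.

```python
# Heuristic sketch of cost-aware routing. xRouter learns this policy with RL;
# a keyword/length proxy stands in here. Names, prices, thresholds: illustrative.
MODEL_POOL = {                     # price per 1K tokens, made-up numbers
    "small":  {"price": 0.001},
    "medium": {"price": 0.003},
    "large":  {"price": 0.019},    # roughly 19x the small tier
}

def route(query: str) -> str:
    """Cheap meta-decision: escalate only when the query looks hard."""
    hard_markers = ("prove", "architect", "multi-step", "legal")
    if any(m in query.lower() for m in hard_markers):
        return "large"
    return "medium" if len(query) > 200 else "small"

# The lever only exists because of the price gap: if 80% of traffic stays on
# the small tier, the blended cost sits far below always-large.
always_large = MODEL_POOL["large"]["price"]
blended = 0.8 * MODEL_POOL["small"]["price"] + 0.2 * MODEL_POOL["large"]["price"]

print(route("summarize this email"))        # small
print(route("prove this invariant holds"))  # large
```

If the routing function itself were a large model, `always_large` would creep back in through the meta-decision, which is the balance the section describes.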


Four structures in one diagram

```mermaid
flowchart TB
    QUERY["Query / task"]
    QUERY --> H["Hierarchy: supervision"]
    QUERY --> G["Graph: predefined DAG"]
    QUERY --> S["Swarm: distribution"]
    QUERY --> R["Routing: per-query selection"]
    H --> H1["Strength: explainable"]
    H --> H2["Weakness: bottleneck"]
    G --> G1["Strength: reproducible"]
    G --> G2["Weakness: requires upfront design"]
    S --> S1["Strength: scale potential"]
    S --> S2["Weakness: fails three principles"]
    R --> R1["Strength: cost lever"]
    R --> R2["Weakness: meta-decision cost"]
```

The four do not exist in isolation in real systems. Each carries different preconditions and limits. None is “best” in absolute terms. Optimal structure shifts with task properties, regulatory requirements, and model pricing.

Three observations follow.


Capability up, complexity up

The most common misreading goes like this. As the model gets smarter, the system will get simpler. One strong agent will do many things.

The 2026 reality runs the other way. Claude Opus 4.7 offers 1M context and stronger reasoning, but the systems that use it are moving toward more complex orchestration. Advisor, Plan-and-Act, and xRouter all push in the direction of “do not let one model do everything.”

Three reasons compound. Pricing leverage grew. Opus got smarter, but its unit cost stayed high, and running Opus at every step is unaffordable. The fix is to cut the work into pieces and use Opus on only some of them. More cuts mean more complexity.

Slack for finer slicing appeared. Weaker models could only handle large-chunk tasks. Stronger models can be cut into smaller units while each piece still produces meaningful output. Finer-grained decomposition becomes possible.

Evaluation and audit demand rose. As models entered production, “why was this decision made?” became a frequent requirement. A single black-box agent cannot answer it, so steps get sliced and each step records its rationale.
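The pricing-leverage argument is back-of-envelope arithmetic. The token counts and prices below are made up; only the shape of the comparison matters.

```python
# Back-of-envelope for the pricing-leverage argument. Token counts and prices
# are illustrative; only the shape of the comparison matters.
OPUS_PER_1K, HAIKU_PER_1K = 0.019, 0.001   # made-up, roughly 19x apart
steps = 10
tokens_per_step = 2_000

# One strong agent: the expensive model runs every step.
monolith = steps * tokens_per_step / 1000 * OPUS_PER_1K

# Sliced: the expensive model plans (1 step), cheap workers execute (9 steps).
sliced = (1 * tokens_per_step / 1000 * OPUS_PER_1K
          + 9 * tokens_per_step / 1000 * HAIKU_PER_1K)

print(f"monolith ${monolith:.3f} vs sliced ${sliced:.3f}")
```

The saving is bought with orchestration: someone has to decide which step gets the expensive model, which is exactly the added complexity the section describes.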

The signal for decision-makers is clear. The expectation that “introducing a better model will simplify the system” runs counter to 2026 reality. Introducing a better model requires the team to be ready for more complex orchestration design.

For consulting work, model upgrade and orchestration design resourcing should always be paired in the proposal. Replacing the model while leaving the structure unchanged tends to spike cost or fail to deliver the expected gain.


LLM swarms are not swarms

Restating the central claim of the LLM-Powered Swarms paper: “swarm” classically refers to systems that satisfy decentralization, emergence, and scalability. Most current “LLM swarms” satisfy none.

For practitioners, this skepticism is highly practical. From 2026 onward, when an “AI swarm” proposal arrives, the diligence questions are clear. Is there a central coordinator? If yes, not a swarm. Do individual agents produce system-level behavior not programmed at the individual level? If outputs are deterministic, not a swarm. Does it scale to a thousand, not just ten? If communication cost grows multiplicatively, not a swarm.

If a system fails the three, it is a marketing variant of Hierarchy or Graph. Not bad in itself. But it cannot deliver the properties (scalability, resilience) that swarm promises.

For AI strategy, this distinction has financial weight. A system marketed as swarm but actually Hierarchy will see costs grow multiplicatively at scale, well beyond projections.


Structure is not an independent variable

The closing observation connects directly to the part 3 anchor.

The four structures (Hierarchy, Graph, Swarm, Routing) appear to be independent options. As if you could decide “Graph this time” based on the task. In reality, no. Each structure is optimal only under specific pricing and capability conditions.

| Structure | Optimal condition |
| --- | --- |
| Hierarchy | Wide capability gap between top and bottom models |
| Graph | Task is decomposable in advance, regulatory or audit requirements present |
| Swarm | Many low-cost devices distributed physically |
| Routing | Model pool spans pricing differences of 5x or more |

If any of these conditions changes, the optimal structure changes too.
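The condition table can be rendered as a decision sketch. The field names and thresholds are mine, chosen to match the table's wording; this is a reading aid, not a formal selection rule.

```python
# The structure/condition table as a decision sketch. Field names and
# thresholds are illustrative, chosen to mirror the table's wording.
from dataclasses import dataclass

@dataclass
class Conditions:
    top_bottom_capability_gap: float   # ratio between top and bottom models
    task_decomposable_upfront: bool
    audit_required: bool
    physically_distributed_devices: bool
    pool_price_gap: float              # max/min price ratio across the pool

def candidate_structures(c: Conditions) -> list[str]:
    out = []
    if c.top_bottom_capability_gap >= 2.0:
        out.append("hierarchy")
    if c.task_decomposable_upfront or c.audit_required:
        out.append("graph")
    if c.physically_distributed_devices:
        out.append("swarm")
    if c.pool_price_gap >= 5.0:
        out.append("routing")
    return out

# Roughly today's conditions: capable top models, auditable tasks, a ~19x
# price spread, but no physically distributed hardware.
today = Conditions(3.0, True, True, False, 19.0)
print(candidate_structures(today))  # ['hierarchy', 'graph', 'routing']
```

Change any input and the candidate set changes, which is the dependent-variable claim in miniature.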

If the Opus-Sonnet price gap narrows, Routing’s lever weakens. xRouter-style approaches lose ROI. If Sonnet-tier models reach Haiku pricing, the “expensive coordinator + cheap workers” Hierarchy loses meaning. If everything sits in the same price band, the supervisory layer no longer justifies its cost. If hardware-side AI agents become widespread (indoor robots, etc.), Swarm’s three principles finally have a real-world environment in which to hold.

Structure choice is a function of economic and physical preconditions. There is no guarantee that today’s optimal structure remains optimal twelve months out. This is the extended version of the part 3 anchor’s “pricing, not architecture.”

For enterprise rollouts, separate structure-locked investments from structure-flexible ones. Structure-locked: custom implementation tuned to one structure. Higher short-term performance, longer-term lock-in risk. Structure-flexible: implementation on a generic framework like LangGraph or LlamaIndex. Lower cost to change structure.

In 2026, structure-locked has a real performance edge. Even so, given how fast pricing and capability conditions shift, structure-flexible carries the higher expected value.


Closing — the shape follows the price

This piece is part 2 of the series. Four structures from 2026 agent orchestration research, with three observations layered on.

Each of the four has a precondition: a chain of supervision, predecomposed tasks, distribution, a wide price gap. Capability gains drive complexity gains. Better models do not produce simpler systems; they go the other way. LLM swarms are not swarms. Run the three-principle test and most fail. Structure is not an independent variable. As pricing and capability conditions change, so does optimal structure.

Part 1 covered five axes for slicing. This piece covered four structures for organizing. The part 3 anchor already argued that all of this converges on one meta-move: reversal.

Read the three together and the 2026 agent orchestration landscape resolves into a single picture. Decomposition spreads across five axes, the vocabulary remains fragmented, the time axis is empty. Organization splits across four structures, each tied to pricing and capability conditions. The overall direction is the reversal of traditional role layouts, and at its center the Advisor pattern is a price tag rather than architecture.

With this lens, reading the rest of 2026 and the first half of 2027 becomes much less surprising. Most “new orchestration patterns” fall into one of three buckets: flipping a remaining dichotomy, responding to changes in the pricing environment, or filling the time-axis gap.

For a reader who knows these three buckets in advance, the silhouette of any new paper’s conclusion is visible before reading.


Series

  • Part 1 (slicing): How to slice an agent — five axes from 2026 research
  • Part 2 (this piece): How to organize agents — Hierarchy, Graph, Swarm, Routing, and skepticism
  • Part 3 (anchor): The Advisor Pattern Is a Price Tag, Not Architecture

References

  • A Taxonomy of Hierarchical Multi-Agent Systems: Design. arxiv 2508.12683 (2025-08)
  • From Agent Loops to Structured Graphs: A Scheduler-Theoretic Framework for LLM Agent Execution. arxiv 2604.11378 (2026-04)
  • LLM-Powered Swarms: A New Frontier or a Conceptual Stretch? arxiv 2506.14496 (2025-06)
  • Online Automatic Code Generation for Robot Swarms. arxiv 2510.04774 (2025-10)
  • xRouter: Training Cost-Aware LLMs Orchestration System via Reinforcement Learning. arxiv 2510.08439 (2025-10)