Most agent platforms hand you one super-agent and ask you to trust it. We don't.
The design I tried first was the popular one — a single agent with a long system prompt, a big set of tools, and a willingness to do anything the user asked. It worked great in demos. It failed within a week in production with a real SMB operator.
I'd already watched enough "agent gone rogue" stories from other platforms to know what failure looked like: a client notice fired without authorization, an invoice sent to the wrong party, an unauthorized trade, a "we're sorry" tweet posted on behalf of a brand at 3am. The super-agent design is the design that produces those headlines. Not because the model is bad — because the surface area is too large for any single system prompt to keep an agent in its lane.
Here's what I saw in my own SMB-operator run, and what we changed.
The single-agent failure mode
A property manager turns on the agent and asks: "Can you handle the inbox today?"
The single agent reads the inbox. It sees a maintenance request, a rent dispute, a fair-housing question, and a thank-you note. It tries to handle all four. It writes a maintenance dispatch that's missing the unit number, a rent reply that quotes a number it pulled from memory rather than the ledger, a fair-housing answer that's slightly off the law for that state, and a thank-you reply that's competent but that no operator would want signed with the agent's name.
Different vertical, same shape: a therapist's-office manager runs the same experiment and the agent confidently produces an insurance verification using last year's deductible, a recall call to a patient who explicitly opted out, and a 1099 reply for a contractor that's mostly right but slightly off the law for that state. Or a publishing house's contracts inbox, where the agent answers a rights question with vibes instead of reading the actual term sheet on file. The lanes are different across operators. The failure shape is identical: the agent treated breadth as a license to act, and the operator now has four problems instead of zero.
The agent's batting average doesn't matter; the operator's trust does, and trust is a step function. One fair-housing slip, one HIPAA-adjacent miss, one bad answer on a contract that's about to go to court — and the agent's permission to act gets revoked, permanently.
What coordinator-first means
In a GroundPound team, one agent is the coordinator. The coordinator's job is not to do the work. The coordinator's job is to:
- Triage what came in
- Route each item to the right specialist
- Track what's outstanding
- Surface anything the operator should see this morning
The specialists do the actual work. The specific roles depend on the industry template the operator picks — for a property-management team it's leasing, maintenance, rent collection, tenant relations, compliance; for a therapist's office it's intake, insurance verification, scheduling, recall outreach; for a publishing house it's rights, royalties, author relations, production. Different verticals, same shape: each specialist has a tight scope and a short tool list, each handles its own queue, and none of them touch each other's queue.
When an item arrives, the coordinator classifies it and hands it off. For a property team that's MAINTENANCE / RENT / LEASE / LEGAL / TENANT; for a therapist's office it's INTAKE / INSURANCE / SCHEDULE / RECALL / BILLING. The specialist reads the context, drafts a response, files an approval if the response is customer-facing, and posts back to the coordinator when done. The coordinator updates the operator dashboard.
That's it. No specialist freelancing into another lane. No coordinator drafting customer replies. Each agent does what its role says it does and asks for permission for anything else.
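The triage-and-route loop above can be sketched in a few lines. This is a minimal illustration, not GroundPound's implementation: the `Coordinator` class, the `classify` function, and the keyword table are all hypothetical, and the keyword match stands in for whatever model call the real coordinator makes. Only the lane names come from the property-team example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Lane(Enum):
    MAINTENANCE = "MAINTENANCE"
    RENT = "RENT"
    LEASE = "LEASE"
    LEGAL = "LEGAL"
    TENANT = "TENANT"


# Hypothetical keyword rules standing in for the coordinator's model call.
# Substring matching is crude ("current" contains "rent"); it is only here
# to keep the sketch runnable. LEGAL is listed first so it wins ties.
KEYWORDS = {
    Lane.LEGAL: ("fair housing", "eviction", "attorney"),
    Lane.MAINTENANCE: ("leak", "broken", "repair"),
    Lane.RENT: ("late fee", "rent", "payment"),
    Lane.LEASE: ("lease", "renewal", "tour"),
}


def classify(text: str) -> Lane:
    """Return the first lane whose keywords match, in table order."""
    lowered = text.lower()
    for lane, words in KEYWORDS.items():
        if any(word in lowered for word in words):
            return lane
    return Lane.TENANT  # nothing matched: general tenant relations


@dataclass
class Coordinator:
    # One queue per lane; a specialist only ever reads its own.
    queues: dict = field(default_factory=lambda: {lane: [] for lane in Lane})

    def route(self, item: str) -> Lane:
        """Triage one item and hand it to a queue. The coordinator drafts nothing."""
        lane = classify(item)
        self.queues[lane].append(item)
        return lane

    def outstanding(self) -> dict:
        """Open item count per non-empty lane, for the operator dashboard."""
        return {lane.value: len(items) for lane, items in self.queues.items() if items}
```

Calling `Coordinator().route(...)` on each inbound message and then reading `outstanding()` mirrors the split described above: the coordinator only classifies and tracks, and all drafting happens downstream in a specialist's queue.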
Why this works better than the super-agent
Less surface area means fewer failure modes. A leasing agent can't accidentally send a fair-housing-bad message because it doesn't have the customer-facing-send tool. A scheduling agent can't accidentally send a patient communication because that capability belongs to recall, which is the role we trained on the practice's HIPAA-adjacent guardrails. The specialization is a security boundary, not just an org-chart conceit.
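One way to make "the specialization is a security boundary" concrete is a per-role tool allowlist that is checked before any tool runs. The sketch below assumes hypothetical tool names (`read_calendar`, `propose_slot`, `read_opt_outs`, `send_patient_message`) and a hypothetical `call_tool` dispatcher; only the rule that recall, not scheduling, owns patient communication comes from the text.

```python
class ToolNotAllowed(PermissionError):
    """Raised when a role asks for a tool outside its allowlist."""


# Hypothetical allowlists. Only recall holds the patient-facing send tool.
ROLE_TOOLS = {
    "scheduling": frozenset({"read_calendar", "propose_slot"}),
    "recall": frozenset({"read_opt_outs", "send_patient_message"}),
}


def call_tool(role: str, tool: str, registry: dict, **kwargs):
    """Dispatch a tool call only if the role's allowlist names the tool."""
    if tool not in ROLE_TOOLS.get(role, frozenset()):
        # Deny by default: unknown roles and unlisted tools both land here.
        raise ToolNotAllowed(f"{role} may not call {tool}")
    return registry[tool](**kwargs)


def send_patient_message(patient_id: str, body: str) -> str:
    # Stand-in for a real send; returns a receipt string instead of sending.
    return f"queued for {patient_id}: {body}"


REGISTRY = {"send_patient_message": send_patient_message}
```

With this shape, a scheduling agent that tries `call_tool("scheduling", "send_patient_message", REGISTRY, ...)` gets a `ToolNotAllowed` exception no matter what its prompt says, because the check lives in the dispatcher rather than in the model's instructions.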
The coordinator is a chokepoint that an operator can monitor. Every handoff is logged. Every classification is reviewable. When something goes wrong, the trail is "coordinator received X, classified as Y, routed to Z" — three things to look at, not a 50-step LLM trace.
Specialists are cheaper. A coordinator running a precision model can route 200 items in the time it takes a super-agent to fumble through 20. The specialists run on whatever's appropriate for their work: sometimes a fast, cheap model, sometimes a careful, expensive one. We let the routing decide. (More on that in the next post.)
Operators learn the team. "Cap routes; Scout handles leases; Otis dispatches maintenance" (or "Eva triages, Maya verifies insurance, Lou handles recalls") is a mental model an operator can hold after 10 minutes. "There's an AI" is not a mental model. It's a hope.
The hard part: making handoffs visible
The thing that breaks naively designed multi-agent systems is invisible handoffs. Agent A "thinks" it told Agent B something. Agent B "thinks" it heard something different. They both proceed. The operator sees the wreckage.
In GroundPound, every handoff is an explicit row in the database with a payload, a classification, and a recipient. The receiving agent reads the handoff explicitly. If the payload is malformed or missing required context, the handoff fails closed — the work stays in the coordinator's queue until the operator decides what to do.
We learned this the hard way: an earlier version let agents call each other via function-call passthrough, and it broke in ways that took us a week to debug. The explicit-handoff design is slower in the happy path. It's dramatically safer in the unhappy path. We chose safety.
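The explicit, fail-closed handoff described above might look roughly like this. It is a sketch under stated assumptions: the `handoffs` table schema, the `REQUIRED_CONTEXT` field lists, and the `delivered` / `held` status values are all hypothetical, and an in-memory SQLite database stands in for whatever store GroundPound actually uses.

```python
import json
import sqlite3

# Hypothetical required-context fields per classification.
REQUIRED_CONTEXT = {
    "MAINTENANCE": {"unit", "description"},
    "RENT": {"tenant_id", "ledger_ref"},
}


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE handoffs (
               id INTEGER PRIMARY KEY,
               classification TEXT NOT NULL,
               recipient TEXT NOT NULL,
               payload TEXT NOT NULL,
               status TEXT NOT NULL
           )"""
    )
    return conn


def hand_off(conn, classification: str, recipient: str, payload) -> str:
    """Write one handoff row and return its status: 'delivered' or 'held'."""
    required = REQUIRED_CONTEXT.get(classification)
    ok = (
        required is not None               # unknown classification fails closed
        and isinstance(payload, dict)      # malformed payload fails closed
        and required.issubset(payload.keys())  # missing context fails closed
    )
    status = "delivered" if ok else "held"
    conn.execute(
        "INSERT INTO handoffs (classification, recipient, payload, status) "
        "VALUES (?, ?, ?, ?)",
        (classification, recipient, json.dumps(payload, default=str), status),
    )
    conn.commit()
    return status


def inbox(conn, recipient: str) -> list:
    """What a specialist is allowed to read: delivered rows addressed to it."""
    rows = conn.execute(
        "SELECT payload FROM handoffs "
        "WHERE recipient = ? AND status = 'delivered' ORDER BY id",
        (recipient,),
    ).fetchall()
    return [json.loads(row[0]) for row in rows]


def held_for_operator(conn) -> list:
    """Rows that failed validation and wait for an operator decision."""
    return conn.execute(
        "SELECT id, classification, recipient FROM handoffs "
        "WHERE status = 'held' ORDER BY id"
    ).fetchall()
```

The point of the design is that a held row is still a row: the failed handoff is visible and reviewable, and the specialist never sees it because `inbox` only returns delivered rows.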
Where this falls apart (and what we're doing about it)
Coordinator-first is not free.
The coordinator becomes a single point of failure for routing quality. If the coordinator's classification is wrong 5% of the time, then 5% of items land with the wrong specialist. We've shipped two responses to this: (a) the coordinator's classification is auditable and operator-editable; (b) misrouted items are recorded as a misclassification signal that improves the model's policy over time.
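To show what an operator-editable classification can yield as a signal, here is a small sketch that treats every operator correction as a labeled misroute. The `RoutedItem` record and both helper functions are hypothetical, and the sketch only collects the signal; how it feeds back into the model's policy is not shown here.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class RoutedItem:
    item_id: str
    predicted: str                   # lane the coordinator chose
    corrected: Optional[str] = None  # lane the operator changed it to, if any


def is_misroute(item: RoutedItem) -> bool:
    """An item counts as misrouted only if the operator picked a different lane."""
    return item.corrected is not None and item.corrected != item.predicted


def misroute_rate(items: list) -> float:
    """Fraction of routed items the operator had to re-lane."""
    if not items:
        return 0.0
    return sum(is_misroute(i) for i in items) / len(items)


def confusion_pairs(items: list) -> Counter:
    """Count (predicted, corrected) pairs to show which lanes get confused."""
    return Counter((i.predicted, i.corrected) for i in items if is_misroute(i))
```

The pair counts are the more useful half: a rate says the coordinator is wrong sometimes, while a pile of `("RENT", "LEGAL")` corrections says where.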
The other failure: specialists can't always know what other specialists are doing. We're shipping a shared context store (the team-scoped reflection log) so a leasing agent can read "the maintenance agent dispatched a vendor at this unit yesterday" before sending a tour confirmation, or so a recall agent for a vet clinic can see "the front-desk agent already left a message for this client this morning" before placing another call.
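A team-scoped log of this kind could be as simple as an append-only list that each specialist queries before acting. The sketch below is an assumption about the shape, not the shipped design: `Reflection`, `ReflectionLog`, `should_place_call`, and the one-day lookback window are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Reflection:
    agent: str     # which specialist wrote the entry
    subject: str   # what it is about, e.g. a unit or client id
    note: str
    at: datetime


class ReflectionLog:
    """Append-only, team-scoped log that any specialist can read."""

    def __init__(self) -> None:
        self._entries = []

    def record(self, agent: str, subject: str, note: str, at: datetime) -> None:
        self._entries.append(Reflection(agent, subject, note, at))

    def recent(self, subject, now, within=timedelta(days=1), exclude_agent=None) -> list:
        """Entries about `subject` in the lookback window, skipping one agent's own."""
        cutoff = now - within
        return [
            e for e in self._entries
            if e.subject == subject
            and cutoff <= e.at <= now
            and e.agent != exclude_agent
        ]


def should_place_call(log: ReflectionLog, client_id: str, now: datetime) -> bool:
    """Recall agent's pre-check: skip the call if a teammate touched this client recently."""
    return not log.recent(client_id, now, exclude_agent="recall")
```

In the vet-clinic example, the front-desk agent would `record` its voicemail in the morning, and the recall agent's `should_place_call` check that afternoon would come back false for that client while staying true for everyone else.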
What we believe, restated
A team of specialists with a coordinator is how good ops works at human scale. It's also how good ops works at agent scale, for the same reasons: lanes, accountability, and a single throat to choke when something goes wrong. We didn't invent this design. We just refused to skip it because the demo looks better with a super-agent.
If you want to try it, start free. Tell the wizard what you do in a sentence — "I run a 3-therapist practice", "I dispatch four lawn-care crews", "I edit a small literary quarterly" — and 90 seconds later you'll have a coordinator and 4–6 specialists wired for your work. Watch how the coordinator routes the first inbound message. That's the design choice this post is about.