A couple of days ago I wrote about RSVG: Rules, Scopes, Validators, and Gates. I argued for building validation infrastructure around coding agents. I still believe that. But I've been thinking about what runs behind the harness: the agent architecture itself. And I'm increasingly convinced that it is often overcomplicated.
1. The Multi-Agent Hype
Somewhere in 2024 and 2025 we fell in love with multi-agent systems. The pitch was compelling: if one agent is good, surely multiple agents working together must be better. We started building coordinator-executor patterns, specialist agents, debate frameworks, and elaborate orchestration layers.
Me too. When you've already thought about a problem for a while, you often have a clearly defined solution for it in your head, and most of the time, this solution involves breaking down the big problem into smaller parts. LLMs have given us super versatile problem solvers; just ask them to fix one of these subproblems, and you are one step closer to the solution. What you end up with is an orchestrated, rigidly defined linear chain that, if all steps are executed correctly, will give you the solution you already knew.
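To make that concrete, here is a minimal sketch of the kind of rigid, pre-planned chain I mean. It is not anyone's real system; call_llm is a hypothetical stand-in for whatever model client you use, and the point is the shape, not the code.

```python
# A minimal sketch of the rigid, pre-planned pipeline described above.
# call_llm is a hypothetical stand-in for whatever model client you use.

def call_llm(prompt: str) -> str:
    """Hypothetical helper: send one prompt to a model and return its reply."""
    raise NotImplementedError("wire up your own model client here")

def solve_with_fixed_pipeline(task: str) -> str:
    # Each step encodes a sub-problem I already decided on up front.
    spec = call_llm(f"Write a short spec for this task:\n{task}")
    plan = call_llm(f"Break this spec into implementation steps:\n{spec}")
    code = call_llm(f"Implement these steps:\n{plan}")
    # If an early step goes sideways, every later step builds on the mistake.
    return code
```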
The intuition that "more agents = better" turns out to be surprisingly wrong in many cases.
2. The Research
A recent paper from researchers at UW, MIT, and Google ("Towards a Science of Scaling Agent Systems") puts numbers to the idea. They ran 180 configurations across four benchmarks, testing five architectures: single agent, independent agents, centralized coordination, decentralized coordination, and hybrid approaches.
The findings are quite remarkable:
Capability saturation. Once a single agent can achieve roughly 45% accuracy on a task, adding more agents provides diminishing or even negative returns. The coordination overhead eats into performance.
Sequential reasoning degrades. For tasks requiring step-by-step reasoning, all multi-agent variants performed 39-70% worse than a single agent. The overhead of coordination actively hurts.
Error amplification depends on topology. Independent agents (no coordination) amplify errors 17x through unchecked propagation. Centralized coordination contains this to 4.4x—but that's still significant overhead.
The paper does identify cases where multi-agent coordination helps: parallelizable tasks like financial analysis and dynamic web navigation. But the default assumption that coordination improves performance is empirically false for most sequential work.
3. The Bitter Lesson Applied
Rich Sutton's Bitter Lesson argues that general methods leveraging computation beat hand-coded human knowledge every time. Phil Schmid recently applied this to agent harnesses: every assumption you bake into your infrastructure is a liability when the next model release changes the optimal approach. A super useful eye opener.
The lesson applies doubly to multi-agent architecture. All that coordination logic (the message passing, the role definitions, the orchestration) represents crystallized assumptions about what the model can't do on its own. And those assumptions keep getting invalidated.
4. Enter Ralph
Which brings me to Ralph. If you haven't encountered it yet, Ralph (named after Ralph Wiggum from The Simpsons) is about as simple as agent architecture gets. It's a loop. That's it.
The technique, originally a bash while loop from Geoffrey Huntley, is now an official Anthropic plugin for Claude Code. The core idea: instead of elaborate orchestration, just keep feeding the same prompt back until the agent completes the task.
/ralph-loop "Implement auth. Output <promise>DONE</promise> when tests pass." --max-iterations 20

Claude works on the task. Tries to exit. The stop hook blocks the exit if the completion promise isn't found. Same prompt fed back. Repeat until done.
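Stripped of the plugin machinery, the idea fits in a few lines. This is a toy sketch of the pattern, not the actual plugin; run_agent is a hypothetical stand-in for one full agent session with whatever runner you use.

```python
# A toy reimplementation of the Ralph idea, not the actual Anthropic plugin.
# run_agent is a hypothetical stand-in for one full agent session.

def run_agent(prompt: str) -> str:
    """Hypothetical helper: run one agent session and return its final output."""
    raise NotImplementedError("wire up your agent runner here")

def ralph_loop(prompt: str,
               promise: str = "<promise>DONE</promise>",
               max_iterations: int = 20) -> bool:
    for i in range(max_iterations):
        output = run_agent(prompt)
        # The stop condition is deliberately dumb: only the promise string counts.
        if promise in output:
            print(f"Done after {i + 1} iteration(s).")
            return True
        # No replanning, no coordination: the exact same prompt goes back in.
    return False

# ralph_loop("Implement auth. Output <promise>DONE</promise> when tests pass.")
```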
The philosophy is almost aggressively simple:
- Don't aim for perfect on first try
- Let the loop refine the work
- Failures are informative—use them to tune prompts
- Keep trying until success
This feels wrong at first. Shouldn't we be smarter about this? Shouldn't we have specialized agents, planning phases, verification steps?
Sometimes. But often, no. The research backs this up. For sequential reasoning tasks, which coding arguably often is, a single agent iterating is more effective than multiple agents coordinating.
5. Reconciling with RSVG
Here's where I want to address a potential contradiction in my own thinking. I've written about validation infrastructure, gates, validators, rules. That's complexity too, isn't it?
I don't think so. There's a crucial distinction between architectural complexity and validation complexity.
Architectural complexity means adding agents, coordination layers, message passing, role definitions. This adds overhead to every iteration. The research shows this overhead often outweighs benefits.
Validation complexity means adding checkpoints that provide deterministic feedback. This enhances each iteration. A failed lint check teaches the agent something specific. A type error gives an unambiguous signal. The loop becomes more effective, not less.
Ralph loops and RSVG are complementary. Ralph provides the simple iteration mechanism. RSVG provides the deterministic signals that make each iteration productive. Together they cycle back to what has been written a thousand times now: a single agent, running in a loop, with good validators to guide it.
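As a rough sketch of how the two fit together, assume the same hypothetical run_agent helper from the earlier sketch and a handful of example validators (ruff, mypy, pytest here; substitute whatever gates your project actually uses). Each iteration runs the gates, and any failure output becomes the deterministic feedback for the next pass.

```python
# Rough sketch of a Ralph-style loop guided by deterministic validators.
# The validator commands are only examples; run_agent is the same
# hypothetical session helper as in the earlier sketch.
import subprocess

VALIDATORS = [
    ["ruff", "check", "."],  # lint
    ["mypy", "."],           # types
    ["pytest", "-q"],        # tests
]

def run_validators() -> list[str]:
    """Run each validator and collect the output of the ones that fail."""
    failures = []
    for cmd in VALIDATORS:
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failures.append(f"$ {' '.join(cmd)}\n{result.stdout}{result.stderr}")
    return failures

def validated_loop(task: str, max_iterations: int = 20) -> bool:
    feedback = ""
    for _ in range(max_iterations):
        run_agent(task + feedback)
        failures = run_validators()
        if not failures:
            return True  # every gate passes, so the loop is allowed to stop
        # Each failure is a specific, deterministic signal for the next pass.
        feedback = "\n\nFix these validator failures:\n" + "\n".join(failures)
    return False
```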
6. What I Need to Rephrase
Looking back at my previous writing, I still stand by the core principles. Deterministic validation beats front-loaded instructions. The loop is where learning happens. Context is precious.
But I should put a big asterisk on the architecture. Don't let yourself get tempted too quickly into reaching for sub-agents and coordinator patterns. The research and practical experience suggest a simpler mental model:
Start with one agent. Add validation to make each iteration count. Only add architectural complexity when you have clear evidence that parallelization will help, and that evidence should come from the task structure, not from intuition about what "feels" more sophisticated.
The sub-agent pattern still has its place. Context isolation is real: when you need deep search or heavy reasoning that would pollute the main context, delegation makes sense. But it should be the exception, not the default architecture.
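For completeness, a small sketch of what I mean by delegation, again with the hypothetical run_agent helper from above: the sub-agent burns its own context on the heavy lifting, and only a short summary comes back to the main loop.

```python
# Sketch of delegation for context isolation: the sub-agent gets its own
# prompt, does the heavy search or reasoning in its own context, and only a
# short summary flows back. run_agent is the same hypothetical helper as above.

def delegate_research(question: str, max_summary_chars: int = 2000) -> str:
    sub_prompt = (
        "Research the following question. Reply with only a short summary "
        f"of your findings:\n{question}"
    )
    summary = run_agent(sub_prompt)
    # Only the distilled answer enters the main agent's context,
    # not the whole search trail.
    return summary[:max_summary_chars]
```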
7. Closing Thoughts
There's something almost embarrassing about realizing that a bash while loop outperforms elaborate multi-agent architectures for most tasks. We like to think sophistication implies effectiveness.
But the research is pretty clear. Single agents beat multi-agent systems on sequential reasoning by large margins. Coordination overhead is real, general methods scale, hand-coded structure doesn't.
So for now I go with simple loops, good validation, minimal architecture.
Jan Willem Altink