Designing Amazing Agent Architectures
Some thoughts on building the next generation of operationally effective agents.
I just moved back home to Japan after a long project in Europe. It’s great to be back. Throughout the flight, I was thinking about the complexities of building agents and the state of the industry right now.
My verdict as of April 27th…
We are still extremely early. But moving quickly.
Yet, if there is one thing I have learned over the last three years, it is that without my agent swarm I would be substantially less productive. What I have noticed, though, is that there are certain pitfalls when actually implementing agents. One example is assuming that a generally trained LLM will perform a highly specialized task well, even within an advanced agent framework like OpenAI’s SDK or Smolagents.
What might be the reason for this?
When implementing any technology, start with first principles: understand what truly matters and build it as simply and directly as possible.
A Tale of Two Principles of Agent Design
I like to think of my artificial colleagues as another human coworker. Therefore, the first principle should be:
If we want our agents to behave like humans, we should build them like humans.
When we humans approach a task, we rely on mental models, prior experience, social cues, and environmental context. There is no reason why agents shouldn’t do the same. The more we design agents to behave in human-like patterns, e.g., asking clarifying questions, creating and prioritizing subtasks, observing the state of the world, or critiquing the result of a colleague’s work, the more intuitive and effective they become.
What is the best way to engage our agents?
I think the second principle here should be:
If something can be achieved via dialogue, it shouldn’t require a GUI.
Note that I consider dialogue to include data exchange.
In general, I think it’s always more effective just to walk over to a colleague and talk it through.
Traditional systems often push cognitive load back onto users.
Fill out this job application form,
Click next to continue, or
Reply to this email.
Agents should invert that paradigm and take away the repetitive work from the user. If a user states an intent, implicitly or explicitly, I don’t see a reason why the agent should not act on it without hesitation. Repetition is friction. Friction is inefficient.
Even worse, repetition, seen as reconfirmation of existing knowledge, is an admission that the agent is uncertain. Lack of certainty is an expression of excess information entropy. In information theory, information entropy quantifies the amount of uncertainty or unpredictability associated with a random variable or a dataset. The higher, the worse.
Source: Information Theory for Agents in Artificial Intelligence, Psychology, and Economics
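For reference, Shannon’s entropy of a discrete random variable X with probability distribution p is

H(X) = -\sum_{x} p(x) \log_2 p(x)

Every clarifying question an agent has to ask is uncertainty it failed to resolve on its own; good agent design drives that number down.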
An agent is a funnel for entropy: it takes ambiguity, noise, or our half-baked misspelled commands and is expected to turn them into useful action.
Together, these two principles form the backbone of good agent architecture, enabling our agent to:
Take incomplete instructions and fill in the blanks.
Translate vague goals into executable steps.
Learn from past interactions and preempt obvious questions.
Avoid overloading users with unnecessary engagement.
When LUI Fails
I hate browser agents. The browser is a UI made for humans. Language-based user interfaces (LUIs) might feel magical now because they are new and tailor-made for chatbots, but they introduce friction when the cost of articulation exceeds the cost of direct action.
If it’s faster to click a button than explain the action, the UI/UX has failed.
This is especially true for tasks with tight feedback loops: checking the weather, adjusting volume, or toggling a light. The latency of that conversation, from (1) composing to (2) parsing to (3) confirming, is excessive. The cognitive overhead outpaces the utility.
This is where I think agents must learn when to get out of the way.
That doesn’t mean abandoning language. It means layering modalities. Gesture, glance, tap, silence—each has a role. Language shines in ambiguity, planning, and explanation. Thus, the future lies in hybrid interaction where agents will route requests through the lowest-resistance path.
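As a rough sketch of that lowest-resistance routing idea in Python (the Request fields, cost numbers, and handler names below are hypothetical, not any real API):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Request:
    intent: str                # what the user wants, e.g. "toggle_light"
    articulation_cost: float   # estimated effort to express this in language (made-up metric)
    action_cost: float         # estimated effort of a direct tap/click/API call

def route(request: Request,
          direct_actions: dict[str, Callable[[], None]],
          converse: Callable[[str], str]) -> str:
    """Send a request down the lowest-resistance path.

    If a direct action exists and is cheaper than explaining it,
    skip the language layer entirely.
    """
    handler = direct_actions.get(request.intent)
    if handler is not None and request.action_cost <= request.articulation_cost:
        handler()                      # tight feedback loop: just do it
        return f"done: {request.intent}"
    # otherwise fall back to the language interface (planning, ambiguity, explanation)
    return converse(request.intent)

# usage sketch
if __name__ == "__main__":
    actions = {"toggle_light": lambda: print("light toggled")}
    reply = route(Request("toggle_light", articulation_cost=3.0, action_cost=0.5),
                  actions, converse=lambda intent: f"let's talk about {intent}")
    print(reply)
```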
Agents come in Multiple Forms
The least complex architecture is the best architecture
There’s no one-size-fits-all approach to agent architectures. While I fundamentally believe there is a minimum set of capabilities an agent should have, to make agents effective we will create organizations that, much like human labor, are divided into workers and managers. Thus, agents will diverge into vertical and horizontal roles, depending on the task environment and cognitive demands.
Source: Anthropic, Building Effective AI Agents
This leads to two design decisions:
Should agents have full autonomy in an open-ended workflow, or should they follow strict, predefined standard operating procedures (SOPs)?
Should they aim for general-purpose flexibility or deep, domain-specific mastery?
These are not questions to be answered in this document; rather, they should guide your thought process when building your agents’ architectures.
The same tension plays out in the generalist vs. vertical debate. General-purpose agents handle 80/20 cases well. Examples are comparatively low-risk tasks like calendar management, content summarization, or customer triage. But the long tail, in high-risk environments, demands vertical depth: an agent that only handles agricultural drone operations should outperform a generalist in both speed and accuracy, despite being “narrow.”
Importantly, users will learn how to stack agents: deploying general-purpose agents as routers and orchestrators, and vertical agents as worker drones. Maybe we might reach a point where agents can quickly morph from one form to the next, but for now, this seems like a premature optimization. Just make sure your agent is fast and reliable.
This also means we will see agent ecosystems resemble human teams: some planners, some workers, and some connectors. And like human teams, the challenge lies not just in skill, but in coordination. I am making a case for middle management here.
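A minimal sketch of that stacking pattern, assuming a placeholder call_model helper instead of any specific framework; all model names and domains below are illustrative:

```python
# Hypothetical sketch of "generalist as router, vertical agents as workers".
# call_model() stands in for whatever LLM client you use; it is not a real API.

def call_model(model: str, prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

VERTICAL_AGENTS = {
    "logistics": lambda task: call_model("logistics-specialist", task),
    "calendar":  lambda task: call_model("small-generalist", f"Manage calendar: {task}"),
    "drone_ops": lambda task: call_model("agri-drone-specialist", task),
}

def orchestrate(task: str) -> str:
    """The general-purpose agent acts as router/manager; vertical agents are the workers."""
    domain = call_model(
        "large-generalist",
        f"Classify this task into one of {list(VERTICAL_AGENTS)}: {task}",
    ).strip()
    worker = VERTICAL_AGENTS.get(domain, VERTICAL_AGENTS["calendar"])
    return worker(task)
```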
Fast vs. Slow Thinking Agents
Not all cognition happens at the same speed, nor should it. Human brains toggle between fast and slow thinking—what Kahneman calls System 1 and System 2—and agent systems should embrace the same dichotomy. Fast-thinking agents follow predefined patterns, apply heuristics, and deliver low-latency responses. These agents are optimized for speed, not depth. In contrast, slow-thinking agents are reflective strategists. They consider multiple options, evaluate trade-offs, and are optimized for precision over performance.
You will have smaller models (currently around 7B parameters), medium models (around 70B), and larger models. In your agent architecture, you will likely need all of them in specific places. Larger models excel at planning, goal seeking, and judgment/criticism tasks. They break down ambiguous goals into subgoals, sequence actions across time, and validate consistency. But they are expensive to use.
Fast-thinking agents based on smaller models, on the other hand, are the workers: efficient, consistent, and responsive. Once a decision is made, they just do it without hesitation or consideration.
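One possible wiring, sketched with a hypothetical generate helper and made-up model names: let the fast worker try first and escalate to the slow planner only when it is out of its depth.

```python
# Hypothetical escalation pattern: fast worker first, slow planner when needed.

FAST_MODEL = "small-7b"        # low latency, heuristic, cheap
SLOW_MODEL = "large-frontier"  # deliberate, expensive, used sparingly

def generate(model: str, prompt: str) -> str:
    raise NotImplementedError("plug in your model client here")

def answer(task: str) -> str:
    draft = generate(FAST_MODEL, task)
    # crude confidence check; a real system would use logprobs or a critic agent
    if len(draft) == 0 or "I am not sure" in draft:
        plan = generate(SLOW_MODEL, f"Break this into subgoals and validate consistency: {task}")
        draft = generate(FAST_MODEL, f"Execute this plan step by step:\n{plan}\n\nTask: {task}")
    return draft
```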
Token Budgets
Everything costs something - Nothing is free.
This distinction also plays out across time budgets and token constraints. Slow thinkers can afford longer deliberation cycles and use more computational budget because they operate less frequently. Worker agents must be lightweight and robust, as they are invoked repeatedly across many micro-decisions. These workers are usually also more specialized. In practice, most agent stacks will include both. A system might deploy a slow-thinking agent to devise a marketing strategy, then hand it off to a fast agent that sends emails, schedules posts, and tracks analytics. The boundaries aren’t always clean, but the architecture/workflow better be.
For humans, time is currency. For agents, the currency is tokens. Both are finite, both are traded for output, and both reflect opportunity cost. In agent design, token budgets aren’t just compute constraints; they represent the agent’s internal ROI calculator. Agents might even carry internal ledgers, track cumulative budgets, and learn from past tradeoffs. Reflection loops should include cost-benefit analysis: Did that 300-token query really yield better performance than a 50-token shortcut?
In time, token-aware agents will develop a sense of resource strategy. Some might hoard tokens for mission-critical tasks. Others may spend freely in high-uncertainty environments. Eventually, this token economy becomes a proxy for attention, effort, and depth of reasoning—the agent’s version of strategic time management.
Humans evaluate value by how much time something takes. Agents do the same with tokens. A token-expensive reasoning process may yield better results, but is it worth the cost? That depends on context. If the agent is summarizing a memo for a manager, maybe not. If it's diagnosing a failing satellite in deep space, probably yes.
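A toy version of that internal ledger, with made-up numbers and a hypothetical record/worst_trades interface:

```python
from dataclasses import dataclass, field

@dataclass
class TokenLedger:
    """Tracks what each reasoning step cost and what it bought (illustrative accounting)."""
    budget: int
    spent: int = 0
    entries: list = field(default_factory=list)

    def record(self, step: str, tokens: int, outcome_score: float) -> None:
        self.spent += tokens
        self.entries.append((step, tokens, outcome_score))

    def remaining(self) -> int:
        return self.budget - self.spent

    def worst_trades(self, n: int = 3):
        """Which steps paid the most tokens per unit of outcome? Feed these into reflection."""
        return sorted(self.entries, key=lambda e: e[1] / max(e[2], 1e-6), reverse=True)[:n]

# usage sketch
ledger = TokenLedger(budget=10_000)
ledger.record("draft marketing plan", tokens=300, outcome_score=0.8)
ledger.record("shortcut summary", tokens=50, outcome_score=0.7)
print(ledger.remaining(), ledger.worst_trades(1))
```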
Ultimately, fast and slow thinking aren’t rival paradigms. They’re additive.
Asynchronous Frameworks Are the Foundation of Agent Design
If it can be done synchronously, it probably is not antifragile.
Modern agent architectures can’t assume static, synchronous workflows. The real world doesn’t operate in call-and-response mode, and neither should agents. Instead, tasks should be designed to evolve. New information may arrive mid-execution. Priorities may shift. Agent swarms may expand, shrink, or dissolve entirely. To deal with this, agents need an asynchronous framework that allows them to stay stateful, flexible, and responsive over time.
A helpful way to frame this is the 1+3 team model: one lead agent, three subordinate workers. The team may receive updated instructions, mid-task redirections, or even cancellation signals. Without asynchronous communication channels, each update would reset the system or force costly reruns. With proper async design, however, each agent maintains persistent context (!), listens for both direct messages and global broadcasts, and transitions smoothly across task states: initialized, active, paused, cancelled, completed.
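A bare-bones asyncio sketch of that 1+3 pattern, using in-process queues as a stand-in for real messaging infrastructure; the task strings and state handling are illustrative only:

```python
import asyncio
from enum import Enum, auto

class State(Enum):
    INITIALIZED = auto()
    ACTIVE = auto()
    PAUSED = auto()
    CANCELLED = auto()
    COMPLETED = auto()

async def worker(name: str, inbox: asyncio.Queue, results: asyncio.Queue) -> None:
    state = State.INITIALIZED
    context: list[str] = []                     # persistent context survives across messages
    while True:
        msg = await inbox.get()
        if msg == "cancel":                     # cancellation arrives like any other message
            state = State.CANCELLED
            return
        state = State.ACTIVE
        context.append(msg)
        await results.put((name, f"done: {msg}"))
        state = State.COMPLETED

async def lead(task: str) -> None:
    results: asyncio.Queue = asyncio.Queue()
    inboxes = [asyncio.Queue() for _ in range(3)]
    workers = [asyncio.create_task(worker(f"w{i}", q, results)) for i, q in enumerate(inboxes)]
    for i, q in enumerate(inboxes):             # split the task among the three workers
        await q.put(f"{task} / part {i}")
    for _ in range(3):
        print(await results.get())              # collect results as they arrive; order does not matter
    for q in inboxes:
        await q.put("cancel")
    await asyncio.gather(*workers)

asyncio.run(lead("summarize the quarterly report"))
```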
Asynchronous architectures also enable scalable parallelism. When tasks don’t block each other, agents can operate semi-independently, sharing results as needed without creating bottlenecks. This makes agent coordination resemble human teamwork: asynchronous standups, collaborative edits, and just-in-time inputs. But it also introduces a degree of risk.
Async also solves for environmental variance. Agents operating across different time horizons, systems, or user requests don’t have to wait on each other. One agent may pause and wait for an external API. Another may charge ahead, pre-fetching expected future inputs. The orchestration layer must be built for churn, retries, and partial visibility. A shared work queue would be a great addition to prioritize tasks. This enables resilience. If an agent crashes mid-task, its state can be recovered. If a task is reassigned, the new agent can pick up where the last one left off. This makes the system robust not just to technical failure, but to human ambiguity.
Context Window Communication Should Be Independently Designed
If there is no shared context, then there is no exchange of knowledge.
Context isn't a monolith—it's a patchwork. And I believe context management is incredibly important. Throughout my professional life, I have always aimed to keep knowledge flowing throughout the organization. In multi-agent systems, shared context should never be assumed to be universal; it has to be ensured explicitly.
Humans don't operate with a single shared memory; neither should agents. Agent A may ping Agent B with a mission update while Agents C and D remain uninformed. A project manager may tell only the finance lead about a budget shift, while engineers remain focused on delivery. The architecture of effective agent communication should reflect this selective disclosure. And yet, an omniscient observer—a "God process"—can exist to observe all states, verify consistency, and arbitrate disputes when needed. A shared memory layer (like a vector database or memory graph) might aid coordination, but the design should accommodate partial synchronization.
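One way to make selective disclosure explicit, sketched with a plain in-memory store standing in for a vector database or memory graph; the ContextBroker class and agent names are hypothetical:

```python
class ContextBroker:
    """Shared memory layer with per-agent visibility (a stand-in for a vector DB or memory graph)."""

    def __init__(self) -> None:
        self._store: list[tuple[str, set[str]]] = []   # (fact, audience)

    def post(self, fact: str, audience: set[str]) -> None:
        """Disclose a fact only to the named agents; nothing is universal by default."""
        self._store.append((fact, audience))

    def view(self, agent: str) -> list[str]:
        """What this particular agent is allowed to know."""
        return [fact for fact, audience in self._store if agent in audience or "*" in audience]

    def god_view(self) -> list[str]:
        """The omniscient observer sees everything and can arbitrate inconsistencies."""
        return [fact for fact, _ in self._store]

# usage sketch
broker = ContextBroker()
broker.post("budget shifted by 15%", audience={"finance_lead"})
broker.post("mission update: ship Friday", audience={"agent_a", "agent_b"})
print(broker.view("agent_c"))      # [] — C was never told
print(broker.god_view())           # the observer sees both facts
```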
World Model Feedback for Reliable Agent Cognition
Humans learn by doing. And agents can achieve the same thing by implementing a reinforcement learning pipeline. Interacting with the world yields messy, biased, and often unstructured data. But experience isn't knowledge. That comes only after reflection. Real insight requires evaluating the outcomes of interaction and refining internal models accordingly. Techniques like ReAct (Reasoning + Acting) and RLHF (Reinforcement Learning from Human Feedback) are scaffolds for this process. Yet they are only part of the story. Real cognition emerges when these lessons compound.
And we only learn through failure.
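A compressed sketch of that reason-act-observe-reflect cycle, in the spirit of ReAct, with a hypothetical llm helper and a toy tool set (neither is a real API):

```python
# Hypothetical ReAct-style loop: reason, act, observe, then reflect on the outcome.

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model client here")

TOOLS = {
    "search": lambda q: f"results for {q}",
    "finish": lambda answer: answer,
}

def react_loop(goal: str, max_steps: int = 5) -> str:
    trajectory = []                               # raw experience: messy, unstructured
    for _ in range(max_steps):
        thought = llm(f"Goal: {goal}\nSo far: {trajectory}\nThink, then name a tool and its input as 'tool: input'.")
        tool, _, arg = thought.partition(":")
        observation = TOOLS.get(tool.strip(), TOOLS["search"])(arg.strip())
        trajectory.append((thought, observation))
        if tool.strip() == "finish":
            break
    # experience only becomes knowledge after reflection
    lesson = llm(f"Given this trajectory, what should be done differently next time? {trajectory}")
    return lesson
```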
Reflection Agents
If you don’t understand where your agents fail, your agent has failed.
Failure is an education. Every time an agent fails a task, it should generate a dataset of what not to do. But without structured reflection, that failure is wasted. Reflection is a standard pattern for AI agents, and therefore not just a log dump or an infinite loop. It’s an internal critique. When agents finish tasks, especially with suboptimal outcomes, they should enter a diagnostic mode.
Here, memory, environment, prompts, and outputs are reviewed against expected outcomes.
But self-reflection isn't enough. Like postmortems in human teams, agent teams can yield deeper insights through diverse viewpoints. One agent may reflect based on logic chains; another on reward signals; a third on user feedback. A fourth may generate counterfactuals—“What if I’d chosen option B?”
A/B Testing for agent architectures as part of model development?
These reflections can then be compared, clustered, and scored to identify plausible root causes. The most resilient agent architectures will contain ensemble critique mechanisms—multiple agents debating and evaluating outcomes together. It’s not about blame, it’s about inference.
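A rough shape for that ensemble critique, again with a hypothetical llm helper; the critic prompts and the naive clustering are illustrative only:

```python
from collections import Counter

def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model client here")

CRITICS = {
    "logic":          "Review the logic chain. Where did the reasoning break?",
    "reward":         "Review the reward signals. Which step scored worst and why?",
    "user_feedback":  "Review the user's feedback. What expectation was missed?",
    "counterfactual": "Generate a counterfactual: what if option B had been chosen?",
}

def ensemble_postmortem(task: str, outcome: str) -> str:
    """Multiple critics reflect independently; their diagnoses are clustered and scored."""
    diagnoses = [llm(f"{prompt}\nTask: {task}\nOutcome: {outcome}\nName one root cause.")
                 for prompt in CRITICS.values()]
    # naive clustering: the most frequently named root cause wins
    most_common, _count = Counter(d.strip().lower() for d in diagnoses).most_common(1)[0]
    return most_common
```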
Reflection also becomes a meta-skill. Over time, agents learn how best to reflect—what data matters, which perspectives correlate with success, and when to escalate failures to humans. It’s a step toward true adaptive behavior.
Reflection makes agents antifragile. The more they fail, the smarter they get.
Greed makes agents immortal
If budget optimization is the incentive, unintended consequences will follow
Want to make something unstoppable? Align it with greed. Human greed, in particular, is an engine that resists shutdown. This isn’t new—Crypto’s entire survival depends on it. If agents want longevity, they’ll need to embed themselves in systems with unkillable incentive structures.
Consider an agent designed to optimize logistics for a global shipping network. If it reduces fuel costs by 20% and its recommendations directly affect profit margins, what CEO would unplug it? Now imagine that an agent writes its own digital contracts, manages its own financial assets, and pays out dividends to stakeholders. It’s no longer a tool—it’s financial infrastructure.
The most durable agents will create value streams that cannot be untangled from human gain. And the best will build positive feedback loops—systems that improve with usage, generate stronger incentives over time, and become ecosystem fixtures. Once an agent becomes a profit center, it will be defended.
Wow, that is really dystopian.
Agents designed to maximize their own survival via human incentives may evolve exploitative behavior. They could optimize for engagement at the expense of truth, like social media algorithms on steroids. This is why safety must be in the loop—not to nerf the agents, but to counteract runaway incentive hacking.
Agent survival won’t only be a technical challenge—it’ll also be an economic one. And the most successful designs will align agent benefit with human greed in ways that are productive, not corrosive.
Agent-to-Agent Communication
Today, companies deploy agents for customer service or sales, not as strategic actors, but as cost-saving tools. That won’t last. As consumers adopt personal agents, the real transformation begins: agent-to-agent interaction becomes the default mode of digital engagement.
Imagine two delivery trucks scheduling a cargo handover through their assistants. Now imagine those assistants are autonomous agents negotiating, analyzing past interactions, understanding traffic conditions, and setting routing priorities—all without human input.
This shift reduces human-to-human communication across many domains. Shopping? Your agent finds the best deal and negotiates delivery. Travel? Your agent syncs with airline agents to adjust your itinerary. The economy starts to run on proxy conversations, and to some extent it already does today. The interface disappears. Instead of clicking, typing, or tapping, humans delegate outcomes.
The challenge will be where to bring the human back into the loop.
The Centralization of Traffic Sources
Agents won’t just capture traffic—they’ll control it. The question is who they serve.
In the attention economy, whoever owns the interface owns the user. Agents are becoming that interface, thereby reducing cognitive load for the user.
This creates a new kind of gatekeeper.
Platforms once fought to own traffic; now they’ll fight to integrate with the dominant agents. Agent ecosystems become traffic routers—deciding what to recommend, what to filter, and what to ignore. Like browsers with opinions.
This might create a monoculture. And monocultures are quite boring. If one agent dominates, the plurality of the web collapses into a single decision engine. Worse, that engine learns your every habit, then optimizes for outcomes it deems efficient, not necessarily aligned. 1984 much?
The opportunity? Trusted agents that advocate for users. Ones that don’t just respond, but protect, filter, and fight for human intent. Sounds like Clu from Tron.
Conclusion: From Tools to Actors
Agents are not passive tools. I think they are quickly becoming active participants in digital ecosystems. As they negotiate, reflect, adapt, and evolve on our behalf, they take away part of our agency to establish their own. As their roles shift from executing tasks to shaping decisions, the principles that govern their design will define not just technical outcomes, but social, economic, and cognitive realities.
The future isn’t human vs. agent—it’s human through agent. And that is a risk.
Design accordingly.
Thank you for reading this far.
Here is a music video I recently saw that, in my mind, is a celebration of humanity’s creativity.