From Spec to System: A2A and the Agentic Mesh
A Composable, Distributed, Vendor-Neutral Architecture: An Internet for Agents
Much has been written about agent architectures, but little has been implemented yet. This post builds on my previous post on the basics of A2A. Proposed in a recent McKinsey study, the agentic AI mesh is a thought-experiment architecture, presented to the boardrooms of the world, for integrating multiple agents into a single architecture. In short, it’s an Internet for Agents. In this architecture, agents can reason, collaborate, and act autonomously across a distributed mesh of systems and tools. This “mesh” is “composable, distributed, and vendor-agnostic,” enabling agents to work securely, at scale, and to evolve with the technology.
First Principles Design
At its core, it all sounds really exciting, as the mesh consists of five mutually reinforcing design principles:
Composability: Any agent, tool, or model (e.g., a new LLM or a specialized VLM) can be plugged into the mesh without rebuilding other components. For example, in robotics, the mesh could allow a new obstacle-detection algorithm to be integrated into a fleet of drones without rewriting the navigation stack. This ensures each part of the system is modular and reusable.
Distributed Intelligence: The mesh supports networked/distributed reasoning. As initially observed in my Autogen post, approaches like these increase operational complexity as complex tasks are decomposed and handled by specialized agents, rather than a single LLM. In a swarm robotics scenario, for instance, different robots (or “robot-agents”) can specialize (mapping, planning, manipulation) and share results, effectively distributing intelligence across the team. This mirrors multi-agent coordination and enables resilience: if one agent fails, others can pick up its workload.
Layered Decoupling: Mesh functions (such as logic, memory, orchestration, and interfaces) are cleanly separated into layers. In practice, this means an agent’s reasoning logic is decoupled from its data storage and from how users interact with it. For example, a supply-chain mesh might separate the planning engine from the database of inventory and from the human user interface. By isolating layers, teams can upgrade or replace one layer (say, a database or an LLM) without breaking the rest of the system.
Vendor Neutrality: No single vendor or platform controls the fabric of the mesh; every component can be independently updated or replaced. Crucially, the mesh favors open standards (e.g. the Model Context Protocol and Agent2Agent) over proprietary APIs, avoiding vendor lock-in (a risk I never tire of pointing out in my Survival Guide). For example, Google’s new Agent2Agent (A2A) protocol defines an open message format and discovery mechanism so that agents built on different frameworks or clouds can interoperate. Likewise, Anthropic’s Model Context Protocol (MCP) provides a universal interface for agents to fetch data from any source. By building on standards (think USB), the mesh can mix and match tools and models from multiple vendors (e.g., a chatbot agent from one company, a data-fetch tool from another) without rewriting glue code.
Governed Autonomy: Agents in the mesh act on their own, but within guardrails, embedded policies and other constraints. In other words, every autonomous action is pre-governed by rules. For instance, in a robotics context, a delivery drone agent might be allowed to deviate from a planned path only if it is below a speed threshold and outside restricted zones. These built-in guardrails (access controls, escalation triggers, audit logs) ensure agent behavior remains safe and transparent.
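To make governed autonomy concrete, here is a minimal sketch of the drone example in Python. All names, thresholds, and zone shapes are my own illustrative assumptions, not part of any spec:

```python
from dataclasses import dataclass

# Hypothetical guardrail for a delivery-drone agent: a deviation from the
# planned path is permitted only below a speed threshold and outside
# restricted zones. Thresholds and zone geometry are illustrative.

@dataclass
class DeviationRequest:
    speed_mps: float   # requested speed during the deviation
    position: tuple    # (x, y) in a shared map frame

MAX_DEVIATION_SPEED_MPS = 5.0
RESTRICTED_ZONES = [((0, 0), (10, 10))]  # axis-aligned (min, max) corners

def in_restricted_zone(pos):
    return any(lo[0] <= pos[0] <= hi[0] and lo[1] <= pos[1] <= hi[1]
               for lo, hi in RESTRICTED_ZONES)

def approve_deviation(req: DeviationRequest) -> bool:
    """Pre-governed check: the agent may act autonomously only inside these bounds."""
    return req.speed_mps <= MAX_DEVIATION_SPEED_MPS and not in_restricted_zone(req.position)
```

The point is that the policy lives outside the agent’s reasoning loop: the agent proposes, the guardrail disposes.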
An example of how that could look using Google’s A2A is shown below:
The A2A example highlights the aspect of vendor neutrality. Here, industry partners have already participated in the specification of a common agent-to-agent messaging standard that “allows AI agents to communicate with each other, securely exchange information, and coordinate actions” across their enterprise integrations. Btw, I worked out the messaging parts in this post.
Big if true!
In practice, though, an agent built by Atlassian may discover and invoke a specialist agent from Salesforce via the mesh, with identity and data securely handled by shared protocols. Similarly, MCP provides a plug‑and‑play data interface so that any agent can retrieve context from any MCP-compatible source. By adhering to these open standards, an agentic mesh stays flexible: new models, databases, or robotics platforms can be swapped in and out without breaking the overall system.
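What might such a cross-vendor call look like on the wire? Below is a rough sketch of building a JSON-RPC-style task request, loosely modeled on A2A’s messaging style. The method name, field layout, and skill identifier here are my own assumptions for illustration, not the normative A2A wire format:

```python
import json
import uuid

def build_task_request(skill_id: str, text: str) -> str:
    """Assemble an illustrative JSON-RPC envelope for a remote agent call."""
    request = {
        "jsonrpc": "2.0",
        "id": str(uuid.uuid4()),          # correlate request and response
        "method": "tasks/send",           # hypothetical method name
        "params": {
            "skill": skill_id,            # which advertised skill to invoke
            "message": {"role": "user",
                        "parts": [{"type": "text", "text": text}]},
        },
    }
    return json.dumps(request)

# e.g., an Atlassian-built agent asking a CRM specialist agent for a lookup
envelope = json.loads(build_task_request("crm.lookup", "Find account for Acme Corp"))
```

Because the envelope is plain JSON over an open protocol, neither side needs to know which framework or cloud the other runs on.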
Riding The 7 Capabilities
Building on these principles, the mesh architecture relies on seven key capabilities that together scale, manage, and orchestrate our agents.
Each capability functions across the mesh (not tied to any one platform) and often has parallels in cloud or microservices environments (like service registries or audit logs). Below, we describe each capability and how it plays out in practice (notably in robotics and other agentic domains):
Agent & Workflow Discovery: I suppose that’s where Google comes in. The agentic mesh maintains a central directory of all available agents and agentic workflows. This lets any part of the organization discover and reuse existing agents. For example, a robotics fleet may publish “agent cards” or metadata about each robot’s capabilities (navigation, vision, grasping, etc.). Other agents (or humans) can query this catalog to find the right robot for a task. A2A explicitly supports this via JSON “Agent Cards” that advertise an agent’s skills. By enforcing a standardized taxonomy in the registry, organizations also apply policies (e.g., only certified agents may handle sensitive data). In short, discovery ensures that teams don’t reinvent the wheel – a specialized robot or software agent built for one project can be enlisted by others.
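A toy version of such a catalog can be sketched as an in-memory list of simplified agent cards. The card fields below mirror the spirit of A2A’s Agent Cards (name, URL, skills) but are heavily simplified; a real card carries more metadata (auth requirements, capabilities, version):

```python
# Hypothetical in-memory registry of A2A-style "Agent Cards".
AGENT_CARDS = [
    {"name": "picker-bot-7", "url": "https://robots.example/picker7",
     "skills": [{"id": "grasping", "tags": ["manipulation"]}]},
    {"name": "scout-drone-2", "url": "https://robots.example/scout2",
     "skills": [{"id": "mapping", "tags": ["navigation", "vision"]}]},
]

def find_agents_with_skill(skill_id: str):
    """Discovery: any agent (or human) can query the catalog by capability."""
    return [card["name"] for card in AGENT_CARDS
            if any(skill["id"] == skill_id for skill in card["skills"])]
```

A standardized taxonomy for the `skills` field is what lets policy be applied at the registry level rather than per agent.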
AI Asset Registry: A secure place to store and access prompts, model configurations, tool interfaces, and other essential artifacts. In practice, this means version-controlling all instructions and settings that agents use. For example, if a warehouse robot uses a system prompt to summarize its observations, that prompt lives in the registry and is managed (with audits and tests for bias or “jailbreaks”). Other critical assets include agent policies (“which tools can this agent call?”), LLM settings (model name, temperature, fine-tuning parameters), and “golden datasets”. The registry also enforces who can update each asset and tracks versions over time. This is incredibly important for ensuring that all use of the model is governed. By externalizing control of these assets, the organization creates “golden agents” that are battle-tested, well-understood, and compliant, while still allowing engineers to experiment in sandboxed environments before promoting changes.
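A minimal sketch of such a registry, with append-only versioning, might look like this. The class, keys, and prompts are illustrative; a production registry would add access control, audit logs, and bias/jailbreak test gates before a version is promoted:

```python
class AssetRegistry:
    """Toy version-controlled store for prompts and agent settings."""

    def __init__(self):
        self._assets = {}  # key -> append-only list of versions

    def put(self, key, value, author):
        versions = self._assets.setdefault(key, [])
        versions.append({"version": len(versions) + 1,
                         "value": value, "author": author})
        return versions[-1]["version"]

    def get(self, key, version=None):
        """Latest version by default, or a pinned historical version."""
        versions = self._assets[key]
        return versions[version - 1]["value"] if version else versions[-1]["value"]

registry = AssetRegistry()
registry.put("warehouse/summarize-observations", "Summarize what you see.", "alice")
registry.put("warehouse/summarize-observations",
             "Summarize what you see, flag hazards.", "bob")
```

Pinning a workflow to a specific asset version is what makes agent behavior reproducible and auditable after the fact.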
Feedback Management: Continuous learning is baked into the mesh via automated feedback loops. I will soon do a paper review on Sakana AI’s Teacher/Student paper titled “Reinforcement Learning Teachers of Test Time Scaling”.
The idea here is that every agentic workflow generates performance data (metrics, logs, or even human ratings) that feed back to improve the agents. For instance, imagine a robotic arm repeatedly failing to pick up a certain part. The system logs this failure, and the mesh uses that signal to refine the arm’s grasping agent or update its prompt. Techniques can include reinforcement-like pipelines: agents critique each other, humans label outputs, and token usage or delays are tracked.
This idea traces back to the actor-critic framework in reinforcement learning: after each delivery by an autonomous vehicle, a “critic” agent rates the delivery speed and accuracy, automatically adjusting the navigation agent’s parameters for next time. These feedback loops treat each run as an opportunity to improve: statistics like success rates or user satisfaction are collected and used to propose new prompt variations or model fine-tunings.
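A toy critic loop in that spirit (illustrative only, and certainly not Sakana AI’s method): a critic scores each delivery, and the navigation agent’s speed parameter is nudged according to the score.

```python
def critic_score(delivery):
    """Reward fast, accurate deliveries; penalize position error (toy metric)."""
    return 1.0 / (1.0 + delivery["seconds"]) - 0.1 * delivery["error_m"]

def update_parameter(speed, delivery, lr=0.5):
    """Nudge the navigation agent's speed parameter based on the critic's score.

    A score of 0.5 is treated as the neutral point; below it we slow down,
    above it we speed up, with a floor so the agent never stalls entirely.
    """
    score = critic_score(delivery)
    return max(0.5, speed + lr * (score - 0.5))
```

Real pipelines would log many such runs and fit parameter updates statistically rather than per episode, but the loop shape is the same.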
Compliance & Risk Management: Oh, this is so important! Even in a decentralized mesh, every action must obey safety and policy rules. Compliance/risk tools are built into workflows as well. This can take the form of embedded compliance agents that audit other agents’ actions against regulations. For example, a “safety inspector” agent might intercept every high-speed motion command sent to a robot and verify it lies within permitted speed and area limits. More broadly, before completing a workflow, the mesh can enforce that certain checks are run (e.g., a data-privacy agent reviewed outputs, or a legal-review agent validated contracts). As noted by industry analysts, compliance in agentic systems is challenging: unconstrained autonomy breeds risk. Thus, meshes often constrain agentic workflows to be less “creative” and more rule-bound. For instance, an autonomous trading agent might be forced to include a regulatory-checking module in every transaction. By embedding policies and guardrails (fine-grained authZ, audit trails, “whitelist” of allowed operations), the mesh dramatically shrinks the scope of potential errors or abuse.
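The “safety inspector” pattern can be sketched as a thin interception layer: only whitelisted operations within limits pass, and every decision lands in an audit trail. Operation names and limits below are hypothetical:

```python
# Hypothetical compliance layer that intercepts motion commands before they
# reach a robot: whitelist of allowed operations, a speed limit, and an
# audit log entry for every decision (approved or not).

AUDIT_LOG = []
ALLOWED_OPS = {"move", "stop", "pick"}
MAX_SPEED_MPS = 2.0

def inspect(command: dict) -> bool:
    """Return True only if the command is whitelisted and within limits."""
    ok = (command["op"] in ALLOWED_OPS
          and command.get("speed", 0.0) <= MAX_SPEED_MPS)
    AUDIT_LOG.append({"command": command, "approved": ok})
    return ok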
Evaluation Systems: Rigorously and frequently (re-) testing agents is also part of governance. The mesh includes testing and evaluation pipelines that assess entire agent workflows. Just as software is unit- and integration-tested, agentic workflows are validated on multiple levels. For example, every deployment or model upgrade might trigger a battery of automated simulations: edge cases, failure modes, and ethical constraints are tested. If a warehouse robot’s path-planning agent is updated, the evaluation system might run it through hundreds of simulated pickup scenarios, checking that it never collides with humans or stocked shelves. These evaluations collect metrics on correctness, bias, and performance drift. Non-deterministic LLM steps require special care: test suites can include “step-level” probes (e.g., is the right tool API called?) and “workflow-level” scenarios (e.g., does the output meet business requirements?). By coupling continuous integration with human-in-the-loop review and even adversarial testing (prompt injection, misuse cases), the mesh ensures that agent pipelines behave as expected over time.
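A step-level probe from that description can be sketched as follows: record which tool APIs a workflow actually calls, then assert against the expected sequence. All names here are illustrative stand-ins:

```python
class ToolCallRecorder:
    """Test double that records every tool invocation in order."""

    def __init__(self):
        self.calls = []

    def call(self, tool_name, **kwargs):
        self.calls.append(tool_name)
        return {"tool": tool_name, "args": kwargs}  # stubbed tool result

def run_pickup_workflow(tools):
    """Hypothetical workflow under test: plan a path, then grasp a crate."""
    tools.call("plan_path", start=(0, 0), goal=(3, 4))
    tools.call("grasp", object_id="crate-17")

def probe_right_tools_called(tools) -> bool:
    """Step-level probe: did the workflow call the right tools, in order?"""
    return tools.calls == ["plan_path", "grasp"]

recorder = ToolCallRecorder()
run_pickup_workflow(recorder)
```

Workflow-level scenario checks (does the output meet business requirements?) would then sit on top of probes like this one.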
Observability: The mesh provides end-to-end visibility into agentic workflows through unified logging and metrics. This means every workflow—whether it spans hundreds of GPT calls or controls physical robots—is traced and monitored. Key observability features include tracking the chain of events (who called which agent, with what inputs), performance metrics (latencies, resource usage), and detailed audit logs. In robotics, this is akin to centralizing telemetry from every sensor and control loop. For example, if a drone swarm is mapping a disaster area, the mesh records each agent’s path-planning steps and data queries, so engineers can reconstruct or debug the mission later. Emerging standards (like OpenTelemetry semantic conventions for agents) are already being incorporated to make this systematic. Observability also ties into cost and risk management: if a rogue agent starts consuming excessive compute or generates unsafe actions, the monitoring layer can flag it immediately. By designing in observability from the start, enterprises ensure that “agent sprawl” and opaque decision paths are avoided.
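The core of that tracing can be shown with a stdlib-only decorator (deliberately not OpenTelemetry, just the shape of it): every agent call appends who was called, with what inputs, and how long it took.

```python
import functools
import time

TRACE = []  # stand-in for a centralized trace store

def traced(agent_name):
    """Decorator that records agent name, keyword inputs, and latency per call."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            TRACE.append({"agent": agent_name, "inputs": kwargs,
                          "latency_s": time.perf_counter() - start})
            return result
        return wrapper
    return decorator

@traced("path-planner")
def plan(goal=None):
    # Hypothetical planning step; real planners would return waypoints.
    return ["wp1", "wp2", goal]
```

With every hop recorded this way, a mission can be reconstructed step by step, and a runaway agent shows up as an anomaly in the trace rather than a mystery.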
Authentication & Authorization: Finally, secure inter-agent communication is enforced by robust identity controls. Every agent-to-agent or agent-to-service call is authenticated (who is speaking?) and authorized (are they allowed this action?). For example, if a logistics bot agent needs to query an inventory database, it does so with a short-lived credential or JWT token that’s validated by the mesh’s auth service. Modern agent frameworks leverage OAuth 2.0, JWTs, and related standards to issue fine-grained access rights. In robotics, this could map to cryptographic handshakes between robot controllers and networked services: a maintenance agent must authenticate before issuing control commands to a robot. Crucially, constraints like “least privilege” are applied across the mesh. If one component is compromised, the damage is limited because it cannot arbitrarily escalate privileges. These time-tested security patterns from cloud systems are adapted to the stochastic, multi-agent world to ensure the blast radius of any failure is minimized.
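To illustrate the short-lived, least-privilege token idea, here is a hand-rolled sketch using stdlib HMAC. This is JWT-like in spirit but deliberately not a real JWT; production systems should use OAuth 2.0 and an audited JWT library, and the secret and scopes here are invented:

```python
import base64
import hashlib
import hmac
import json
import time

SECRET = b"mesh-demo-secret"  # shared with the mesh's auth service (illustrative)

def issue_token(agent_id, scopes, ttl_s=60):
    """Issue a short-lived, scope-limited token for one agent."""
    payload = {"sub": agent_id, "scopes": scopes, "exp": time.time() + ttl_s}
    body = base64.urlsafe_b64encode(json.dumps(payload).encode())
    sig = hmac.new(SECRET, body, hashlib.sha256).hexdigest()
    return body.decode() + "." + sig

def authorize(token, required_scope):
    """Authenticate (valid signature, not expired), then authorize (scope granted)."""
    body, sig = token.rsplit(".", 1)
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return False  # tampered or foreign token
    payload = json.loads(base64.urlsafe_b64decode(body))
    return payload["exp"] > time.time() and required_scope in payload["scopes"]

token = issue_token("logistics-bot", ["inventory:read"])
```

Note the least-privilege check: even a validly signed token is refused for any scope it was not explicitly granted.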
Taken together, these capabilities turn the loosely coupled agents into a coherent agentic AI mesh. For instance, in an automated warehouse, a robotic choreography agent might discover available robots and workflows via the mesh’s catalog.
How might such a flow work, then?
Assume we have two agents: a path-planning agent and an arm-control agent. All inter-agent calls happen over secure channels with authenticated tokens.
The path-planning agent retrieves approved prompt templates and motion parameters from the asset registry.
The path-planning agent then starts the task.
Throughout execution, every step is logged, continuously evaluated, and bounded by safety policies.
If the combined performance (e.g., throughput, accuracy) shows room for improvement, the feedback system updates the agents’ configurations.
Otherwise, the arm-control agent retrieves the confirmed “golden” planned path and executes it.
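The steps above can be sketched end to end. Every component here is a hypothetical stand-in: real meshes would back these with the registry, observability, and auth capabilities discussed earlier.

```python
LOG = []  # stand-in for the mesh's trace/audit layer

def authenticated(caller, token):
    return token == f"token-for-{caller}"  # stand-in for real token validation

def asset_registry_get(key):
    # Stand-in for the AI Asset Registry holding approved prompts/parameters.
    return {"planner/prompt": "Plan shortest safe path.",
            "planner/max_speed": 1.5}[key]

def path_planner(token):
    assert authenticated("path-planner", token)
    prompt = asset_registry_get("planner/prompt")  # approved template
    LOG.append("planned")
    return {"path": ["A", "B", "C"], "prompt_used": prompt}

def safety_policy_ok(plan):
    LOG.append("policy-checked")  # every step is bounded by safety policies
    return len(plan["path"]) > 0

def arm_controller(token, plan):
    assert authenticated("arm-control", token)
    LOG.append("executed")  # executes the confirmed "golden" planned path
    return "done"

def run_workflow():
    plan = path_planner("token-for-path-planner")
    if not safety_policy_ok(plan):
        return "blocked"
    return arm_controller("token-for-arm-control", plan)

status = run_workflow()
```

Each hop is authenticated, each asset comes from the registry, and each step leaves a trace, which is the whole point of the mesh’s capabilities working together.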
I believe this concept is quite easy to understand. Moreover, it’s clear how new robots or models can be plugged in at any time due to composability and open protocols. In essence, each agentic workflow becomes a well-supervised and traceable process.
What is the difference from traditional workflow management systems like N8N?
Why the additional complexity?
Implications for Multi-Agent AI Systems
The agentic mesh paradigm promises significant advances for large-scale AI deployments. By design, it supports scalability: components can be replicated across clusters or new regions with minimal re-engineering, and thousands of agents can coordinate through the standardized connectivity layer. It also enhances evolvability: since logic, data access, and interfaces are decoupled, new AI models or control algorithms can be swapped in as “infrastructure” over time, much like container images are updated. Indeed, one analysis envisions agentic frameworks eventually commoditizing into infrastructure layers (citing the OCI model for containers).
For robotics, I think, the implications are clear.
An agentic mesh can turn a collection of heterogeneous robots into an integrated ecosystem: the fleet’s management software is no longer a monolith, but a web of specialized agents (navigation, perception, dialogue, etc.) that cooperate. New robots or sensors can be added on the fly via the mesh (composability), and proprietary vendor software is abstracted behind open interfaces (vendor-neutrality).
In Closing
Firstly, McKinsey’s idea of the agentic mesh sounds fantastic, but is incredibly difficult, maybe impossible, to implement from an engineering standpoint. Sure, the agentic AI mesh envisions a future where multi-agent systems are composable, observable, and governed from the ground up. Where agents become plug-in building blocks rather than siloed experiments. This paradise of interoperability blends the flexibility of open ecosystems with enterprise-grade controls. Maybe that’s why it feels so unachievable.
McKinsey describes it as “the connective and orchestration layer that enables large-scale, intelligent agent ecosystems to operate safely and efficiently, and continuously evolve”.
For advanced robotics and other agent-first domains, adopting such an architecture could be key to scaling up safely. It should also encourage a shift toward agent-native design: not building for fixed workflows, but for efficient judgment and routing. That means rebuilding interfaces, logic, and data around modular, non-deterministic AI actors.
The payoff is more robust, adaptable automation, but only if organizations embed the right standards and controls. Those who embrace the mesh’s principles and capabilities will be best positioned to harvest the full potential of autonomous AI while keeping risk and technical debt under control. Embedded policies, compliance agents, and auditing mean autonomous actors operate within known bounds. Any drift or anomaly is observable and can be corrected before causing harm.
The mesh’s principles and capabilities, together with emerging protocols like MCP and A2A, exemplify an open, modular approach to agent workflows; each capability and principle above is drawn from current industry discussions.
Most importantly, the mesh’s governance features make such systems safer and more reliable than a wild proliferation of stand-alone agents.
Thank you for reading this far.