Can an Agent Build an Agent?
Exploring Autonomy, Program Synthesis, and the Role of Code as Interfaces
Last week, Amazon Web Services joined the coding-agent game by launching Kiro, another VSCode clone, err, agentic IDE designed to transition developers from "vibe coding" to production-grade software by generating specs, design documents, and task plans, and by auto-triggering tests and documentation updates. Among its features, Kiro also supports Anthropic’s Model Context Protocol, intelligent hooks, and agent-driven workflows. While Kiro is free during its preview phase, I’d expect Amazon to adopt tiered pricing plans later in 2025.
But why is code generation so important?
Gee, maybe my age is showing. But let’s start with a video. Besides the fact that the video is really well made and also covers games from my (very early) youth, it lays out a simple and elegant solution to a scaling problem.
Back then, those 3-D images were produced while PCs were still extremely limited in their capabilities and availability. Similarly, as we build multi-agent architectures to solve daily work problems today, we face a multitude of scaling issues that can be bridged by auto-generated code.
Code as Interfaces: The Unifying Abstraction
Given how many agents I have built over the last few years, I have learned that at the heart of modern software design is the principle of code as interfaces. This means that code, rather than simply being a sequence of instructions for the machine, also acts as a contract, a specification, or a protocol between different software components, teams, or even artificial agents.
It describes how the different components exchange information (information being data in context).
“This brings back memories especially on how we had to cram everything on a disc since each disc was profit. Also we had limited ram. Not even 640k.” source
In object-oriented programming, an interface defines what operations must be available, not how they are implemented. By programming to such interfaces, developers ensure that components can be replaced, extended, or composed without breaking the broader system. Historically, that was a huge step forward for code quality and, more importantly, reusability.
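To make that concrete, here is a minimal sketch in Python; the names are illustrative and not taken from any particular framework. The protocol fixes what must be available, while the two implementations are free to differ in how they do it.

```python
from typing import Protocol


class DataSource(Protocol):
    """The contract: which operation must exist, not how it is implemented."""
    def read(self, location: str) -> list[dict]: ...


class CsvSource:
    def read(self, location: str) -> list[dict]:
        # Toy implementation: parse a small inline CSV string instead of a file.
        header, *rows = [line.split(",") for line in location.strip().splitlines()]
        return [dict(zip(header, row)) for row in rows]


class JsonSource:
    def read(self, location: str) -> list[dict]:
        import json
        return json.loads(location)


def row_count(source: DataSource, location: str) -> int:
    # The caller depends only on the contract; either implementation can be swapped in.
    return len(source.read(location))


print(row_count(CsvSource(), "id,name\n1,alpha\n2,beta"))
print(row_count(JsonSource(), '[{"id": 1}, {"id": 2}]'))
```

The caller never touches a concrete class; that is the whole point of programming to the interface.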
In modern times, however, this idea extends well beyond objects in code; an interface can now be any agreed-upon way to expose functionality or behavior. APIs, REST endpoints, MCP tools, and even simple function signatures all act as interfaces between modules. Whether those modules are written by humans, generated by AI, or instantiated at runtime by agents doesn’t really matter. With Kiro (and Claude Code, Cursor, Windsurf, and Lovable), we are now evolving into a world where cognitive, autonomous software programs (“agents”) can also generate, execute, and refine code.
In that world, code itself becomes the primary medium through which agents communicate, delegate, and collaborate!
Based on what we can see already, the advantages are substantial:
flexibility (different implementations can be swapped in, especially when considering tools),
maintainability (internal changes don’t require callers to adapt),
reusability (common contracts can be used across projects), and
scalability (systems can grow as new implementations or agents are added).
Looking at traditional IDEs: Eclipse, for example, is built on interface-based programming, allowing thousands of third-party plugins to interoperate seamlessly with the core platform simply by adhering to specified APIs. But that is us humans adhering to pre-defined and pre-communicated standards.
However, AI agents tasked with building agents need only agree on the interface; the underlying implementation can vary, evolve, or be optimized independently.
And this is where it gets interesting.
Programmatically Autonomous Agents
Already today we have coding agents that can interpret prompts, decompose tasks, generate code, execute and debug it, and iterate based on feedback until the result satisfies the agreed-upon interface.
And from that fact alone, some fascinating architectural patterns are emerging:
Decomposition and Delegation: When a user or another agent issues a prompt, an agent breaks it down into subtasks, each handled by a specialized sub-agent. These subtasks are specified by interfaces, i.e., “what” needs to be done, while the “how” is left to the sub-agent’s internals (see the sketch after this list).
Dynamic Composition: Agents generate code that then instantiates and connects other agents or modules, using interfaces to ensure compatibility. For instance, an agent might generate a Matplotlib visualization agent, a data-wrangling agent, and a retro-styling agent, all communicating via well-defined, dynamically created method signatures or APIs.
Iterative Refinement: Generated code is continually tested, executed, and revised. Errors or benchmark failures (e.g., on the GAIA benchmark) trigger new rounds of code synthesis, with each cycle constrained by the interface contract.
Benchmarked Evolution: The quality of agent-generated agents is measured against real-world, multi-step benchmarks like GAIA, ensuring that interfaces aren’t just syntactically correct but semantically robust and fit for purpose.
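Here is a minimal sketch of decomposition and delegation, assuming a trivial keyword-based planner where a real agent would let a model produce the plan; all class and method names are illustrative.

```python
from typing import Protocol


class SubAgent(Protocol):
    """Contract every sub-agent must honor: take a subtask, return a result string."""
    def handle(self, subtask: str) -> str: ...


class DataWranglingAgent:
    def handle(self, subtask: str) -> str:
        return f"[wrangled data for: {subtask}]"


class VisualizationAgent:
    def handle(self, subtask: str) -> str:
        return f"[rendered chart for: {subtask}]"


class ParentAgent:
    def __init__(self) -> None:
        # Registry of specialized sub-agents, keyed by capability.
        self.registry: dict[str, SubAgent] = {
            "wrangle": DataWranglingAgent(),
            "visualize": VisualizationAgent(),
        }

    def decompose(self, prompt: str) -> list[tuple[str, str]]:
        # Placeholder planner: a real agent would synthesize this plan with a model.
        return [("wrangle", prompt), ("visualize", prompt)]

    def run(self, prompt: str) -> list[str]:
        # Delegate each subtask through the shared interface;
        # the "how" stays inside each sub-agent.
        return [self.registry[capability].handle(subtask)
                for capability, subtask in self.decompose(prompt)]


print(ParentAgent().run("monthly sales figures"))
```

The parent only knows the `handle` signature; the sub-agents can be replaced or regenerated without the parent changing at all.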
And that’s what makes this approach unique.
Interface-Based Programming in Action
In this paradigm, the “glue” between components is a set of abstract contracts, sometimes even dynamic artefacts, and not concrete implementations. This decoupling is what allowed ecosystems like VSCode or Eclipse to thrive: as long as plugins implement the required interfaces, they can be developed independently and plugged in at runtime. But it also requires stability and reliability. And that’s where I see the biggest risk at this point in time.
That said, the same principle also underpins technologies like the Component Object Model, where clients and objects interact only through interface references, enabling versioning and evolution.
In agentic AI, this pattern becomes not only recursive but cyclic:
An agent, tasked with building another agent, defines interfaces for the new agent’s responsibilities (data input, processing, visualization, output, etc.).
The child agent then generates code that satisfies those contracts.
The parent agent doesn’t need to know how the child agent works internally, only that it adheres to the agreed interface.
In theory, this makes the system modular, extensible, and hopefully resilient to change (the sketch below illustrates the idea).
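A toy illustration of that parent/child cycle in Python: `generate_child_source` stands in for a real code-generating model, and all names here are hypothetical. The parent only verifies that the generated class satisfies the contract before using it.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class ReportAgent(Protocol):
    """Contract the parent defines for the child it is about to build."""
    def summarize(self, rows: list[dict]) -> str: ...


def generate_child_source(spec: str) -> str:
    # Stand-in for an LLM call that would synthesize code from the spec.
    return (
        "class GeneratedReportAgent:\n"
        "    def summarize(self, rows):\n"
        "        return f'{len(rows)} rows processed'\n"
    )


def build_child(spec: str) -> ReportAgent:
    namespace: dict = {}
    # NOTE: exec'ing model-generated code is exactly the safety risk discussed later;
    # a real system would sandbox this step.
    exec(generate_child_source(spec), namespace)
    child = namespace["GeneratedReportAgent"]()
    if not isinstance(child, ReportAgent):  # structural check against the contract
        raise TypeError("generated agent does not satisfy the ReportAgent interface")
    return child


child = build_child("summarize tabular data")
print(child.summarize([{"a": 1}, {"a": 2}]))
```

The parent never inspects the child’s internals; it only checks adherence to the interface it defined.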
How would that work in practice?
Case Study: A Web-Ready, Stylized Agent Factory
Imagine a workflow where:
User Prompt: “Build an agent that takes a dataset and produces a Classic Sierra-themed SVG visualization.”
Parent Agent Action:
Defines interfaces for data ingestion, cleaning, plotting (e.g., Matplotlib), SVG export, and retro styling (a sketch of these contracts follows this workflow).
Generates code for a new agent that implements these interfaces, potentially delegating subtasks to further sub-agents.
Tests the new agent’s outputs on benchmark tasks (again GAIA), checking for correctness and style fidelity.
Deployment: The synthesized agent is instantiated and used. If requirements change, only the affected components need updating. As long as the interfaces remain stable, the rest of the system remains untouched.
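A minimal sketch of how these contracts could be laid out, assuming Matplotlib is available; all class names are illustrative, and the “Classic Sierra” styling is reduced to a colour palette to keep the example short.

```python
import io
from typing import Protocol

import matplotlib
matplotlib.use("Agg")  # headless backend; we only want SVG markup out
import matplotlib.pyplot as plt


class Ingestor(Protocol):
    def load(self, source: str) -> list[float]: ...


class Stylist(Protocol):
    def palette(self) -> dict: ...


class SvgPlotter(Protocol):
    def plot(self, values: list[float], style: dict) -> str: ...


class CsvIngestor:
    def load(self, source: str) -> list[float]:
        # Toy ingestion: parse a comma-separated string instead of a real file.
        return [float(x) for x in source.split(",")]


class SierraStylist:
    def palette(self) -> dict:
        # Hypothetical "Classic Sierra" look: chunky EGA-ish colours.
        return {"facecolor": "#0000AA", "linecolor": "#55FFFF"}


class MatplotlibSvgPlotter:
    def plot(self, values: list[float], style: dict) -> str:
        fig, ax = plt.subplots(facecolor=style["facecolor"])
        ax.set_facecolor(style["facecolor"])
        ax.plot(values, color=style["linecolor"], linewidth=3)
        buf = io.StringIO()
        fig.savefig(buf, format="svg")  # export the figure as SVG markup
        plt.close(fig)
        return buf.getvalue()


def build_visualization(source: str, ingestor: Ingestor,
                        stylist: Stylist, plotter: SvgPlotter) -> str:
    # The pipeline depends only on the three contracts, not on the concrete classes.
    return plotter.plot(ingestor.load(source), stylist.palette())


svg = build_visualization("3,1,4,1,5,9,2,6",
                          CsvIngestor(), SierraStylist(), MatplotlibSvgPlotter())
print(svg[:80])
```

If the styling requirement changes, only `SierraStylist` needs to be regenerated; the rest of the pipeline stays untouched, which is exactly the point of the case study.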
If we implement the agent architecture in this way, we can harvest several benefits and derive real-world implications:
Flexible Ecosystems: Agents can be added, removed, or upgraded without disrupting the whole, as long as they honor the same interfaces.
Collaborative AI: Different agents (or teams) can specialize in different domains, interoperating via code-as-interface contracts.
Continuous Improvement: Agents can refine their own implementations, or even their own architecture, as long as external contracts are preserved.
Creative Stylization: Thematic requirements (e.g., Classic Sierra visuals) can be treated as just another interface, allowing for creative and domain-specific customization without breaking core logic.
But this is also where my criticism begins, as there are certain limitations.
Traceability: As agents generate agents, debugging and accountability become more complex. I’d even argue that controlling such complex dynamic systems will be extremely hard. And we don’t even have tools for that yet.
Safety: Unconstrained agent generation requires safeguards to ensure interfaces are unambiguous and behaviors are predictable. That makes them less powerful… maybe that’s actually a good thing.
Generalization: While current systems excel at narrow vertical domains, fully general agent-building, where any interface can be synthesized and any implementation provided, remains a research frontier.
In Closing
So… can an agent build an agent? With modern interface-based programming principles, the answer is a humble yes-ish?! I believe that code is the lingua franca that makes complex agentic architectures possible. By rigorously defining interfaces, agents can generate, compose, and refine other agents, creating modular, extensible, and creative AI ecosystems. Benchmarking frameworks like GAIA, coupled with code-first visualization tools (Matplotlib, SVG) and thematic layering, allow not only for functional correctness but for style, flair, and human engagement.
The future of autonomous software will be defined not just by what agents can do, but by how elegantly they can agree, through code as interfaces, on what needs to be done, leaving the how to the creative and adaptive power of the machines themselves.
I think the design principle of using code as interfaces is an intriguing concept which I will for sure explore further. In general, ideas like “vibe coding” shouldn’t be anywhere near a professional code base, yet. But for quick prototyping or quick-and-dirty greenfield implementations, they might have their value.