

AI Agents and the Next Infrastructure Layer

8 January 2025

The transition from AI as a feature to AI as an agent represents one of the more significant architectural shifts in software in the past decade. It is not merely a matter of giving a language model access to tools and a longer context window. Agentic systems introduce fundamentally new requirements around state management, task decomposition, error recovery, observability, and trust — requirements that the current infrastructure stack was not designed to meet.

Consider what happens when a language model is asked to complete a multi-step task with real-world consequences: booking travel, executing trades, provisioning cloud infrastructure, or managing customer communications. Each of these involves a sequence of decisions where errors compound, where reversibility varies, where external systems have their own state and rate limits, and where the cost of a hallucination is not a wrong answer in a chat window but a concrete, sometimes irreversible, action in the world. This is a categorically different problem from next-token prediction, and it requires infrastructure that reflects that difference.

The infrastructure companies we are most interested in operate at several layers. At the lowest level, there is the question of reliable execution: how do you run agentic workloads in a way that is reproducible, interruptible, and resumable? Current cloud primitives — functions, containers, queues — can be composed to approximate this, but doing so requires significant engineering effort and results in brittle architectures that are hard to debug and expensive to maintain. The companies building durable execution primitives specifically designed for long-running, stateful AI workloads are addressing a real and underserved gap.
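To make the "interruptible and resumable" requirement concrete, here is a minimal sketch of the durable-execution idea in Python: each step of a hypothetical agent workflow checkpoints its state to disk, so a restarted process skips work it has already completed. The step names, the JSON checkpoint format, and the `run_workflow` helper are all illustrative assumptions, not a reference to any particular product.

```python
import json
import os

CHECKPOINT_PATH = "workflow_checkpoint.json"  # hypothetical checkpoint location

def load_checkpoint():
    """Return the last saved state, or a fresh one if no checkpoint exists."""
    if os.path.exists(CHECKPOINT_PATH):
        with open(CHECKPOINT_PATH) as f:
            return json.load(f)
    return {"completed": [], "outputs": {}}

def save_checkpoint(state):
    """Persist state atomically so a crash mid-write cannot corrupt it."""
    tmp = CHECKPOINT_PATH + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, CHECKPOINT_PATH)

def run_workflow(steps):
    """Run each named step once, checkpointing after every completion.

    If the process is killed and restarted, steps recorded as completed
    are skipped, so the workflow is interruptible and resumable.
    """
    state = load_checkpoint()
    for name, fn in steps:
        if name in state["completed"]:
            continue  # already done in a previous run
        state["outputs"][name] = fn(state["outputs"])
        state["completed"].append(name)
        save_checkpoint(state)
    return state["outputs"]

# Hypothetical agent steps; each receives the outputs of earlier steps.
steps = [
    ("plan",   lambda out: "itinerary draft"),
    ("book",   lambda out: f"booked: {out['plan']}"),
    ("notify", lambda out: f"emailed confirmation for {out['book']}"),
]

if __name__ == "__main__":
    print(run_workflow(steps))
```

Production systems need far more than this (deduplication of side effects, distributed state, retries with backoff), which is precisely why composing generic cloud primitives into this shape by hand tends to produce the brittle architectures described above.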

One level up is the question of observability. Debugging a deterministic program is hard. Debugging an agentic system that made a series of plausible-looking decisions, each reasonable in isolation but incorrect or harmful in combination, is far harder. The observability tools that work well for traditional distributed systems — traces, logs, metrics — capture what happened, but they don't help you understand why the agent made the choices it did or how to prevent a recurrence. This is an open problem, and the teams working on it are doing genuinely novel work at the intersection of interpretability research and production engineering.
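One plausible direction, sketched below, is to record a structured decision trace alongside conventional telemetry: for each step, capture the goal, the options the agent considered, the action it chose, and its stated rationale. The `DecisionRecord` schema and the JSONL sink are hypothetical illustrations of the idea, not an existing tool's API.

```python
import json
import time
import uuid
from dataclasses import dataclass, asdict, field

@dataclass
class DecisionRecord:
    """One agent decision, capturing the 'why' alongside the 'what'."""
    trace_id: str
    step: int
    goal: str        # what the agent was trying to achieve at this step
    options: list    # the alternatives it considered
    chosen: str      # the action it actually took
    rationale: str   # the model's stated reason, recorded verbatim
    timestamp: float = field(default_factory=time.time)

def log_decision(record, sink="decisions.jsonl"):
    """Append the record as one JSON line; any log pipeline can ingest it."""
    with open(sink, "a") as f:
        f.write(json.dumps(asdict(record)) + "\n")

# Hypothetical usage inside an agent loop:
trace_id = str(uuid.uuid4())
log_decision(DecisionRecord(
    trace_id=trace_id,
    step=1,
    goal="book the cheapest refundable flight",
    options=["FlightA $320 refundable", "FlightB $280 non-refundable"],
    chosen="FlightA",
    rationale="FlightB is cheaper but non-refundable; the goal requires refundability.",
))
```

A trace like this does not solve interpretability, but it at least gives an engineer the agent's claimed reasoning at each branch point rather than only the sequence of side effects.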

At the application layer, we are watching the emergence of what we think of as agent operating systems: platforms that manage multiple agents, coordinate their activities, enforce policy constraints, handle resource allocation, and provide a governance layer for enterprises that need to deploy AI capabilities at scale without losing control of what those agents are doing. This category is nascent, and the design space is wide open. The risk of building here is that the major cloud providers and LLM developers may absorb this layer over time. The opportunity is that enterprises have a strong preference for multi-provider architectures and are deeply skeptical of vendor lock-in in a category where the leading models are still changing rapidly.
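To illustrate in miniature what the policy-enforcement piece of such a platform might look like, the sketch below gates each proposed agent action against declarative limits (a spend ceiling, a ban on irreversible actions) before it can reach the outside world. The `PolicyGate` class, the `Action` shape, and the rules themselves are hypothetical assumptions, not any vendor's interface.

```python
from dataclasses import dataclass

@dataclass
class Action:
    """A proposed agent action awaiting policy review."""
    kind: str           # e.g. "spend", "provision", "send_email"
    amount: float = 0   # cost in dollars, where relevant
    reversible: bool = True

class PolicyGate:
    """Enforces declarative limits between an agent and external systems."""

    def __init__(self, max_spend, allow_irreversible=False):
        self.max_spend = max_spend
        self.allow_irreversible = allow_irreversible

    def review(self, action):
        """Return (allowed, reason); denied actions go to a human queue."""
        if action.amount > self.max_spend:
            return False, f"spend {action.amount} exceeds limit {self.max_spend}"
        if not action.reversible and not self.allow_irreversible:
            return False, "irreversible actions require human approval"
        return True, "within policy"

gate = PolicyGate(max_spend=500)
print(gate.review(Action(kind="spend", amount=300)))            # allowed
print(gate.review(Action(kind="provision", reversible=False)))  # escalated
```

The interesting design question for this category is who authors these policies and how they compose across agents, teams, and providers; the enforcement mechanics above are the easy part.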

The deeper trend underlying all of this is a shift in what software does. For most of computing history, software executed instructions. The programmer specified the behavior, the machine carried it out. AI agents introduce a different model: the operator specifies an objective, and the system determines how to pursue it. This is more powerful and more efficient, but it also means that software now makes consequential decisions, and the infrastructure layer needs to reflect that. The companies that build the right primitives here will become foundational to the next decade of enterprise software. We are actively looking for them.
