[Threat Model] Why We Give AI Agents sudo in a MicroVM, Not a Container

Building Rapid Claw — a managed AI agent deployment platform on OpenClaw. We help startups deploy AI Co-Founder agents that handle DevOps, support, and content in minutes. https://rapidclaw.dev

The scariest two words in our product copy are "sudo access."

An AI agent that can install packages, spin up a Postgres instance, and run whatever code it just wrote is the entire point of a builder sandbox. It is also, if you get the isolation story wrong, a security incident with a cron schedule. This post is about why we put every agent in its own MicroVM instead of a container, and what that actually buys you at the threat-model level.

The threat model is weirder than "malicious user"

Classic multi-tenant isolation assumes the tenant might attack you. Agent infrastructure has a stranger problem: the tenant is trusted, but the tenant's workload executes instructions sourced from email, web pages, and documents. Prompt injection means any text your agent reads is potential attacker input — and the agent has a shell.

So the realistic failure mode isn't "customer tries to escape the sandbox." It's "customer's agent reads a hostile email and becomes an attacker inside the sandbox, with root, at 3am." Your isolation boundary has to hold against code written and executed by a competent adversary who is already inside the box. That assumption changes the design.

Why a container isn't that boundary

Containers are a kernel feature, not a security boundary. Namespaces, cgroups, and seccomp filters all run on the shared host kernel, which means every container on the box shares one attack surface: a few hundred reachable syscalls, /proc and /sys quirks, and whatever kernel CVE shipped this quarter. One kernel privilege escalation is all that stands between a compromised container and every other tenant on the host.

You can harden this by dropping capabilities, tightening seccomp profiles, or adding gVisor-style syscall interception, but now you're trading compatibility for safety. Agents are the worst case for that trade: they legitimately want to run apt install, bind ports, run compilers, and occasionally do something deeply weird with ptrace because the model read a Stack Overflow answer from 2014. Every syscall you block is a workflow you break; every syscall you allow is attack surface you keep.
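The trade-off can be made concrete with a toy model. The sketch below assumes a hypothetical mapping from agent workflows to the syscalls they need (the workflow names and syscall sets are illustrative, not measured), and reports what a given blocklist breaks and what it leaves reachable.

```python
# Toy model of the seccomp trade-off. The workflow-to-syscall mapping is
# a hypothetical illustration, not data from a real profile.
WORKFLOW_SYSCALLS = {
    "apt_install": {"execve", "openat", "chown", "fchmod"},
    "bind_port": {"socket", "bind", "listen"},
    "run_debugger": {"execve", "ptrace"},
}


def broken_workflows(blocked: set[str]) -> list[str]:
    """Return the workflows that need at least one blocked syscall."""
    return sorted(
        name
        for name, needed in WORKFLOW_SYSCALLS.items()
        if needed & blocked
    )


def remaining_surface(blocked: set[str]) -> set[str]:
    """Return the syscalls these workflows can still reach."""
    all_needed = set().union(*WORKFLOW_SYSCALLS.values())
    return all_needed - blocked


if __name__ == "__main__":
    blocked = {"ptrace"}
    print("broken:", broken_workflows(blocked))
    print("still reachable:", sorted(remaining_surface(blocked)))
```

Blocking ptrace here breaks only the debugger workflow, but everything else stays reachable. Blocking execve would shrink the surface far more and break two of the three workflows.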

What a MicroVM changes

A MicroVM (Firecracker-class VMM on KVM) moves the boundary from "kernel feature" to "hardware virtualization." Each agent gets three things.

Its own guest kernel. A kernel exploit inside the VM gets the attacker... the VM they were already in. Host compromise now requires a KVM or VMM escape — a dramatically rarer and more expensive bug class.

A tiny device model. Firecracker exposes a handful of virtio devices (net, block, vsock, balloon, entropy) plus a serial console and a minimal keyboard controller used only to stop the microVM, instead of QEMU's sprawling emulated hardware. Less emulated surface, fewer places for an escape to hide.

A jailed VMM. The VMM process itself runs chrooted, cgrouped, and seccomp-restricted. If someone does pop the VMM, they land in a process that can see almost nothing. Defense in depth, but boring on purpose.
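To show how small that per-agent surface is, here is a minimal sketch of a per-agent VM definition. The key names are meant to follow Firecracker's JSON config file format and should be checked against the version in use. The paths, sizes, naming scheme, and the `build_vm_config` helper are all hypothetical.

```python
import json


def build_vm_config(agent_id: str, guest_cid: int) -> dict:
    """Build a Firecracker-style config for one agent's microVM.

    Every path and size below is a hypothetical placeholder.
    """
    if guest_cid < 3:
        raise ValueError("vsock CIDs 0-2 are reserved")
    tap = f"tap-{agent_id}"
    if len(tap) > 15:
        raise ValueError("Linux interface names are limited to 15 characters")
    base = f"/srv/agents/{agent_id}"
    return {
        # The guest boots its own kernel image, not the host's.
        "boot-source": {
            "kernel_image_path": "/srv/images/vmlinux",
            "boot_args": "console=ttyS0 reboot=k panic=1 pci=off",
        },
        "drives": [
            {
                "drive_id": "rootfs",
                "path_on_host": f"{base}/rootfs.ext4",
                "is_root_device": True,
                "is_read_only": False,
            }
        ],
        "machine-config": {"vcpu_count": 2, "mem_size_mib": 2048},
        # One tap device per VM, so each agent has its own network path.
        "network-interfaces": [{"iface_id": "eth0", "host_dev_name": tap}],
        "vsock": {"guest_cid": guest_cid, "uds_path": f"{base}/vsock.sock"},
        "balloon": {"amount_mib": 0, "deflate_on_oom": True},
        "entropy": {},
    }


def render_config(agent_id: str, guest_cid: int) -> str:
    """Serialize the config as JSON, ready to write to a file."""
    return json.dumps(build_vm_config(agent_id, guest_cid), indent=2)


if __name__ == "__main__":
    print(render_config("a1", 3))
```

The whole machine fits in one short dictionary: a kernel, a disk, a NIC, a vsock, a balloon, and an entropy source. The jailer's chroot, cgroup, and seccomp setup sits outside this file and is not shown.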

The cost used to be the reason everyone settled for containers: VMs were slow and heavy. That's gone. MicroVMs cold-boot in roughly 125–150ms with a few MiB of VMM overhead, and snapshot/restore means a paused agent resumes in well under a second. The performance argument for sharing a kernel is mostly dead.

What "sudo" means when the blast radius is the guest

Inside the guest, the agent is root and we don't fight it. It can install packages, edit systemd units, run Docker-in-VM, write garbage into /etc — all fine, because the host treats the entire VM as untrusted. Root inside the guest is a feature; the boundary is underneath it.

Two things make this survivable in practice.

Egress is the real perimeter. Isolation stops lateral movement; it does not stop exfiltration over channels you allowed. Each VM gets its own tap device, no route to its neighbors, and an egress policy in front of it. A prompt-injected agent that wants to mail your API keys somewhere still has to get past the network layer, which doesn't care how persuasive the email was.
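The decision an egress policy makes can be sketched in a few lines. This models only the allow/deny logic; a real deployment would enforce it on the host side of the tap device with a firewall. The allowlist, the denied ranges, and the `egress_allowed` helper are assumptions for illustration.

```python
import ipaddress

# Hypothetical allowlist: (destination CIDR, destination port).
ALLOWED = [
    ("0.0.0.0/0", 443),  # outbound HTTPS
    ("0.0.0.0/0", 53),   # DNS
]

# Ranges that stay blocked even when an allow rule matches: private
# networks (neighbors), loopback, and link-local (metadata endpoints).
DENIED_NETS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "10.0.0.0/8",
        "172.16.0.0/12",
        "192.168.0.0/16",
        "127.0.0.0/8",
        "169.254.0.0/16",
    )
]


def egress_allowed(dst: str, port: int) -> bool:
    """Deny-first check: blocked ranges win, then the allowlist decides.

    IPv6 destinations match no IPv4 rule here, so they are denied.
    """
    addr = ipaddress.ip_address(dst)
    if any(addr in net for net in DENIED_NETS):
        return False
    return any(
        addr in ipaddress.ip_network(cidr) and port == allowed_port
        for cidr, allowed_port in ALLOWED
    )


if __name__ == "__main__":
    for dst, port in [("93.184.216.34", 443), ("93.184.216.34", 25),
                      ("10.0.0.5", 443), ("169.254.169.254", 80)]:
        print(dst, port, egress_allowed(dst, port))
```

Note what this does not solve: anything sent over an allowed channel, such as HTTPS to an arbitrary host, still gets out. Tightening the allowlist to named destinations is the obvious next step.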

Rollback turns disasters into shrugs. Because the whole machine is a snapshot artifact, rm -rf / or a hosed Python environment is a restore, not an incident. I previously wrote about what agent state looks like after 14 days in a MicroVM; the short version is that cheap, whole-machine snapshots change agent ops from "be careful" to "be reversible."
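A rough sketch of what checkpoint and rollback look like as Firecracker API requests follows. It only builds the request sequence and sends nothing. The endpoint paths and body fields are meant to match Firecracker's snapshot API and should be verified against the version in use; the file names and the `ApiCall` type are hypothetical. One assumption matters in practice: a Firecracker snapshot captures guest memory and device state, so the host has to copy or snapshot the root disk image alongside it for a true whole-machine rollback.

```python
from typing import NamedTuple


class ApiCall(NamedTuple):
    """One HTTP request against the VMM's API socket."""
    method: str
    path: str
    body: dict


def checkpoint_calls(snap_dir: str) -> list[ApiCall]:
    """Requests to pause the VM, write a full snapshot, then resume."""
    return [
        ApiCall("PATCH", "/vm", {"state": "Paused"}),
        ApiCall("PUT", "/snapshot/create", {
            "snapshot_type": "Full",
            "snapshot_path": f"{snap_dir}/vm.state",
            "mem_file_path": f"{snap_dir}/vm.mem",
        }),
        ApiCall("PATCH", "/vm", {"state": "Resumed"}),
    ]


def rollback_calls(snap_dir: str) -> list[ApiCall]:
    """Request sent to a fresh VMM process to load the snapshot."""
    return [
        ApiCall("PUT", "/snapshot/load", {
            "snapshot_path": f"{snap_dir}/vm.state",
            "mem_backend": {
                "backend_type": "File",
                "backend_path": f"{snap_dir}/vm.mem",
            },
            "resume_vm": True,
        }),
    ]


if __name__ == "__main__":
    for call in checkpoint_calls("/srv/snapshots/a1"):
        print(call.method, call.path)
```

The asymmetry is deliberate: a checkpoint talks to the running VMM, while a rollback discards that process and loads the snapshot into a new one.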

Honest numbers from a small fleet

For scale calibration: I'm a solo founder and the fleets I run are small — a handful of agents per host, not hundreds. That changes the optimization target. Big platforms tune MicroVMs for packing density; I tune for blast-radius-per-agent and restore time, because the customer promise is "your agent can't hurt your neighbor's, and Tuesday's mistake is reversible." Memory ballooning reclaims pages from idle agents overnight, and snapshots make an idle agent cost almost nothing — density follows from that without being the goal.
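As an illustration of the ballooning step, here is a minimal sketch of a reclaim policy. It assumes the balloon is resized with a PATCH to Firecracker's /balloon endpoint carrying an `amount_mib` field. The idle flag, the 256 MiB floor, and both helper names are hypothetical choices, not figures from a real fleet.

```python
def balloon_target_mib(mem_size_mib: int, idle: bool,
                       floor_mib: int = 256) -> int:
    """Return how much memory the balloon should take from the guest.

    Active agents keep everything; idle agents are squeezed down to a
    floor so the guest can still wake up and respond.
    """
    if not idle:
        return 0
    return max(mem_size_mib - floor_mib, 0)


def balloon_request(mem_size_mib: int, idle: bool) -> tuple[str, str, dict]:
    """Build the (method, path, body) for resizing the balloon."""
    target = balloon_target_mib(mem_size_mib, idle)
    return ("PATCH", "/balloon", {"amount_mib": target})


if __name__ == "__main__":
    print(balloon_request(2048, idle=True))
    print(balloon_request(2048, idle=False))
```

Inflating the balloon returns pages to the host; deflating it to zero hands them back when the agent wakes up.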

This is also, frankly, the part an operator should never have to see. The people running agents on Rapid Claw are founders and operators who want the AI to triage email and handle research — the MicroVM machinery exists so the Builder Sandbox tier can hand an agent real root without anyone having to learn what a jailer process is. If you're weighing whether to run this stack yourself, the self-hosting vs managed guide walks through the honest trade-offs.

Takeaways

1. Design isolation for prompt injection, not just malicious tenants. Assume the attacker is already executing code inside the sandbox.

2. A shared kernel is shared fate. If agents get shells, the boundary should be hardware virtualization, not namespaces.

3. Give the agent root inside a boundary you trust instead of a nerfed environment inside one you don't. Compatibility and safety stop fighting.

4. Egress policy and snapshots do the work isolation can't: containment of allowed channels, and reversibility of everything else.

The uncomfortable truth about agent infrastructure is that the model will eventually do something dumb with root. The job is to build a world where that's a log line, not a postmortem.