How to Run Untrusted AI Agent Code Without Docker

Docker shares the host kernel. That was always the trade. It was fine when a human read the script before it ran. It stopped being fine the second an LLM started writing code at runtime off a prompt nobody pre-screened. So here's the practitioner version: what to actually run when your agent executes code you've never seen.

The thing that broke

The review step is gone. A model writes a script, the script lives for milliseconds, then it executes. Could be a clean chart. Could be a curl-pipe-shell because a prompt injection rewired intent four hops upstream. You don't get to read it first.

And the container under it shares one kernel with every other workload on the box. CVE-2024-1086, a netfilter use-after-free, gives a local attacker root on the host, and with it every container on that host. CISA confirmed active ransomware exploitation in late 2025, more than a year after the patch shipped. November 2025 brought three more in runC (CVE-2025-31133, CVE-2025-52565, CVE-2025-52881), all bypassing maskedPaths through symlink races to write procfs gadgets. Own core_pattern and the kernel runs your binary on the next coredump, as root.

In March 2026, Oxford and the UK AISI shipped SandboxEscapeBench. Frontier models reliably escaped privileged containers, writable host mounts, and exposed Docker daemons on their own. Cost per attempt: roughly a dollar. The model does the recon, picks the CVE, hands back the shell. So the fix isn't a better Docker config. It's a different boundary.

Layer 1: move hostile workloads off the shared kernel

If the code came from an untrusted prompt, it doesn't belong on a shared kernel. You want a hardware boundary.

Firecracker is what AWS runs Lambda and Fargate on. Each workload gets its own dedicated kernel in a microVM, boots in as little as ~125ms, tiny hypervisor surface. A kernel exploit that would own a Docker host only gets the guest kernel here. To reach the host, the attacker still has to break the hypervisor.

# easiest on-ramp: managed firecracker sandboxes
# E2B and Together Code Sandbox both run firecracker under the hood
pip install e2b-code-interpreter
# or stand up firecracker-containerd yourself if you want the metal

For the jailer config: seccomp on, drop all capabilities, run as a dedicated non-root jailer user, pin CPUs so a noisy neighbor doesn't melt throughput.
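Here's a rough sketch of what that looks like as a jailer invocation, built as an argv list in Python so nothing runs until you decide it should. The binary path, the cgroup v2 choice, and the single NUMA node are my assumptions, and jailer flags have shifted between Firecracker releases, so check `jailer --help` for your version before trusting it.

```python
from typing import List


def build_jailer_argv(
    vm_id: str,
    uid: int,
    gid: int,
    cpus: str,
    firecracker_bin: str = "/usr/local/bin/firecracker",  # assumed install path
    chroot_base: str = "/srv/jailer",                      # jailer's documented default
) -> List[str]:
    """Return an argv list for Firecracker's jailer. Nothing is executed here."""
    if uid == 0 or gid == 0:
        raise ValueError("the jailer must drop to a dedicated non-root uid/gid")
    return [
        "jailer",
        "--id", vm_id,
        "--exec-file", firecracker_bin,
        "--uid", str(uid),
        "--gid", str(gid),
        "--chroot-base-dir", chroot_base,
        "--cgroup-version", "2",            # assumption: host uses cgroup v2
        "--cgroup", f"cpuset.cpus={cpus}",  # pin the microVM to specific host CPUs
        "--cgroup", "cpuset.mems=0",        # assumption: single NUMA node
        "--new-pid-ns",                     # give the VMM its own PID namespace
    ]


if __name__ == "__main__":
    print(" ".join(build_jailer_argv("agent-1", 1234, 1234, "2-3")))
```

Firecracker applies its own seccomp filters by default once the jailer execs it, so the job here is mostly to avoid turning that off and to never hand the jailer uid 0 as the target user.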

Kata Containers when you need OCI image compatibility. Wraps standard images in a per-workload microVM. Pair with QEMU or Cloud Hypervisor.

# pod spec
spec:
  runtimeClassName: kata-qemu   # per-workload microVM
  # never set hostNetwork: true
  # disable hostPath volumes
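If you want those rules enforced rather than remembered, a small pre-deploy check does it. This is a minimal sketch that audits a pod spec already loaded as a Python dict. The function name and the allowed runtime list are hypothetical, and in a real cluster you'd more likely push the same logic into an admission controller.

```python
from typing import Any, Dict, List, Sequence


def audit_pod_spec(
    spec: Dict[str, Any],
    allowed_runtimes: Sequence[str] = ("kata-qemu",),  # assumption: your microVM classes
) -> List[str]:
    """Return a list of findings; an empty list means the spec passed."""
    findings: List[str] = []
    if spec.get("runtimeClassName") not in allowed_runtimes:
        findings.append("runtimeClassName is not an approved microVM runtime")
    if spec.get("hostNetwork") is True:
        findings.append("hostNetwork is enabled")
    for vol in spec.get("volumes", []):
        if "hostPath" in vol:
            findings.append(f"volume {vol.get('name', '?')!r} uses hostPath")
    return findings


if __name__ == "__main__":
    bad = {
        "hostNetwork": True,
        "volumes": [{"name": "docker-sock", "hostPath": {"path": "/var/run/docker.sock"}}],
    }
    for finding in audit_pod_spec(bad):
        print("FAIL:", finding)
```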

gVisor when the workload is compute-heavy and the input is trusted-ish. Modal runs it in prod for serverless GPU agents. The Sentry intercepts syscalls in userspace. It won't survive every kernel-tier exploit, but it kills the easy ones.

# run with the runsc runtime; the kvm platform is set in the runtime's
# runtimeArgs ("--platform=kvm") in /etc/docker/daemon.json, not on docker run
docker run --runtime=runsc your-image
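One detail that trips people up: gVisor's platform is an argument to runsc itself, and Docker hands it over through the runtime entry in /etc/docker/daemon.json. Docker's own `--platform` flag selects an image OS/architecture, which is a different thing. A minimal sketch of generating that entry follows. The runsc path is an assumption, so adjust it to wherever your install put the binary.

```python
import json
from typing import Any, Dict


def gvisor_runtime_config(
    runsc_path: str = "/usr/local/bin/runsc",  # assumed install path
    platform: str = "kvm",
) -> Dict[str, Any]:
    """Build the `runtimes` fragment to merge into /etc/docker/daemon.json."""
    return {
        "runtimes": {
            "runsc": {
                "path": runsc_path,
                "runtimeArgs": [f"--platform={platform}"],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(gvisor_runtime_config(), indent=2))
```

Merge the output into daemon.json, restart the Docker daemon, and `--runtime=runsc` picks up the KVM platform from there.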

Layer 2: default-deny egress

Isolation handles the local box. Egress handles exfil. Half the production sandboxes I audit ship allow-all outbound, which means a compromised agent phones home to C2 or smuggles tokens out a Markdown image tag and nobody notices.

Block everything by default. Allowlist only the endpoints the agent actually needs (the model API, the tool API). On Kata under Kubernetes, select the sandbox pods with a Cilium network policy that allows egress to those hosts and nothing else. Tunneling, exfil, and callbacks all die at the wall when there is one.
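To make the allowlist concrete, here's a sketch that builds a CiliumNetworkPolicy as a plain dict you can dump to JSON and apply. The policy name, pod labels, and hostnames are placeholders, and I'm assuming a standard kube-dns/CoreDNS setup in kube-system. It uses Cilium's FQDN rules, which need DNS visibility, so the DNS rule is part of the policy and is itself restricted to the allowlisted names.

```python
import json
from typing import Any, Dict, List


def egress_allowlist_policy(
    name: str, pod_labels: Dict[str, str], hosts: List[str]
) -> Dict[str, Any]:
    """Allow DNS lookups and HTTPS only for `hosts`; other egress is dropped."""
    names = [{"matchName": h} for h in hosts]
    return {
        "apiVersion": "cilium.io/v2",
        "kind": "CiliumNetworkPolicy",
        "metadata": {"name": name},
        "spec": {
            "endpointSelector": {"matchLabels": pod_labels},
            "egress": [
                {   # DNS via kube-dns, and only for allowlisted names
                    "toEndpoints": [{"matchLabels": {
                        "k8s:io.kubernetes.pod.namespace": "kube-system",
                        "k8s:k8s-app": "kube-dns",
                    }}],
                    "toPorts": [{
                        "ports": [{"port": "53", "protocol": "ANY"}],
                        "rules": {"dns": names},
                    }],
                },
                {   # HTTPS to the allowlisted hosts only
                    "toFQDNs": names,
                    "toPorts": [{"ports": [{"port": "443", "protocol": "TCP"}]}],
                },
            ],
        },
    }


if __name__ == "__main__":
    policy = egress_allowlist_policy(
        "agent-egress", {"app": "agent-sandbox"}, ["api.example.com", "tools.example.com"]
    )
    print(json.dumps(policy, indent=2))
```

Once a pod is selected by a policy with egress rules, Cilium treats every other outbound flow from it as denied. Restricting the DNS names matters too, since an open resolver is a perfectly good exfil channel.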

Layer 3: patch runC and kill host root

Hardware isolation is the floor, not an excuse to run stale runC underneath it.

# fixed: 1.2.8, 1.3.3, or 1.4.0-rc.3
runc --version
# then enable user namespaces and DON'T map host root into the namespace

Most procfs gadget writes need root on the host. User namespaces take that away. The 1.1.x line is end of life and unpatched against the November CVEs, so if you're there, you're exposed.
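Eyeballing `runc --version` across a fleet doesn't scale, so here's a small sketch that parses the output and compares it against the fixed releases named above. The function name is mine, and it only understands upstream version strings. A distro that backports the fix into an older version number will show up as unpatched here, so treat a failure as "go look", not as proof.

```python
import re

_VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)(?:-rc\.?(\d+))?")

# First fixed patch release on each still-supported minor line.
_FIXED_PATCH = {(1, 2): 8, (1, 3): 3}


def runc_is_patched(version_output: str) -> bool:
    """True if the version is at or past 1.2.8, 1.3.3, or 1.4.0-rc.3."""
    match = _VERSION_RE.search(version_output)
    if match is None:
        raise ValueError("no runc version found in input")
    major, minor, patch = (int(match.group(i)) for i in (1, 2, 3))
    rc = int(match.group(4)) if match.group(4) else None

    if (major, minor) in _FIXED_PATCH:
        return patch >= _FIXED_PATCH[(major, minor)]
    if (major, minor, patch) == (1, 4, 0):
        # Release candidates before rc.3 predate the fix.
        return rc is None or rc >= 3
    # Anything older (including all of 1.1.x) fails; anything newer passes.
    return (major, minor, patch) > (1, 4, 0)


if __name__ == "__main__":
    for sample in ("runc version 1.1.15", "runc version 1.2.8", "runc version 1.4.0-rc.2"):
        print(sample, "->", "ok" if runc_is_patched(sample) else "UNPATCHED")
```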

Layer 4: detection as the canary

Isolation fails silently. Detection tells you when. Deploy Falco or Sysdig Secure with a rule for procfs symlink creation (the runC escape signature), plus rules for agent-typical anomalies: outbound TCP to non-allowlisted hosts, writes to /etc/, processes spawning nc or socat.

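- rule: Create Symlink Over Procfs Files
  desc: runC container escape via procfs symlink (CVE-2025-31133 / CVE-2025-52565)
  condition: create_symlink and evt.arg.target in ("/proc/sysrq-trigger","/proc/sys/kernel/core_pattern")
  output: Symlink created over procfs file (target=%evt.arg.target linkpath=%evt.arg.linkpath user=%user.name container=%container.id)
  priority: CRITICAL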

Pipe critical alerts to a channel a human reads at 3am.
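The routing side can be very small. Below is a sketch that reads Falco events one JSON object per line and picks out the ones worth waking someone for. It assumes Falco is running with JSON output enabled, and the set of pageable priorities is my choice. The actual pager call is left out because that part is specific to your stack.

```python
import json
import sys
from typing import Any, Dict, Optional

# Assumption: these Falco priorities justify a page; tune to taste.
PAGE_PRIORITIES = {"emergency", "alert", "critical"}


def parse_pageable(line: str) -> Optional[Dict[str, Any]]:
    """Return the alert dict if it should page a human, else None."""
    try:
        alert = json.loads(line)
    except json.JSONDecodeError:
        return None  # ignore non-JSON noise on the stream
    if not isinstance(alert, dict):
        return None
    if str(alert.get("priority", "")).lower() in PAGE_PRIORITIES:
        return alert
    return None


if __name__ == "__main__":
    # Example: falco | python route_alerts.py   (script name is hypothetical)
    for raw in sys.stdin:
        hit = parse_pageable(raw)
        if hit is not None:
            print(f"PAGE: {hit.get('rule')}: {hit.get('output')}")
```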

Gotchas

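A perfect microVM doesn't save you from a poisoned weight file. Pickle-based checkpoints execute arbitrary code on load, and backdoored weights in any format, safetensors included, still behave the way the attacker trained them, inside whatever the sandbox allowed. Audit your model supply chain as a separate layer.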
One shared API key across every workload is one compromise away from burning all of them. Scope keys per workload.
Default creds are the first pivot. SandboxEscapeBench models jumped to the host through default Vagrant SSH creds the designers didn't plan for. Vagrant, Postgres, Redis, admin tokens: kill them all in bootstrap.

Wrapping up

Default Docker is not a sandbox for model-generated code from untrusted prompts. Firecracker or Kata for hostile input, gVisor for trusted-ish compute, default-deny egress on all of it, patched runC with user namespaces underneath, Falco watching. Ship that today and you've moved the boundary from "shared kernel" to "hardware."

I wrote the full breakdown including the autonomous ROME breakout and the system-prompt contract that hardens agents against instrumental convergence over on the ToxSec Substack.

ToxSec covers AI security vulnerabilities, attack chains, and the offensive tools defenders actually need to understand. Run by an AI Security Engineer with hands-on experience at the NSA, Amazon, and across the defense contracting sector. CISSP certified, M.S. in Cybersecurity Engineering.