Karpathy's LLM OS (2023) — the architecture blueprint we still use today

Series: The Karpathy Playbook, 02/06 · Difficulty: intermediate · Time: 7 min · Category: ai-industry · Verified publisher: FRE|Nxt Labs

The Karpathy Playbook #2: LLM OS

Part 2 of 6 — from Software 2.0 to the March of Nines, tracing Karpathy's public thinking with FRE|Nxt Labs production commentary.


The tweets

"With many pieces dropping recently, a more complete picture is emerging of LLMs not as a chatbot, but the kernel process of a new Operating System. E.g. today it orchestrates: Input & Output across modalities (text, audio, vision), Code interpreter, ability to write & run..."

@karpathy, 28 September 2023

Six weeks later, he made it concrete:

"LLM OS. Bear with me I'm still cooking. Specs: — LLM: OpenAI GPT-4 Turbo 256 core (batch size) processor @ 20Hz (tok/s) — RAM: 128Ktok — Filesystem: Ada002"

@karpathy, 10 November 2023

And later, at a Sequoia Capital event after leaving OpenAI: "Everyone is trying to build what I like to refer to as a LLM OS."


What we heard

If Software 2.0 (entry #1 in this series) redefined how software is written, LLM OS redefined what it looks like when it runs. The analogy is the most useful mental model we've ever found for explaining multi-agent systems to people who have never built one:

| OS concept | LLM equivalent |
|---|---|
| CPU | The LLM itself |
| RAM | Context window |
| Filesystem | Vector store / long-term memory |
| Processes | Agents |
| IPC | Tool calls and agent handoffs |
| Kernel | The orchestration layer (LangGraph) |
| User space | The application (your product) |

Every architecture decision becomes obvious once the analogy clicks. You don't give every process unlimited RAM — you page memory in and out. You don't give every process root access — you scope permissions. You instrument the kernel — every syscall is logged.

These aren't AI-specific insights. They're OS design principles from the 1970s. We just forgot to apply them to agents.
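A minimal sketch of those three principles applied to agents, in plain Python. Everything here is illustrative: the `Agent` class, the word-count "token" estimate, and the `search` tool are stand-ins, not our production code.

```python
import logging
from dataclasses import dataclass

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("kernel")

MAX_CONTEXT_TOKENS = 50  # stand-in budget; real windows are 128K+

@dataclass
class Agent:
    name: str
    allowed_tools: set  # permission scope: no agent gets root

def page_in(messages, budget=MAX_CONTEXT_TOKENS):
    """Page in only the most recent messages that fit the 'RAM' budget."""
    window = []
    for msg in reversed(messages):
        cost = len(msg.split())  # crude token estimate for the sketch
        if cost > budget:
            break
        window.append(msg)
        budget -= cost
    return list(reversed(window))

TOOLS = {"search": lambda query: f"results for {query!r}"}

def call_tool(agent, tool, **args):
    """Every 'syscall' is permission-checked and logged by the kernel."""
    if tool not in agent.allowed_tools:
        log.warning("DENIED %s -> %s", agent.name, tool)
        raise PermissionError(f"{agent.name} may not call {tool}")
    log.info("CALL %s -> %s %s", agent.name, tool, args)
    return TOOLS[tool](**args)
```

An agent scoped to `{"search"}` can call `search` (and the call is logged), but any other tool raises `PermissionError` before it runs, exactly like a denied syscall.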


What we actually do with this

We use this framing explicitly in architecture reviews with clients. When a team shows us a prototype where every agent has full conversation history (unlimited RAM), shares a single tool registry (no permission scoping), and emits no traces (no kernel instrumentation), we explain it in OS terms:

"You've written a single-process program with global state and no logging. It works until it doesn't, and when it breaks you can't debug it."

Engineers who have never trained a model understand memory paging, process isolation, and system-call tracing. The LLM OS framing gives them vocabulary to reason about agent architecture without any ML background. That vocabulary is the actual point of the analogy.


Applied: what we built on InterviewLM

Using the LLM OS framing:

  • CPU routing: Claude Opus for complex reasoning (agent coordination, evaluation), Claude Haiku for simple tasks (formatting, extraction). Dynamic routing based on task type — model switching mid-session.
  • RAM management: Each interviewer persona agent receives only the context relevant to its current turn. Full transcript lives in long-term store. Prompt caching keeps the "kernel" (system prompt) in cache — 90%+ cache hit rate, sub-50ms for cached tokens.
  • Filesystem: LangGraph checkpointing stores state at every node. Any failed session can be replayed from the last successful checkpoint.
  • Process isolation: Each candidate session is an isolated LangGraph run. A failure in one session cannot affect another.
  • Kernel logging: Every agent step emits a LangSmith trace. We can replay any production session exactly.
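The CPU-routing bullet can be sketched as a plain lookup table. The task taxonomy and model names below are illustrative placeholders, not the production configuration; the point is that routing is an explicit, inspectable policy rather than a per-call judgment.

```python
# Hypothetical task taxonomy and model tiers; adapt to your provider's IDs.
ROUTING_TABLE = {
    "agent_coordination": "opus",   # complex reasoning -> large model
    "evaluation":         "opus",
    "formatting":         "haiku",  # simple, high-volume -> small model
    "extraction":         "haiku",
}
DEFAULT_MODEL = "haiku"  # fail cheap: unknown task types go to the small tier

def route_model(task_type: str) -> str:
    """'CPU' selection: the cheapest model tier that can handle the task."""
    return ROUTING_TABLE.get(task_type, DEFAULT_MODEL)
```

Because the table is data, you can switch models mid-session by routing each task as it arrives, and you can audit the policy in code review like any other config.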

Result: 8 agents, 100+ concurrent sessions, 40% cost reduction vs. the prior architecture, sub-2s p99 latency.


The one thing to steal from this

Before you build your next agent system, write the OS spec. Fill in the table above for your architecture. If you can't fill every row, you're not ready to start. That table is the minimum viable architecture document.
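One way to make that "every row filled" rule mechanical is a checklist you can run before kickoff. This is a sketch under our own framing, with the row names and comments as assumptions about what each entry should contain:

```python
OS_SPEC_ROWS = [
    "cpu",         # which model(s), and how you route between them
    "ram",         # context policy: what gets paged in per turn
    "filesystem",  # long-term memory / vector store / checkpoints
    "processes",   # agent inventory and isolation boundaries
    "ipc",         # tool-call and handoff contracts
    "kernel",      # orchestration layer and its instrumentation
    "user_space",  # the product surface users actually touch
]

def missing_rows(spec: dict) -> list:
    """Rows you still can't fill; non-empty means you're not ready to start."""
    return [row for row in OS_SPEC_ROWS if not spec.get(row)]
```

A partial spec (say, only `cpu` and `ram` decided) returns the five undecided rows, which doubles as the agenda for your next architecture session.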


Next in this series

#3 — Eureka Labs (July 2024). Karpathy leaves OpenAI and makes a specific bet: AI doesn't just change software, it changes how humans learn. What that bet tells us about his model of where this all goes.

Quick answers

What do I get from this cable?

You get a dated field note that explains how we apply the LLM OS pattern in real Claude Code projects.

How much time should I budget?

Typical effort is 7 min. The cable is marked intermediate.

How do I install the artifact?

This cable is guidance-only and does not ship an installable artifact.

How fresh is the guidance?

This cable was last verified on 2026-04-17 and includes source links for traceability.
