Karpathy's LLM OS (2023) — the architecture blueprint we still use today
The Karpathy Playbook #2: LLM OS
Part 2 of 6 — from Software 2.0 to the March of Nines, tracing Karpathy's public thinking with FRE|Nxt Labs production commentary.
The tweets
"With many pieces dropping recently, a more complete picture is emerging of LLMs not as a chatbot, but the kernel process of a new Operating System. E.g. today it orchestrates: Input & Output across modalities (text, audio, vision), Code interpreter, ability to write & run..."
— @karpathy, 28 September 2023
Six weeks later, he made it concrete:
"LLM OS. Bear with me I'm still cooking. Specs: — LLM: OpenAI GPT-4 Turbo 256 core (batch size) processor @ 20Hz (tok/s) — RAM: 128Ktok — Filesystem: Ada002"
— @karpathy, 10 November 2023
And later, at a Sequoia Capital event after leaving OpenAI: "Everyone is trying to build what I like to refer to as a LLM OS."
What we heard
If Software 2.0 (entry #1 in this series) redefined how software is written, LLM OS redefined what it looks like when it runs. The analogy is the most useful mental model we've ever found for explaining multi-agent systems to people who have never built one:
| OS concept | LLM equivalent |
|---|---|
| CPU | The LLM itself |
| RAM | Context window |
| Filesystem | Vector store / long-term memory |
| Processes | Agents |
| IPC | Tool calls and agent handoffs |
| Kernel | The orchestration layer (LangGraph) |
| User space | The application (your product) |
Every architecture decision becomes obvious once the analogy clicks. You don't give every process unlimited RAM — you page memory in and out. You don't give every process root access — you scope permissions. You instrument the kernel — every syscall is logged.
These aren't AI-specific insights. They're OS design principles from the 1970s. We just forgot to apply them to agents.
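Those three principles (paged memory, scoped permissions, logged syscalls) can be sketched in a few lines. This is a minimal illustration, not a real framework API: the names `ToolRegistry`, `page_context`, and the log format are all ours, chosen to make the OS analogy concrete.

```python
from dataclasses import dataclass, field

@dataclass
class ToolRegistry:
    """Permission scoping: an agent can only call tools it was granted."""
    granted: set[str] = field(default_factory=set)

    def call(self, tool: str, log: list[str]) -> str:
        if tool not in self.granted:
            log.append(f"DENIED {tool}")   # even denials hit the kernel log
            raise PermissionError(f"tool not granted: {tool}")
        log.append(f"CALL {tool}")         # every "syscall" is logged
        return f"result of {tool}"

def page_context(full_history: list[str], budget: int) -> list[str]:
    """Memory paging: only the most recent turns fit in 'RAM' (the context window)."""
    return full_history[-budget:]

# Usage: one agent, a scoped registry, a paged context, a syscall log.
syscall_log: list[str] = []
tools = ToolRegistry(granted={"search"})
history = [f"turn {i}" for i in range(10)]

context = page_context(history, budget=3)   # pages in the last 3 turns only
result = tools.call("search", syscall_log)  # allowed, and traced
```

The point of the sketch is that none of this requires ML machinery: it is bookkeeping that any 1970s kernel did, applied to agents.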
What we actually do with this
We use this framing explicitly in architecture reviews with clients. When a team shows us a prototype where every agent has full conversation history (unlimited RAM), shares a single tool registry (no permission scoping), and emits no traces (no kernel instrumentation), we explain it in OS terms:
"You've written a single-process program with global state and no logging. It works until it doesn't, and when it breaks you can't debug it."
Engineers who have never trained a model understand memory paging, process isolation, and system-call tracing. The LLM OS framing gives them vocabulary to reason about agent architecture without any ML background. That vocabulary is the actual point of the analogy.
Applied: what we built on InterviewLM
Using the LLM OS framing:
- CPU routing: Claude Opus for complex reasoning (agent coordination, evaluation), Claude Haiku for simple tasks (formatting, extraction). Dynamic routing based on task type — model switching mid-session.
- RAM management: Each interviewer persona agent receives only the context relevant to its current turn. Full transcript lives in long-term store. Prompt caching keeps the "kernel" (system prompt) in cache — 90%+ cache hit rate, sub-50ms for cached tokens.
- Filesystem: LangGraph checkpointing stores state at every node. Any failed session can be replayed from the last successful checkpoint.
- Process isolation: Each candidate session is an isolated LangGraph run. A failure in one session cannot affect another.
- Kernel logging: Every agent step emits a LangSmith trace. We can replay any production session exactly.
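The "CPU routing" bullet above reduces to a per-task lookup. A hedged sketch follows; the task categories mirror the bullet, but the model identifiers and the routing table itself are illustrative, not the exact InterviewLM configuration.

```python
# Route each task to a model tier. Unknown task types default to the
# full-size "core" rather than silently downgrading.
SIMPLE_TASKS = {"formatting", "extraction"}

def route_model(task_type: str) -> str:
    """Return the model tier for a task type."""
    if task_type in SIMPLE_TASKS:
        return "claude-haiku"   # cheap, fast core for mechanical work
    return "claude-opus"        # reasoning-heavy core for everything else

# Mid-session switching is just calling the router once per task:
plan = [route_model(t) for t in ["extraction", "evaluation", "formatting"]]
```

Defaulting unknown tasks to the expensive tier is a deliberate choice: a misrouted simple task costs money, a misrouted hard task costs correctness.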
Result: 8 agents, 100+ concurrent sessions, 40% cost reduction vs. the prior architecture, sub-2s p99 latency.
The one thing to steal from this
Before you build your next agent system, write the OS spec. Fill in the table above for your architecture. If you can't fill every row, you're not ready to start. That table is the minimum viable architecture document.
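The exercise can even be mechanized. A sketch, with placeholder row descriptions you would replace with your own architecture's answers (the helper name `ready_to_build` is ours):

```python
# The seven rows of the LLM OS table as a spec document. The values here
# are prompts for your own answers, not an actual architecture.
OS_SPEC = {
    "CPU": "which model(s), and how tasks are routed between them",
    "RAM": "context budget per agent, and the paging policy",
    "Filesystem": "where long-term state lives (vector store, checkpoints)",
    "Processes": "the agents, and what isolates them from each other",
    "IPC": "tool calls and handoff contracts between agents",
    "Kernel": "the orchestration layer, and what it enforces",
    "User space": "the product surface the agents serve",
}

REQUIRED_ROWS = {"CPU", "RAM", "Filesystem", "Processes", "IPC", "Kernel", "User space"}

def ready_to_build(spec: dict[str, str]) -> bool:
    """Ready only when every row exists and has a non-empty answer."""
    return REQUIRED_ROWS <= spec.keys() and all(spec[r].strip() for r in REQUIRED_ROWS)
```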
Next in this series
#3 — Eureka Labs (July 2024). Karpathy leaves OpenAI and makes a specific bet: AI doesn't just change software, it changes how humans learn. What that bet tells us about his model of where this all goes.
Quick answers
What do I get from this cable?
You get a dated field note that explains how we handle this AI-industry workflow in real Claude Code projects.
How much time should I budget?
Budget about 7 minutes; the cable is marked intermediate.
How do I install the artifact?
This cable is guidance-only and does not ship an installable artifact.
How fresh is the guidance?
The cable was last verified on 2026-04-17 and includes source links for traceability.