Karpathy's LLM OS (2023): the architecture blueprint we still use today

Read the field note below to see how we apply this pattern in real Claude Code projects.

verified 2 months ago · 7 min

The Karpathy Playbook #2: LLM OS

Part 2 of 6. From Software 2.0 to the March of Nines, tracing Karpathy's public thinking with FRE|Nxt Labs production commentary.

The tweets

"With many pieces dropping recently, a more complete picture is emerging of LLMs not as a chatbot, but the kernel process of a new Operating System. E.g. today it orchestrates: Input & Output across modalities (text, audio, vision), Code interpreter, ability to write & run..."

@karpathy, 28 September 2023

Six weeks later, he made it concrete:

"LLM OS. Bear with me I'm still cooking. Specs: — LLM: OpenAI GPT-4 Turbo 256 core (batch size) processor @ 20Hz (tok/s) — RAM: 128Ktok — Filesystem: Ada002"

@karpathy, 10 November 2023

And later, at a Sequoia Capital event after leaving OpenAI: "Everyone is trying to build what I like to refer to as a LLM OS."

What we heard

If Software 2.0 (entry #1 in this series) redefined how software is written, LLM OS redefined what it looks like when it runs. The analogy is the most useful mental model we've ever found for explaining multi-agent systems to people who have never built one:

OS concept	LLM equivalent
CPU	The LLM itself
RAM	Context window
Filesystem	Vector store / long-term memory
Processes	Agents
IPC	Tool calls and agent handoffs
Kernel	The orchestration layer (LangGraph)
User space	The application (your product)

Every architecture decision becomes obvious once the analogy clicks. You don't give every process unlimited RAM. You page memory in and out. You don't give every process root access. You scope permissions. You instrument the kernel. Every syscall is logged.

These aren't AI-specific insights. They're OS design principles from the 1970s. We just forgot to apply them to agents.
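
As a minimal sketch of two of those principles, scoped permissions and logged syscalls, the snippet below routes every tool call through one small dispatcher. All names here (`Kernel`, `grant`, `syscall`, and the `search` and `send_email` tools) are hypothetical and belong to no library; a real system would lean on its orchestration framework's own permissioning and tracing.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Kernel:
    """Hypothetical dispatcher: every tool call goes through syscall()."""
    tools: dict[str, Callable[..., Any]] = field(default_factory=dict)
    grants: dict[str, set[str]] = field(default_factory=dict)
    log: list[dict[str, Any]] = field(default_factory=list)

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self.tools[name] = fn

    def grant(self, agent: str, *tool_names: str) -> None:
        # Scope permissions per agent instead of sharing one global registry.
        self.grants.setdefault(agent, set()).update(tool_names)

    def syscall(self, agent: str, tool: str, **kwargs: Any) -> Any:
        allowed = tool in self.grants.get(agent, set())
        # Log every call, including denied ones, before doing any work.
        self.log.append(
            {"agent": agent, "tool": tool, "args": kwargs, "allowed": allowed}
        )
        if not allowed:
            raise PermissionError(f"{agent} may not call {tool}")
        return self.tools[tool](**kwargs)


kernel = Kernel()
kernel.register("search", lambda query: f"results for {query}")
kernel.register("send_email", lambda to, body: f"sent to {to}")
kernel.grant("researcher", "search")  # this agent gets no email access

print(kernel.syscall("researcher", "search", query="LLM OS"))
try:
    kernel.syscall("researcher", "send_email", to="a@example.com", body="hi")
except PermissionError as err:
    print("denied:", err)
print(len(kernel.log), "calls logged")
```

The denied call still lands in the log, which is the point: the trace records what an agent attempted as well as what it was allowed to do.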

What we actually do with this

We use this framing explicitly in architecture reviews with clients. When a team shows us a prototype where every agent has full conversation history (unlimited RAM), shares a single tool registry (no permission scoping), and emits no traces (no kernel instrumentation), we explain it in OS terms:

"You've written a single-process program with global state and no logging. It works until it doesn't, and when it breaks you can't debug it."

Engineers who have never trained a model understand memory paging, process isolation, and system-call tracing. The LLM OS framing gives them vocabulary to reason about agent architecture without any ML background. That vocabulary is the actual point of the analogy.

Applied: what we built on InterviewLM

Using the LLM OS framing:

CPU routing: Claude Opus for complex reasoning (agent coordination, evaluation), Claude Haiku for simple tasks (formatting, extraction). Routing is dynamic, based on task type, and the model can switch mid-session.
RAM management: Each interviewer persona agent receives only the context relevant to its current turn. The full transcript lives in the long-term store. Prompt caching keeps the "kernel" (the system prompt) in cache, with a 90%+ cache hit rate and sub-50ms latency for cached tokens.
Filesystem: LangGraph checkpointing stores state at every node. Any failed session can be replayed from the last successful checkpoint.
Process isolation: Each candidate session is an isolated LangGraph run. A failure in one session cannot affect another.
Kernel logging: Every agent step emits a LangSmith trace. We can replay any production session exactly.
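
To illustrate the first two items, here is a rough sketch of task-type routing and per-turn context selection. It is not the InterviewLM code: `route_model`, `page_in`, and `Turn` are hypothetical names, the `"opus"` and `"haiku"` strings are placeholder tier labels rather than real model IDs, and the relevance rule is deliberately crude.

```python
from dataclasses import dataclass

# Hypothetical tier labels; map them to whichever model IDs you deploy.
COMPLEX_TASKS = {"coordination", "evaluation"}
SIMPLE_TASKS = {"formatting", "extraction"}


def route_model(task_type: str) -> str:
    """Pick a model tier from the task type (the 'CPU routing' item)."""
    if task_type in COMPLEX_TASKS:
        return "opus"
    if task_type in SIMPLE_TASKS:
        return "haiku"
    # Assumption: unknown task types fall back to the stronger tier.
    return "opus"


@dataclass
class Turn:
    speaker: str
    text: str


def page_in(transcript: list[Turn], persona: str, max_turns: int = 4) -> list[Turn]:
    """Return only the most recent turns relevant to this persona.

    Relevance is a crude stand-in: turns spoken by the persona or by the
    candidate. The full transcript stays in the long-term store.
    """
    relevant = [t for t in transcript if t.speaker in (persona, "candidate")]
    return relevant[-max_turns:]


transcript = [
    Turn("system_design", "Walk me through your caching layer."),
    Turn("candidate", "We put Redis in front of Postgres."),
    Turn("behavioral", "Tell me about a conflict on your team."),
    Turn("candidate", "We disagreed on the rollout plan."),
    Turn("system_design", "How do you invalidate entries?"),
]

context = page_in(transcript, persona="system_design", max_turns=3)
print(route_model("evaluation"), route_model("formatting"))
for turn in context:
    print(f"{turn.speaker}: {turn.text}")
```

A production version would likely select context by topic or retrieval score rather than by speaker, but the shape is the same: the agent sees a bounded slice, never the whole transcript.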

Result: 8 agents, 100+ concurrent sessions, 40% cost reduction vs. the prior architecture, sub-2s p99 latency.

The one thing to steal from this

Before you build your next agent system, write the OS spec. Fill in the table above for your architecture. If you can't fill every row, you're not ready to start. That table is the minimum viable architecture document.
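
One way to make that check mechanical is to write the spec as a small data structure and ask which rows are still empty. This is only a sketch: `OSSpec` and `missing_rows` are hypothetical names, and the example values are illustrative, loosely based on the setup described earlier.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class OSSpec:
    """One field per row of the OS-concept table; None means undecided."""
    cpu: Optional[str] = None          # which model(s) do the reasoning
    ram: Optional[str] = None          # context-window policy
    filesystem: Optional[str] = None   # vector store / long-term memory
    processes: Optional[str] = None    # the agents
    ipc: Optional[str] = None          # tool calls and agent handoffs
    kernel: Optional[str] = None       # orchestration layer
    user_space: Optional[str] = None   # the application itself

    def missing_rows(self) -> list[str]:
        # A row counts as missing if it is None or an empty string.
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# A deliberately incomplete spec to show the check firing.
spec = OSSpec(
    cpu="Opus for reasoning, Haiku for formatting",
    ram="per-turn context only",
    processes="interviewer persona agents",
    kernel="LangGraph",
)

missing = spec.missing_rows()
if missing:
    print("Not ready to start; unfilled rows:", ", ".join(missing))
else:
    print("Spec complete.")
```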

Next in this series

#3. Eureka Labs (July 2024). Karpathy leaves OpenAI and makes a specific bet: AI doesn't just change software, it changes how humans learn. What that bet tells us about his model of where this all goes.

Quick answers

What do I get from this cable?

You get a dated field note that explains how we handle this ai-industry workflow in real Claude Code projects.

How much time should I budget?

Typical effort is 7 min. The cable is marked intermediate.

How do I install the artifact?

This cable is guidance-only and does not ship an installable artifact.

How fresh is the guidance?

The cable is explicitly last verified on 2026-04-17, and includes source links for traceability.

