Karpathy's Software 3.0 (June 2025) — "the hottest new programming language is English"
Read the field note below to see how we apply this pattern in practice.
Turn this cable into a shipping system.
We help teams deploy reliable AI workflows with architecture, implementation, and hardening support.
The Karpathy Playbook #5: Software 3.0
Part 5 of 6 — tracing Karpathy's public thinking with FRE|Nxt Labs production commentary.
The talk
"The hottest new programming language is English."
"LLMs are a new kind of computer, and you program them in English. Hence I think they are well deserving of a major version upgrade in terms of the way we think about software."
— Andrej Karpathy, Software Is Changing (Again), YC AI Startup School keynote, 18 June 2025 (tweet)
The formal framing: Software 1.0 was code written by humans. Software 2.0 was neural-network weights shaped by datasets (the 2017 post, entry #1 in this series). Software 3.0 is natural-language prompts that program LLMs at runtime.
What we heard
Software 3.0 is the concept that reframes everything. If a prompt is a program, then prompt engineering is not a soft skill — it is programming, with all the disciplines that implies: versioning, testing, reviewing, debugging, regression suites.
Most teams still treat prompts the way junior developers treated SQL strings in 2005 — pasted into code, tweaked by vibes, never reviewed, never tested, lost to git blame the moment someone refactors a module. Every problem that plagued pre-ORM SQL plagues modern prompts, for exactly the same reason: the artifact being shipped is treated as a string, not as code.
The Software 3.0 frame makes the fix obvious. Prompts deserve everything source code deserves.
What we actually do with this
We treat prompts as first-class source artifacts in every engagement. The checklist:
| Software 1.0 discipline | Applied to Software 3.0 |
|---|---|
| Version control | Prompts live in /prompts, versioned in Git |
| Code review | Every prompt change goes through PR review |
| Unit tests | Golden-set evals run on every prompt change |
| Regression suites | Production failure cases become fixed eval rows |
| Logging | Every prompt execution emits a trace (LangSmith) |
| Modularity | Reusable prompt fragments imported like modules |
| Cost profiling | Token count is measured the same way binary size is |
None of this is optional. A prompt change with no eval is a binary change with no test.
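The eval-gate row in the table can be sketched in a few lines. This is an illustrative sketch, not our actual harness: the golden-row shape, the `score_fn` you plug in, and the 0.9 threshold are all assumptions.

```python
def eval_gate(golden_rows, score_fn, threshold=0.9):
    """Gate a prompt-change PR on a golden set.

    golden_rows: list of {"input": ..., "expected": ...} dicts
    score_fn:    caller-supplied scorer returning a float in [0, 1]
                 (in practice, an LLM-as-judge or exact-match check)
    Returns True if the mean score clears the threshold; CI blocks
    the merge when it returns False.
    """
    scores = [score_fn(row["input"], row["expected"]) for row in golden_rows]
    return sum(scores) / len(scores) >= threshold
```

The point is not the scoring logic — it is that this function runs in CI on every prompt diff, exactly where a unit-test suite would run on a code diff.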
Applied: InterviewLM's prompt pipeline
We built InterviewLM's prompt layer as source code from day one:
- Structure: `/prompts/personas/*.md`, `/prompts/eval/*.md`, `/prompts/system.md`. Every prompt is a file.
- Imports: A lightweight templating layer concatenates fragments at runtime — the rubric prompt imports from a shared glossary file, the interviewer prompt imports from a behavioural-guidelines file. A change to the glossary file propagates to every persona.
- Evals: 40 golden interviews run on every prompt-layer PR. If eval scores drop, the PR does not merge.
- Prompt caching as linker optimization: Static fragments (rubric, glossary) go first in the prompt; dynamic fragments (candidate input) go last. Cache hit rate: 90%+. This is not clever — this is just understanding that the prompt is code and ordering statements for compilation.
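The fragment-import and cache-ordering steps above can be sketched roughly like this. The `{{include:...}}` syntax and fragment names are illustrative assumptions, not InterviewLM's actual templating layer:

```python
import re

# Hypothetical fragment store; in production these would be files
# under /prompts/, read at startup.
FRAGMENTS = {
    "glossary": "Rubric terms: 'signal' means evidence tied to a competency.",
    "behavioural-guidelines": "Stay neutral; never reveal the rubric.",
}

def render(template: str) -> str:
    """Expand {{include:name}} markers — a module import, but for prompts."""
    return re.sub(
        r"\{\{include:([\w-]+)\}\}",
        lambda m: FRAGMENTS[m.group(1)],
        template,
    )

def assemble(static_template: str, dynamic_input: str) -> str:
    """Static fragments first so provider-side prompt caching can reuse
    the shared prefix; the per-request candidate input always goes last."""
    return render(static_template) + "\n\n" + dynamic_input
```

Editing one entry in the fragment store changes every persona that includes it, and keeping the rendered static prefix byte-identical across requests is what makes the cache hit rate possible.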
The one thing to steal from this
Move your prompts out of your application code today. Create a /prompts directory. Put one prompt per file. Version them in Git. That single move — treating prompts as files instead of inline strings — changes every downstream discipline, because now git diff and git blame and code review and CI all work on them by default.
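A loader for that layout can be this small — the directory name matches the suggestion above; everything else is a sketch, not a prescribed API:

```python
from pathlib import Path

PROMPTS_DIR = Path("prompts")  # one prompt per file, versioned in Git

def load_prompt(name: str) -> str:
    """Read a prompt by name: load_prompt('system') -> prompts/system.md."""
    return (PROMPTS_DIR / f"{name}.md").read_text(encoding="utf-8")
```

Once the app reads prompts through a loader like this instead of inline strings, `git diff`, `git blame`, and PR review cover them with no further work.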
Next in this series
#6 — The march of nines (October 2025). Karpathy on Dwarkesh: AGI is a decade away, agents are a decade away, and 90%→99.999% reliability is the real production problem. The sobering finale to this arc.
Quick answers
What do I get from this cable?
You get a dated field note that explains how we handle this AI-industry workflow in real Claude Code projects.
How much time should I budget?
Typical effort is about 7 minutes. The cable is marked intermediate.
How do I install the artifact?
This cable is guidance-only and does not ship an installable artifact.
How fresh is the guidance?
The cable is explicitly last verified on 2026-04-17, and includes source links for traceability.
More from @frenxt
Anthropic's Responsible Scaling Policy (Sep 2023) — safety as operating procedure
*A five-part series tracing Anthropic's public thinking through Dario Amodei's writing and the company's model spec — one foundational document per entry, each with FRE|Nxt Labs l…
Anthropic's "brilliant friend" spec — the product voice that defines Claude
*Part 2 of 5 — tracing Anthropic's public thinking with FRE|Nxt Labs production commentary.*
Dario Amodei's Machines of Loving Grace (Oct 2024) — planning against the upside case
*Part 3 of 5 — tracing Anthropic's public thinking with FRE|Nxt Labs production commentary.*