Karpathy's Software 3.0 (June 2025) — "the hottest new programming language is English"
Read the field note below to see how we apply this pattern in practice.
Turn this cable into a shipping system.
We help teams deploy reliable AI workflows with architecture, implementation, and hardening support.
The Karpathy Playbook #5: Software 3.0
Part 5 of 6 — tracing Karpathy's public thinking with FRE|Nxt Labs production commentary.
The talk
"The hottest new programming language is English."
"LLMs are a new kind of computer, and you program them in English. Hence I think they are well deserving of a major version upgrade in terms of the way we think about software."
— Andrej Karpathy, Software Is Changing (Again), YC AI Startup School keynote, 18 June 2025 (tweet)
The formal framing: Software 1.0 was code written by humans. Software 2.0 was neural-network weights shaped by datasets (the 2017 post, entry #1 in this series). Software 3.0 is natural-language prompts that program LLMs at runtime.
What we heard
Software 3.0 is the concept that reframes everything. If a prompt is a program, then prompt engineering is not a soft skill — it is programming, with all the disciplines that implies: versioning, testing, reviewing, debugging, regression suites.
Most teams still treat prompts the way junior developers treated SQL strings in 2005 — pasted into code, tweaked by vibes, never reviewed, never tested, lost to git blame the moment someone refactors a module. Every problem that plagued pre-ORM SQL plagues modern prompts, for exactly the same reason: the artifact being shipped is treated as a string, not as code.
The Software 3.0 frame makes the fix obvious. Prompts deserve everything source code deserves.
What we actually do with this
We treat prompts as first-class source artifacts in every engagement. The checklist:
| Software 1.0 discipline | Applied to Software 3.0 |
|---|---|
| Version control | Prompts live in /prompts, versioned in Git |
| Code review | Every prompt change goes through PR review |
| Unit tests | Golden-set evals run on every prompt change |
| Regression suites | Production failure cases become fixed eval rows |
| Logging | Every prompt execution emits a trace (LangSmith) |
| Modularity | Reusable prompt fragments imported like modules |
| Cost profiling | Token count is measured the same way binary size is |
None of this is optional. A prompt change with no eval is a binary change with no test.
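The eval-gate row in the table can be sketched in a few lines. This is an illustrative sketch, not our actual harness: the golden-row shape, the `score_fn` you plug in, and the 0.9 threshold are all assumptions.

```python
def eval_gate(golden_rows, score_fn, threshold=0.9):
    """Gate a prompt-change PR on a golden set.

    golden_rows: list of {"input": ..., "expected": ...} dicts
    score_fn:    caller-supplied scorer returning a float in [0, 1]
                 (in practice, an LLM-as-judge or exact-match check)
    Returns True if the mean score clears the threshold; CI blocks
    the merge when it returns False.
    """
    scores = [score_fn(row["input"], row["expected"]) for row in golden_rows]
    return sum(scores) / len(scores) >= threshold
```

The point is not the scoring logic — it is that this function runs in CI on every prompt diff, exactly where a unit-test suite would run on a code diff.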
Applied: InterviewLM's prompt pipeline
We built InterviewLM's prompt layer as source code from day one:
- Structure: `/prompts/personas/*.md`, `/prompts/eval/*.md`, `/prompts/system.md`. Every prompt is a file.
- Imports: A lightweight templating layer concatenates fragments at runtime — the rubric prompt imports from a shared glossary file, the interviewer prompt imports from a behavioural-guidelines file. A change to the glossary file propagates to every persona.
- Evals: 40 golden interviews run on every prompt-layer PR. If eval scores drop, the PR does not merge.
- Prompt caching as linker optimization: Static fragments (rubric, glossary) go first in the prompt; dynamic fragments (candidate input) go last. Cache hit rate: 90%+. This is not clever — this is just understanding that the prompt is code and ordering statements for compilation.
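The fragment-import and cache-ordering steps above can be sketched roughly like this. The `{{include:...}}` syntax and fragment names are illustrative assumptions, not InterviewLM's actual templating layer:

```python
import re

# Hypothetical fragment store; in production these would be files
# under /prompts/, read at startup.
FRAGMENTS = {
    "glossary": "Rubric terms: 'signal' means evidence tied to a competency.",
    "behavioural-guidelines": "Stay neutral; never reveal the rubric.",
}

def render(template: str) -> str:
    """Expand {{include:name}} markers — a module import, but for prompts."""
    return re.sub(
        r"\{\{include:([\w-]+)\}\}",
        lambda m: FRAGMENTS[m.group(1)],
        template,
    )

def assemble(static_template: str, dynamic_input: str) -> str:
    """Static fragments first so provider-side prompt caching can reuse
    the shared prefix; the per-request candidate input always goes last."""
    return render(static_template) + "\n\n" + dynamic_input
```

Editing one entry in the fragment store changes every persona that includes it, and keeping the rendered static prefix byte-identical across requests is what makes the cache hit rate possible.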
The one thing to steal from this
Move your prompts out of your application code today. Create a /prompts directory. Put one prompt per file. Version them in Git. That single move — treating prompts as files instead of inline strings — changes every downstream discipline, because now git diff and git blame and code review and CI all work on them by default.
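A loader for that layout can be this small — the directory name matches the suggestion above; everything else is a sketch, not a prescribed API:

```python
from pathlib import Path

PROMPTS_DIR = Path("prompts")  # one prompt per file, versioned in Git

def load_prompt(name: str) -> str:
    """Read a prompt by name: load_prompt('system') -> prompts/system.md."""
    return (PROMPTS_DIR / f"{name}.md").read_text(encoding="utf-8")
```

Once the app reads prompts through a loader like this instead of inline strings, `git diff`, `git blame`, and PR review cover them with no further work.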
Next in this series
#6 — The march of nines (October 2025). Karpathy on Dwarkesh: AGI is a decade away, agents are a decade away, and 90%→99.999% reliability is the real production problem. The sobering finale to this arc.
Quick answers
What do I get from this cable?
You get a dated field note that explains how we handle this AI-industry workflow in real Claude Code projects.
How much time should I budget?
Typical effort is about 7 minutes. The cable is marked intermediate.
How do I install the artifact?
This cable is guidance-only and does not ship an installable artifact.
How fresh is the guidance?
The cable is explicitly last verified on 2026-04-17, and includes source links for traceability.
More from @frenxt
Anthropic's Responsible Scaling Policy (Sep 2023) — safety as operating procedure
*A five-part series tracing Anthropic's public thinking through Dario Amodei's writing and the company's model spec — one foundational document per entry, each with FRE|Nxt Labs l…
Anthropic's "brilliant friend" spec — the product voice that defines Claude
*Part 2 of 5 — tracing Anthropic's public thinking with FRE|Nxt Labs production commentary.*
Dario Amodei's Machines of Loving Grace (Oct 2024) — planning against the upside case
*Part 3 of 5 — tracing Anthropic's public thinking with FRE|Nxt Labs production commentary.*