Karpathy's Eureka Labs bet (2024) — what AI-native education reveals about his model of the future
Read the field note below to see how we apply this pattern in practice.
The Karpathy Playbook #3: Eureka Labs
Part 3 of 6 — tracing Karpathy's public thinking with FRE|Nxt Labs production commentary.
The announcement
"⚡️ Excited to share that I am starting an AI+Education company called Eureka Labs. We are Eureka Labs and we are building a new kind of school that is AI native. How can we approach an ideal experience for learning something new?"
— @karpathy, 16 July 2024
Five months after leaving OpenAI, Karpathy's pitch was simple: teachers still design the course materials, but an AI teaching assistant guides each student through them. The first product: LLM101n, an undergraduate-level course in which students train their own LLM.
What we heard
Eureka Labs is the most concrete signal Karpathy has given about what he thinks "AI native" actually means as a product pattern — and it's the opposite of what most people build.
Most "AI-powered" products are a thin wrapper: take a human-designed experience, bolt on a chatbot. Eureka's model is the reverse. A domain expert designs the content once. The AI's job is to be the teacher — ask the student questions, diagnose what they don't understand, re-explain in their vocabulary, pace the session. The human designs the thing being taught. The AI designs the teaching.
That split — expert designs the artifact, AI personalizes the delivery — is a genuinely new product pattern. It's what separates an AI-native product from an AI-feature bolted onto a SaaS tool.
What we actually do with this
When we scope an AI product with a client, the first question we ask now is: which half are you building?
The artifact (built once, by a domain expert):
- Course content, legal templates, medical protocols, interview rubrics, compliance checklists
- This is deterministic, reviewable, versioned. No AI.
The teacher (built once, by us, powered by an LLM):
- Personalization, pacing, follow-up questions, diagnostic probes, re-explanations
- This is probabilistic, evaluated via samples, tuned continuously.
Clients usually show up with only half the spec. They'll have a brilliant artifact and no idea how to deliver it (just "put a chatbot on it"). Or they'll have a chatbot and nothing principled for it to teach. The Eureka frame forces both halves to exist before we write code.
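The split above can be sketched in a few lines of Python. This is a hypothetical illustration, not code from Eureka Labs or any client project: the `Artifact` class and `teacher_prompt` function are invented names, and the "teacher" half only builds the prompt that would go to an LLM, so the sketch runs offline.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Artifact:
    """The deterministic half: authored by a domain expert, versioned, no AI."""
    version: str
    lessons: list  # ordered lesson texts, reviewed by humans

def teacher_prompt(artifact: Artifact, student_answer: str, lesson_idx: int) -> str:
    """The probabilistic half: the LLM personalizes delivery of a fixed artifact.

    In production this string would be sent to a model; here we only build it.
    """
    return (
        f"Lesson (artifact v{artifact.version}): {artifact.lessons[lesson_idx]}\n"
        f"Student's answer: {student_answer}\n"
        "Diagnose the misunderstanding and re-explain in the student's vocabulary."
    )

course = Artifact(version="1.2.0", lessons=["Tokenization turns text into integer IDs."])
prompt = teacher_prompt(course, "Tokens are just words, right?", 0)
```

Note what the types enforce: the artifact is frozen and versioned (reviewable, diffable), while the teacher half is just a function of artifact plus live student state, free to be tuned without touching the content.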
Applied: InterviewLM's two halves
InterviewLM is structured exactly this way:
Artifact half: The interview rubrics, job-role competency frameworks, and evaluation criteria. Designed by hiring managers and psychometric experts. Stored as structured data, versioned in Git, reviewable by humans.
Teacher half: The interviewer persona agents. They read the rubric and run a conversational assessment — ask, probe, redirect, score. This half is LLM-native, runs on LangGraph, uses prompt caching for the rubric (90%+ cache hits because the rubric is stable across all candidates).
The artifact is updated monthly by human experts. The teacher half is updated weekly by us as we observe failure modes in production traces. The two halves evolve independently.
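The caching arrangement follows directly from that split. A minimal sketch, assuming an Anthropic-style Messages API with prompt caching: the rubric text, model name, and function name below are all illustrative placeholders, and only the `cache_control` block shape follows the provider's documented format. Because the stable rubric sits first in the request, every candidate's conversation reuses the same cached prefix.

```python
# Stable across all candidates -- this is the artifact half (placeholder text).
RUBRIC = "Competency: system design. Probe for trade-off reasoning and scaling intuition."

def build_interviewer_request(transcript: list) -> dict:
    """Assemble an Anthropic-style request with the rubric in the cacheable prefix."""
    return {
        "model": "claude-sonnet-4-5",  # placeholder model name
        "max_tokens": 1024,
        # Cacheable prefix first: identical bytes on every call means the
        # provider can serve the rubric from cache (the "90%+ hits" pattern).
        "system": [
            {"type": "text", "text": RUBRIC, "cache_control": {"type": "ephemeral"}},
        ],
        # Per-candidate conversation varies, so it goes after the cached prefix.
        "messages": transcript,
    }

req = build_interviewer_request(
    [{"role": "user", "content": "Candidate: I'd start with a message queue."}]
)
```

The design choice worth copying: anything that changes per-session goes after anything that doesn't, so a rubric update (monthly) invalidates the cache exactly once, not per candidate.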
The one thing to steal from this
For your next AI product spec, write two docs, not one:
- The artifact doc — what the domain expert is contributing
- The teacher doc — what the LLM is personalizing or pacing
If you can't write the artifact doc, your product isn't ready for AI — it's ready for a designer. If you can't write the teacher doc, you don't have an AI product — you have a content library.
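That readiness test is mechanical enough to express as code. A throwaway sketch with invented names, purely to make the gate concrete:

```python
def ready_to_build(artifact_doc: str, teacher_doc: str) -> str:
    """Gate an AI build on both halves of the spec existing."""
    if not artifact_doc.strip():
        return "not ready for AI -- hire a designer first"
    if not teacher_doc.strip():
        return "not an AI product -- it's a content library"
    return "ready: both halves specified"

verdict = ready_to_build("Interview rubric v1", "Pacing + diagnostic probes")
```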
Next in this series
#4 — Vibe coding (February 2025). Karpathy's throwaway tweet that named a generation of AI-assisted development — and why "mostly works" is not a production SLA.
Quick answers
What do I get from this cable?
You get a dated field note that explains how we handle this AI-industry workflow in real Claude Code projects.
How much time should I budget?
Budget about five minutes; the cable is marked beginner.
How do I install the artifact?
This cable is guidance-only and does not ship an installable artifact.
How fresh is the guidance?
The cable was last verified on 2026-04-17 and includes source links for traceability.
More from @frenxt
Anthropic's Responsible Scaling Policy (Sep 2023) — safety as operating procedure
*A five-part series tracing Anthropic's public thinking through Dario Amodei's writing and the company's model spec — one foundational document per entry, each with FRE|Nxt Labs l…
Anthropic's "brilliant friend" spec — the product voice that defines Claude
*Part 2 of 5 — tracing Anthropic's public thinking with FRE|Nxt Labs production commentary.*
Dario Amodei's Machines of Loving Grace (Oct 2024) — planning against the upside case
*Part 3 of 5 — tracing Anthropic's public thinking with FRE|Nxt Labs production commentary.*