Let Your Agent Choose Its Own Brain
Build intelligent model selection like Cursor's Auto Mode. Your agent dynamically switches between Haiku, Sonnet, and Opus based on task complexity.
10 min read
January 2026
~150 lines of code
If you've used Cursor's Auto Mode, you know how powerful it is—the editor intelligently picks the right model for each task. Simple refactors get a fast model, complex architecture decisions get a powerful one. But how do you build this yourself?
In this guide, I'll show you how to implement Cursor-style Auto Mode in LangGraph, where your agent dynamically switches between Haiku, Sonnet, and Opus based on task complexity. The agent itself decides which model to use for each turn.
Consider these two requests:

- "Fix the typo in line 5" → use a fast, cheap model (Haiku)
- "Design a database schema for a multi-tenant SaaS" → use a powerful model (Opus)
Traditionally, the model is fixed at the start of a conversation. But what if we want the agent to adapt mid-conversation based on task complexity?
Our implementation has three key components:

1. **Tool** (`set_next_model()`): a tool the agent calls to request a model change.
2. **State** (`requested_model_level`): state fields that store the requested model level.
3. **Middleware** (model selection): middleware that reads the state and selects the appropriate model.

The tool updates state → the middleware reads state → routes to the appropriate model.
**Key Insight**
The model switch takes effect on the NEXT model call, not immediately. This keeps the architecture clean and avoids the complexity of mid-generation model swaps.
| Turn | Model | What happens |
|---|---|---|
| 1 | Sonnet | Agent analyzes the task, calls `set_next_model("high")` → `requested_model_level = "high"` |
| 2 | Opus | Middleware reads state, selects Opus → `requested_model_level = None` (one-shot) |
| 3 | Sonnet | No request, defaults to Sonnet → back to the default model |
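The one-shot routing described above boils down to a tiny state machine. Here is a standalone sketch in plain Python (no LangGraph) that simulates the three turns; `select_model` is a hypothetical helper mirroring what the middleware will do later:

```python
LEVEL_TO_MODEL = {"low": "haiku", "medium": "sonnet", "high": "opus"}
DEFAULT_MODEL = "sonnet"


def select_model(state: dict) -> str:
    """Pick the model for this turn, then clear the request (one-shot)."""
    level = state.get("requested_model_level")
    if level in LEVEL_TO_MODEL:
        state["requested_model_level"] = None  # reset after use
        return LEVEL_TO_MODEL[level]
    return DEFAULT_MODEL


state = {"requested_model_level": None}

# Turn 1: no request yet -> default
assert select_model(state) == "sonnet"

# During turn 1 the agent calls set_next_model("high")
state["requested_model_level"] = "high"

# Turn 2: the request is honored, then cleared
assert select_model(state) == "opus"

# Turn 3: back to the default
assert select_model(state) == "sonnet"
```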
First, add fields to your agent state to track model requests:
state.py

```python
from typing import Annotated, Literal

from typing_extensions import TypedDict
from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages


class AgentState(TypedDict):
    """Agent state with model switching support."""

    messages: Annotated[list[BaseMessage], add_messages]

    # Model switching fields
    requested_model_level: Literal["low", "medium", "high"] | None
    requested_model_reason: str | None

    # Track current model (for agent self-awareness)
    current_model: str | None
    current_model_display: str | None
```

The tool uses LangGraph's `Command` to update state properly. The key is including a `ToolMessage` in the update, because LangGraph requires one when a tool returns a `Command`.
tools.py

```python
import logging
from typing import Annotated, Literal

from langchain.tools import tool, ToolRuntime
from langchain_core.messages import ToolMessage
from langgraph.types import Command

from state import AgentState

logger = logging.getLogger(__name__)

LEVEL_TO_MODEL = {
    "low": "haiku",
    "medium": "sonnet",
    "high": "opus",
}


@tool
def set_next_model(
    level: Annotated[
        Literal["low", "medium", "high"],
        "Model quality level: 'low' (fast), 'medium' (balanced), 'high' (powerful)",
    ],
    reason: Annotated[str, "Brief explanation of why this model level is needed"],
    runtime: ToolRuntime[None, AgentState],
) -> Command:
    """Switch to a different model quality level for the NEXT response.

    Use this when you need different capability for upcoming work:
    - low: Quick fixes, typos, simple changes (fastest, cheapest)
    - medium: Standard implementation, most coding tasks (DEFAULT)
    - high: Architecture, complex debugging, multi-step planning (most capable)

    The switch takes effect on the NEXT model call. One-shot: resets after use.
    """
    model_name = LEVEL_TO_MODEL.get(level, "sonnet")
    logger.info(f"[set_next_model] Agent requested {level} ({model_name}): {reason}")

    # Get tool_call_id from the runtime - required for the ToolMessage
    tool_call_id = runtime.tool_call_id

    # Create a ToolMessage - REQUIRED when returning a Command from a tool
    tool_message = ToolMessage(
        content=f"Model set to {model_name} ({level}) for next turn. Reason: {reason}",
        tool_call_id=tool_call_id,
    )

    # Return a Command with the state update AND the ToolMessage
    return Command(
        update={
            "messages": [tool_message],
            "requested_model_level": level,
            "requested_model_reason": reason,
        }
    )
```

**ToolRuntime Pattern**
We use ToolRuntime[None, AgentState] to access the tool_call_id. This is the LangGraph v1 pattern for tools that need runtime context.
The middleware intercepts every model call and routes to the appropriate model based on state:
middleware.py

```python
import logging
from typing import Any

from langchain.agents.middleware import wrap_model_call
from langchain.agents.middleware.types import ModelRequest, ModelResponse
from langchain_anthropic import ChatAnthropic

logger = logging.getLogger(__name__)

MODEL_DISPLAY_NAMES = {
    "haiku": "Claude Haiku",
    "sonnet": "Claude Sonnet",
    "opus": "Claude Opus",
}

_MODEL_CACHE: dict[str, Any] = {}


def _get_model_for_key(model_key: str) -> ChatAnthropic:
    """Get or create a cached model instance."""
    if model_key in _MODEL_CACHE:
        return _MODEL_CACHE[model_key]

    model_map = {
        "haiku": "claude-haiku-4-5-20251001",
        "sonnet": "claude-sonnet-4-5-20250929",
        "opus": "claude-opus-4-5-20251101",
    }
    model = ChatAnthropic(
        model_name=model_map.get(model_key, model_map["sonnet"]),
        max_tokens=16384,
        temperature=0.7,
    )
    _MODEL_CACHE[model_key] = model
    return model


@wrap_model_call
async def model_selection_middleware(
    request: ModelRequest,
    handler,
) -> ModelResponse:
    """Middleware that dynamically selects models based on agent requests."""
    # Get the configured model from the runtime context
    model_key = None
    if request.runtime.context is not None:
        model_key = request.runtime.context.get("model")

    # Only apply dynamic routing in "auto" mode
    if model_key != "auto":
        if model_key:
            request.model = _get_model_for_key(model_key)
        return await handler(request)

    # AUTO MODE: check state for the agent's model request
    state = getattr(request, "state", {}) or {}
    requested_level = state.get("requested_model_level")

    if requested_level in ("low", "medium", "high"):
        level_to_model = {"low": "haiku", "medium": "sonnet", "high": "opus"}
        model_key = level_to_model[requested_level]
        logger.info(
            f"[ModelSelection] Auto mode using {model_key} "
            f"(reason: {state.get('requested_model_reason', 'not specified')})"
        )
        # Clear the request after use (one-shot behavior)
        if hasattr(request, "state") and request.state is not None:
            request.state["requested_model_level"] = None
            request.state["requested_model_reason"] = None
    else:
        model_key = "sonnet"
        logger.info("[ModelSelection] Auto mode using default: sonnet")

    model = _get_model_for_key(model_key)
    if request.tools:
        model = model.bind_tools(request.tools, parallel_tool_calls=True)
    request.model = model

    # Store the current model in state for agent self-awareness
    if hasattr(request, "state") and request.state is not None:
        request.state["current_model"] = model_key
        request.state["current_model_display"] = MODEL_DISPLAY_NAMES.get(model_key)

    return await handler(request)
```

Register the middleware and tool with your agent:
agent.py

```python
from langchain.agents import create_agent

from middleware import model_selection_middleware
from tools import set_next_model
from state import AgentState

# Create the agent with the model-selection middleware
agent = create_agent(
    model="auto",  # auto mode: the middleware picks the real model each turn
    tools=[set_next_model],  # add your other tools here
    state_schema=AgentState,
    middleware=[
        model_selection_middleware,
        # ...other middleware
    ],
)
```

Tell the agent how to use the tool in your system prompt:
| Level | Model | Use case | Speed | Cost |
|---|---|---|---|---|
| `low` | Haiku | Quick fixes, typos, simple changes | Fastest | Cheapest |
| `medium` | Sonnet | Standard implementation, most coding tasks | Balanced | Moderate |
| `high` | Opus | Architecture, complex debugging, multi-step planning | Capable | Premium |
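The table above can be turned into prompt text directly. Here is a minimal sketch of such a system-prompt fragment; the wording is illustrative, not from any official source:

```python
# Hypothetical system-prompt fragment teaching the agent to use set_next_model.
MODEL_SELECTION_PROMPT = """\
You can switch your own model with the set_next_model tool:
- low (Haiku): quick fixes, typos, simple changes
- medium (Sonnet): standard implementation, most coding tasks (default)
- high (Opus): architecture, complex debugging, multi-step planning

Call set_next_model BEFORE starting work that needs a different level,
and always include a brief reason. The switch applies to your NEXT turn only.
"""
```

Append this fragment to whatever system prompt your agent already uses.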
How it works:

```python
set_next_model(level="high", reason="Need architecture design")
```

We considered using `Command(goto="agent")` to restart the agent node with a new model. This would discard the current response and switch immediately. However:
- **Message history breaks.** Anthropic's API requires `tool_use` blocks to have matching `tool_result` blocks; restarting mid-turn orphans the tool call.
- **Complexity.** Managing partial responses and state cleanup is error-prone.
- **UX.** Users see incomplete responses that get discarded.

The "next turn" approach is cleaner: the agent finishes its thought, then the new model handles the next task.
You might think we could just do:

```python
# DON'T DO THIS - won't persist!
runtime.state["requested_model_level"] = level
return "Model set!"
```

**This doesn't work.** Direct state mutations in tools don't persist in LangGraph. The `Command(update={...})` pattern is the correct way to update state from tools.
LangGraph requires a `ToolMessage` when a tool returns a `Command`. Without it, you get:

```
ValueError: Expected to have a matching ToolMessage in Command.update
```

The `ToolMessage` maintains the tool call/result contract that LLMs expect.
Add a companion tool so the agent can verify which model it's running as:
tools.py

```python
@tool
def get_current_model(
    runtime: ToolRuntime[None, AgentState],
) -> str:
    """Check which model you're currently running as.

    Returns the current model name (e.g., 'Sonnet', 'Opus', 'Haiku').
    """
    state = runtime.state or {}
    model_display = state.get("current_model_display", "Unknown")
    model_key = state.get("current_model", "unknown")
    return f"You are currently running as {model_display} ({model_key})"
```

With this implementation:
- **Cost optimization:** simple follow-up tasks use Haiku (10x cheaper than Opus).
- **Quality when needed:** complex architecture decisions get Opus reasoning power.
- **Seamless UX:** users select "Auto" and the agent handles the rest.
- **Observability:** logs show exactly which model was selected and why.
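To make the cost lever concrete, here is a back-of-envelope calculation. The per-million-token prices below are placeholders, not real Anthropic pricing; check the current pricing page before relying on any numbers:

```python
# Hypothetical prices in $ per million input tokens (NOT real pricing).
PRICE_PER_MTOK = {"haiku": 1.0, "sonnet": 3.0, "opus": 15.0}


def turn_cost(model: str, tokens: int) -> float:
    """Cost of a single turn at the hypothetical rate for `model`."""
    return PRICE_PER_MTOK[model] * tokens / 1_000_000


# 100 turns at ~5k tokens each: all-Opus vs. an assumed auto-mode mix
# (80% haiku, 15% sonnet, 5% opus).
all_opus = 100 * turn_cost("opus", 5_000)
auto = (
    80 * turn_cost("haiku", 5_000)
    + 15 * turn_cost("sonnet", 5_000)
    + 5 * turn_cost("opus", 5_000)
)

print(f"all-Opus: ${all_opus:.2f}, auto: ${auto:.2f}")
# → all-Opus: $7.50, auto: $1.00
```

The exact savings depend entirely on your traffic mix, but the point stands: if most turns are simple, routing them to a cheap model dominates the bill.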
Dynamic model switching lets your agent be smart about resource allocation. The pattern is simple:

1. A tool updates state with the requested model level.
2. Middleware reads the state and routes to the appropriate model.
3. One-shot behavior resets to the default after use.
The key insight is that switching happens between turns, not mid-generation. This keeps the implementation clean and avoids the complexity of interrupting model inference.
The full implementation is about 150 lines of code across three files—a small investment for significant cost savings and improved agent autonomy.
We specialize in building production-grade LangGraph and LangChain systems. Let's discuss your AI agent architecture.
Written by: Frenxt Consultancy
Published: January 2026
*Based on production implementations for enterprise clients.*