Let Your Agent Choose Its Own Brain
Build intelligent model selection like Cursor's Auto Mode. Your agent dynamically switches between Haiku, Sonnet, and Opus based on task complexity.
10 min read
January 2026
~150 lines of code
If you've used Cursor's Auto Mode, you know how powerful it is—the editor intelligently picks the right model for each task. Simple refactors get a fast model, complex architecture decisions get a powerful one. But how do you build this yourself?
In this guide, I'll show you how to implement Cursor-style Auto Mode in LangGraph, where your agent dynamically switches between Haiku, Sonnet, and Opus based on task complexity. The agent itself decides which model to use for each turn.
Consider these two requests:

- "Fix the typo in line 5" → use a fast, cheap model (Haiku)
- "Design a database schema for a multi-tenant SaaS" → use a powerful model (Opus)
Traditionally, the model is fixed at the start of a conversation. But what if we want the agent to adapt mid-conversation based on task complexity?
Our implementation has three key components:

1. **Tool** (`set_next_model()`): a tool the agent calls to request a model change.
2. **State** (`requested_model_level`): state fields that store the requested model level.
3. **Middleware** (model selection): middleware that reads the state and selects the appropriate model.

The tool updates state → the middleware reads state → routes to the appropriate model.
**Key Insight**
The model switch takes effect on the NEXT model call, not immediately. This keeps the architecture clean and avoids the complexity of mid-generation model swaps.
| Turn | Model | What happens |
|---|---|---|
| 1 | Sonnet | Agent analyzes the task, calls `set_next_model("high")` → `requested_model_level = "high"` |
| 2 | Opus | Middleware reads state, selects Opus → `requested_model_level = None` (one-shot) |
| 3 | Sonnet | No request, defaults to Sonnet → back to the default model |
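The one-shot routing described above boils down to a tiny state machine. Here is a standalone sketch in plain Python (no LangGraph) that simulates the three turns; `select_model` is a hypothetical helper mirroring what the middleware will do later:

```python
LEVEL_TO_MODEL = {"low": "haiku", "medium": "sonnet", "high": "opus"}
DEFAULT_MODEL = "sonnet"


def select_model(state: dict) -> str:
    """Pick the model for this turn, then clear the request (one-shot)."""
    level = state.get("requested_model_level")
    if level in LEVEL_TO_MODEL:
        state["requested_model_level"] = None  # reset after use
        return LEVEL_TO_MODEL[level]
    return DEFAULT_MODEL


state = {"requested_model_level": None}

# Turn 1: no request yet -> default
assert select_model(state) == "sonnet"

# During turn 1 the agent calls set_next_model("high")
state["requested_model_level"] = "high"

# Turn 2: the request is honored, then cleared
assert select_model(state) == "opus"

# Turn 3: back to the default
assert select_model(state) == "sonnet"
```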
First, add fields to your agent state to track model requests:
state.py

```python
from typing import Annotated, Literal

from typing_extensions import TypedDict
from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages


class AgentState(TypedDict):
    """Agent state with model switching support."""

    messages: Annotated[list[BaseMessage], add_messages]

    # Model switching fields
    requested_model_level: Literal["low", "medium", "high"] | None
    requested_model_reason: str | None

    # Track current model (for agent self-awareness)
    current_model: str | None
    current_model_display: str | None
```

The tool uses LangGraph's `Command` to update state properly. The key is including a `ToolMessage` in the update, because LangGraph requires one when a tool returns a `Command`.
tools.py

```python
import logging
from typing import Annotated, Literal

from langchain.tools import tool, ToolRuntime
from langchain_core.messages import ToolMessage
from langgraph.types import Command

from state import AgentState

logger = logging.getLogger(__name__)

LEVEL_TO_MODEL = {
    "low": "haiku",
    "medium": "sonnet",
    "high": "opus",
}


@tool
def set_next_model(
    level: Annotated[
        Literal["low", "medium", "high"],
        "Model quality level: 'low' (fast), 'medium' (balanced), 'high' (powerful)",
    ],
    reason: Annotated[str, "Brief explanation of why this model level is needed"],
    runtime: ToolRuntime[None, AgentState],
) -> Command:
    """Switch to a different model quality level for the NEXT response.

    Use this when you need different capability for upcoming work:
    - low: Quick fixes, typos, simple changes (fastest, cheapest)
    - medium: Standard implementation, most coding tasks (DEFAULT)
    - high: Architecture, complex debugging, multi-step planning (most capable)

    The switch takes effect on the NEXT model call. One-shot: resets after use.
    """
    model_name = LEVEL_TO_MODEL.get(level, "sonnet")
    logger.info(f"[set_next_model] Agent requested {level} ({model_name}): {reason}")

    # Get tool_call_id from the runtime - required for the ToolMessage
    tool_call_id = runtime.tool_call_id

    # Create a ToolMessage - REQUIRED when returning a Command from a tool
    tool_message = ToolMessage(
        content=f"Model set to {model_name} ({level}) for next turn. Reason: {reason}",
        tool_call_id=tool_call_id,
    )

    # Return a Command with the state update AND the ToolMessage
    return Command(
        update={
            "messages": [tool_message],
            "requested_model_level": level,
            "requested_model_reason": reason,
        }
    )
```

**ToolRuntime Pattern**
We use ToolRuntime[None, AgentState] to access the tool_call_id. This is the LangGraph v1 pattern for tools that need runtime context.
The middleware intercepts every model call and routes to the appropriate model based on state:
middleware.py

```python
import logging
from typing import Any

from langchain.agents.middleware import wrap_model_call
from langchain.agents.middleware.types import ModelRequest, ModelResponse
from langchain_anthropic import ChatAnthropic

logger = logging.getLogger(__name__)

MODEL_DISPLAY_NAMES = {
    "haiku": "Claude Haiku",
    "sonnet": "Claude Sonnet",
    "opus": "Claude Opus",
}

_MODEL_CACHE: dict[str, Any] = {}


def _get_model_for_key(model_key: str) -> ChatAnthropic:
    """Get or create a cached model instance."""
    if model_key in _MODEL_CACHE:
        return _MODEL_CACHE[model_key]

    model_map = {
        "haiku": "claude-haiku-4-5-20251001",
        "sonnet": "claude-sonnet-4-5-20250929",
        "opus": "claude-opus-4-5-20251101",
    }
    model = ChatAnthropic(
        model_name=model_map.get(model_key, model_map["sonnet"]),
        max_tokens=16384,
        temperature=0.7,
    )
    _MODEL_CACHE[model_key] = model
    return model


@wrap_model_call
async def model_selection_middleware(
    request: ModelRequest,
    handler,
) -> ModelResponse:
    """Middleware that dynamically selects models based on agent requests."""
    # Get the configured model from the runtime context
    model_key = None
    if request.runtime.context is not None:
        model_key = request.runtime.context.get("model")

    # Only apply dynamic routing in "auto" mode
    if model_key != "auto":
        if model_key:
            request.model = _get_model_for_key(model_key)
        return await handler(request)

    # AUTO MODE: check state for the agent's model request
    state = getattr(request, "state", {}) or {}
    requested_level = state.get("requested_model_level")

    if requested_level in ("low", "medium", "high"):
        level_to_model = {"low": "haiku", "medium": "sonnet", "high": "opus"}
        model_key = level_to_model[requested_level]
        logger.info(
            f"[ModelSelection] Auto mode using {model_key} "
            f"(reason: {state.get('requested_model_reason', 'not specified')})"
        )
        # Clear the request after use (one-shot behavior)
        if hasattr(request, "state") and request.state is not None:
            request.state["requested_model_level"] = None
            request.state["requested_model_reason"] = None
    else:
        model_key = "sonnet"
        logger.info("[ModelSelection] Auto mode using default: sonnet")

    model = _get_model_for_key(model_key)
    if request.tools:
        model = model.bind_tools(request.tools, parallel_tool_calls=True)
    request.model = model

    # Store the current model in state for agent self-awareness
    if hasattr(request, "state") and request.state is not None:
        request.state["current_model"] = model_key
        request.state["current_model_display"] = MODEL_DISPLAY_NAMES.get(model_key)

    return await handler(request)
```

Register the middleware and tool with your agent:
agent.py

```python
from langchain.agents import create_agent

from middleware import model_selection_middleware
from tools import set_next_model
from state import AgentState

# Create the agent with the model-selection middleware
agent = create_agent(
    model="auto",  # auto mode: the middleware picks the real model each turn
    tools=[set_next_model],  # add your other tools here
    state_schema=AgentState,
    middleware=[
        model_selection_middleware,
        # ...other middleware
    ],
)
```

Tell the agent how to use the tool in your system prompt:
| Level | Model | Use case | Speed | Cost |
|---|---|---|---|---|
| `low` | Haiku | Quick fixes, typos, simple changes | Fastest | Cheapest |
| `medium` | Sonnet | Standard implementation, most coding tasks | Balanced | Moderate |
| `high` | Opus | Architecture, complex debugging, multi-step planning | Capable | Premium |
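The table above can be turned into prompt text directly. Here is a minimal sketch of such a system-prompt fragment; the wording is illustrative, not from any official source:

```python
# Hypothetical system-prompt fragment teaching the agent to use set_next_model.
MODEL_SELECTION_PROMPT = """\
You can switch your own model with the set_next_model tool:
- low (Haiku): quick fixes, typos, simple changes
- medium (Sonnet): standard implementation, most coding tasks (default)
- high (Opus): architecture, complex debugging, multi-step planning

Call set_next_model BEFORE starting work that needs a different level,
and always include a brief reason. The switch applies to your NEXT turn only.
"""
```

Append this fragment to whatever system prompt your agent already uses.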
How it works:

```python
set_next_model(level="high", reason="Need architecture design")
```

We considered using `Command(goto="agent")` to restart the agent node with a new model. This would discard the current response and switch immediately. However:
- **Message history breaks.** Anthropic's API requires `tool_use` blocks to have matching `tool_result` blocks; restarting mid-turn orphans the tool call.
- **Complexity.** Managing partial responses and state cleanup is error-prone.
- **UX.** Users see incomplete responses that get discarded.

The "next turn" approach is cleaner: the agent finishes its thought, then the new model handles the next task.
You might think we could just do:

```python
# DON'T DO THIS - won't persist!
runtime.state["requested_model_level"] = level
return "Model set!"
```

**This doesn't work.** Direct state mutations in tools don't persist in LangGraph. The `Command(update={...})` pattern is the correct way to update state from tools.
LangGraph requires a `ToolMessage` when a tool returns a `Command`. Without it, you get:

```
ValueError: Expected to have a matching ToolMessage in Command.update
```

The `ToolMessage` maintains the tool call/result contract that LLMs expect.
Add a companion tool so the agent can verify which model it's running as:
tools.py

```python
@tool
def get_current_model(
    runtime: ToolRuntime[None, AgentState],
) -> str:
    """Check which model you're currently running as.

    Returns the current model name (e.g., 'Sonnet', 'Opus', 'Haiku').
    """
    state = runtime.state or {}
    model_display = state.get("current_model_display", "Unknown")
    model_key = state.get("current_model", "unknown")
    return f"You are currently running as {model_display} ({model_key})"
```

With this implementation:
- **Cost optimization:** simple follow-up tasks use Haiku (10x cheaper than Opus).
- **Quality when needed:** complex architecture decisions get Opus reasoning power.
- **Seamless UX:** users select "Auto" and the agent handles the rest.
- **Observability:** logs show exactly which model was selected and why.
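To make the cost lever concrete, here is a back-of-envelope calculation. The per-million-token prices below are placeholders, not real Anthropic pricing; check the current pricing page before relying on any numbers:

```python
# Hypothetical prices in $ per million input tokens (NOT real pricing).
PRICE_PER_MTOK = {"haiku": 1.0, "sonnet": 3.0, "opus": 15.0}


def turn_cost(model: str, tokens: int) -> float:
    """Cost of a single turn at the hypothetical rate for `model`."""
    return PRICE_PER_MTOK[model] * tokens / 1_000_000


# 100 turns at ~5k tokens each: all-Opus vs. an assumed auto-mode mix
# (80% haiku, 15% sonnet, 5% opus).
all_opus = 100 * turn_cost("opus", 5_000)
auto = (
    80 * turn_cost("haiku", 5_000)
    + 15 * turn_cost("sonnet", 5_000)
    + 5 * turn_cost("opus", 5_000)
)

print(f"all-Opus: ${all_opus:.2f}, auto: ${auto:.2f}")
# → all-Opus: $7.50, auto: $1.00
```

The exact savings depend entirely on your traffic mix, but the point stands: if most turns are simple, routing them to a cheap model dominates the bill.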
Dynamic model switching lets your agent be smart about resource allocation. The pattern is simple:

1. A tool updates state with the requested model level.
2. Middleware reads the state and routes to the appropriate model.
3. One-shot behavior resets to the default after use.
The key insight is that switching happens between turns, not mid-generation. This keeps the implementation clean and avoids the complexity of interrupting model inference.
The full implementation is about 150 lines of code across three files—a small investment for significant cost savings and improved agent autonomy.
We specialize in building production-grade LangGraph and LangChain systems. Let's discuss your AI agent architecture.
Written by: Frenxt Consultancy
Published: January 2026
*Based on production implementations for enterprise clients.*