Agent performance and limits

Understanding Claude Code agent performance characteristics: token limits, context windows, timeouts, and optimization strategies.

Agents aren't free

Claude Code agents consume tokens with every iteration. The more complex an agent is (deep recursion, many tools, large context), the more it costs. This guide gives you the keys to estimate, control, and optimize those costs.

Agents multiply consumption

A traditional prompt consumes a few thousand tokens. An agent that plans, executes, and verifies can consume 10 to 50 times more. With parallel sub-agents, the bill adds up fast. Read this guide before launching agents in production.

Token cost by depth

Each agent "turn" (one iteration of the plan-execute-verify loop) consumes input and output tokens. Here are average estimates by task type.

| Depth | Use cases | Input tokens | Output tokens | Estimated cost (Sonnet) |
| --- | --- | --- | --- | --- |
| 1 to 3 turns | Simple read, question/answer | 5K to 15K | 1K to 3K | $0.02 to $0.08 |
| 5 to 10 turns | Code review, file analysis | 20K to 80K | 5K to 15K | $0.10 to $0.40 |
| 10 to 20 turns | Refactoring, writing tests | 50K to 150K | 10K to 30K | $0.30 to $1.00 |
| 20 to 30 turns | Full pipeline (plan + code + review) | 100K to 200K+ | 20K to 50K | $0.80 to $2.50 |

These numbers vary

Costs depend on the model used (Haiku is roughly an order of magnitude cheaper than Opus, with the exact ratio depending on the model versions), the size of files read, and the number of tools called per turn. Use /cost in Claude Code to track your actual consumption.
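The dollar ranges in the table can be approximated from token counts. The sketch below assumes prices of $3 per million input tokens and $15 per million output tokens for Sonnet. Those figures and the `estimateCostUsd` helper are illustrative assumptions, not values from this guide, so check current pricing before relying on them.

```typescript
// Assumed prices in USD per million tokens (illustrative; verify current pricing).
interface ModelPricing {
  readonly inputPerMTok: number;
  readonly outputPerMTok: number;
}

const ASSUMED_SONNET_PRICING: ModelPricing = { inputPerMTok: 3, outputPerMTok: 15 };

// Estimate the cost of a run from its input and output token counts.
function estimateCostUsd(
  inputTokens: number,
  outputTokens: number,
  pricing: ModelPricing = ASSUMED_SONNET_PRICING,
): number {
  return (
    (inputTokens / 1_000_000) * pricing.inputPerMTok +
    (outputTokens / 1_000_000) * pricing.outputPerMTok
  );
}

// Lower bound of the "1 to 3 turns" row: 5K input, 1K output.
console.log(estimateCostUsd(5_000, 1_000).toFixed(3)); // "0.030"
```

Under these assumed prices the result falls inside the $0.02 to $0.08 range given for the shallowest row.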

Factors that inflate costs

Several factors multiply token consumption.

Files read are injected into the context. An agent reading 10 files of 500 lines adds about 50K input tokens. Prefer targeted reads (Grep to find, then Read on the relevant lines) rather than reading entire files.

Each turn inherits the context from previous turns. At turn 15, the agent carries the history of the 14 previous turns. That's why later turns cost much more than the first ones.
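To see why later turns dominate the bill, here is a rough model in which every turn re-sends the full history. The starting context of 5K tokens and the 2K tokens added per turn are hypothetical numbers, and the model ignores prompt caching and compaction, which both reduce the real cost.

```typescript
// Total input tokens over a run where every turn re-sends the prior history.
// baseTokens: context size at turn 1; addedPerTurn: tokens each turn appends.
function cumulativeInputTokens(
  turns: number,
  baseTokens: number,
  addedPerTurn: number,
): number {
  let total = 0;
  for (let turn = 1; turn <= turns; turn++) {
    // Turn N carries the base context plus everything added by the N - 1 earlier turns.
    total += baseTokens + (turn - 1) * addedPerTurn;
  }
  return total;
}

console.log(cumulativeInputTokens(1, 5_000, 2_000));  // 5000
console.log(cumulativeInputTokens(15, 5_000, 2_000)); // 285000
```

In this model a 15-turn run bills 57 times the input of a single turn, not 15 times, because the growth is quadratic in the number of turns.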

Sub-agents multiply the base cost. If you launch 3 sub-agents of 10 turns each, you pay for up to 30 sub-agent turns, and the orchestrator agent also consumes its own tokens to read and synthesize their results.

Recursion depth: limits and control

The maxTurns limit

The maxTurns parameter (or --max-turns in CLI) controls the maximum number of agent iterations. It's your main safeguard against agents that loop endlessly.

# In CLI: limit to 15 turns
claude --print --max-turns 15 "Refactor the auth module"

// In TypeScript SDK
const result = await claude({
  prompt: "Refactor the auth module",
  options: { maxTurns: 15 },
});

Recommendations by use case

| Use case | Recommended maxTurns | Reason |
| --- | --- | --- |
| Simple question | 3 to 5 | One or two reads + answer |
| Code review | 8 to 12 | Read the diff + analyze + report |
| Writing tests | 10 to 15 | Read code + write + execute |
| Refactoring | 15 to 25 | Planning + modifications + verification |
| Full pipeline | 20 to 30 | Multiple sequential phases |

Start low, increase as needed

Always start with a low maxTurns. If the agent stops with a message like "I would need more iterations to finish", increase gradually. It's safer than setting 50 turns and ending up with a surprise bill.
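The "start low, increase as needed" approach can be automated with a small escalation loop. This is a sketch under assumptions: `Runner`, `RunOutcome`, and its `completed` flag are hypothetical stand-ins for however your application invokes the agent and decides that it finished.

```typescript
interface RunOutcome {
  readonly completed: boolean; // hypothetical flag: did the agent finish within its budget?
  readonly output: string;
}

// Hypothetical adapter around your agent invocation.
type Runner = (prompt: string, maxTurns: number) => Promise<RunOutcome>;

// Try each budget in order and stop at the first run that completes.
async function runWithEscalation(
  run: Runner,
  prompt: string,
  budgets: readonly number[] = [5, 10, 20],
): Promise<{ outcome: RunOutcome; maxTurns: number }> {
  if (budgets.length === 0) throw new Error("budgets must not be empty");
  let last: { outcome: RunOutcome; maxTurns: number } | undefined;
  for (const maxTurns of budgets) {
    const outcome = await run(prompt, maxTurns);
    last = { outcome, maxTurns };
    if (outcome.completed) break;
  }
  return last!; // budgets is non-empty, so at least one run happened
}
```

Each escalation step starts a fresh run and is billed again, so keep the ladder short and pick the first rung from the table above.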

Sub-agent recursion depth

A main agent can launch sub-agents, which can themselves launch sub-agents. Recursion depth is limited to prevent uncontrolled cascades.

Main agent (30 turns max)
├── Review sub-agent (10 turns max)
└── Tests sub-agent (15 turns max)
    └── Sub-sub-agent? Not recommended.

In practice, two levels of depth are enough (main agent + sub-agents). Going deeper complicates debugging and multiplies costs without proportional gains.
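One way to enforce the two-level rule before launching anything is to describe the planned agents as a tree and check it. The `AgentNode` shape and helper names below are hypothetical, introduced only for illustration; the turn limits come from the diagram above.

```typescript
interface AgentNode {
  readonly name: string;
  readonly maxTurns: number;
  readonly children: readonly AgentNode[];
}

// Number of levels in the tree (a lone main agent has depth 1).
function depthOf(node: AgentNode): number {
  return 1 + Math.max(0, ...node.children.map((child) => depthOf(child)));
}

// Worst-case turn count if every agent in the tree uses its full budget.
function totalTurnBudget(node: AgentNode): number {
  return node.maxTurns + node.children.reduce((sum, c) => sum + totalTurnBudget(c), 0);
}

// Throw before launch if the plan nests deeper than allowed.
function assertMaxDepth(root: AgentNode, maxDepth = 2): void {
  const depth = depthOf(root);
  if (depth > maxDepth) {
    throw new Error(`Agent tree depth ${depth} exceeds limit ${maxDepth}`);
  }
}

const plan: AgentNode = {
  name: "main",
  maxTurns: 30,
  children: [
    { name: "review", maxTurns: 10, children: [] },
    { name: "tests", maxTurns: 15, children: [] },
  ],
};

assertMaxDepth(plan);
console.log(depthOf(plan), totalTurnBudget(plan)); // 2 55
```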

Error handling and timeouts

Types of agent errors

Agents can fail in several ways.

| Error type | Cause | Solution |
| --- | --- | --- |
| Timeout | Agent takes too long | Increase the timeout or reduce the scope |
| maxTurns reached | Task too complex for the budget | Increase maxTurns or break the task down |
| Tool error | A Bash command fails, a file doesn't exist | Add fallback instructions |
| Context overflow | Context exceeds 200K tokens | Use /compact or reduce reads |
| Rate limit | Too many simultaneous API requests | Space out agents or use a higher plan |
| Infinite loop | Agent repeats the same action | Add constraints to the prompt |

Timeouts

By default, Bash commands launched by Claude Code have a 120-second timeout. For agents running long tasks (build, E2E tests), this timeout may be insufficient.

# In CLI: --max-turns caps iterations; it is not a wall-clock timeout
claude --print --max-turns 20 "Run the E2E tests"

// In SDK: timeout is managed at the application level
const result = await claude({
  prompt: "Run the full E2E tests",
  options: {
    maxTurns: 20,
    // The SDK waits for the agent to finish
    // Handle the timeout on the application side if needed
  },
});
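Handling the timeout on the application side, as the comment above suggests, can be as simple as racing the agent call against a timer. This is a minimal sketch: `withTimeout` is a hypothetical helper, and it only stops waiting. It does not cancel the underlying agent run, which may keep consuming tokens unless you stop it separately.

```typescript
// Reject if `work` does not settle within `ms` milliseconds.
// Note: this stops waiting but does not cancel the underlying work.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms} ms`)), ms);
  });
  // Clear the timer either way so it cannot keep the process alive.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}
```

For example, `await withTimeout(runAgent(), 10 * 60 * 1000)` would give a hypothetical `runAgent()` call ten minutes before the caller gives up.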

Bash command timeout

If an agent launches an npm run test that takes 5 minutes, the command may time out. In the agent's instructions, ask it to run test subsets, or raise the Bash timeout (for example via the BASH_DEFAULT_TIMEOUT_MS environment variable in your Claude Code settings).

Detecting infinite loops

An agent in an infinite loop repeats the same action without making progress. Here are the signs:

  • The same tool is called with the same parameters 3 times in a row
  • The agent re-reads a file it just modified for no reason
  • The agent's messages go in circles ("I'll try another approach... I'll try another approach...")

The solution: add stop criteria to the agent's prompt.

## Stop criteria
- If you've tried 3 different approaches without success, stop and explain what's blocking
- If you re-read the same file more than 2 times, change strategy
- If the same test fails 3 times, flag it as a blocking issue
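The first sign listed earlier (the same tool called with the same parameters 3 times in a row) can also be checked in code if your application records the agent's tool calls. The sketch below assumes a hypothetical `ToolCall` record; how you capture those calls depends on your integration.

```typescript
interface ToolCall {
  readonly tool: string;
  readonly params: Record<string, unknown>;
}

// True when the last `threshold` calls are identical (same tool, same parameters).
// Parameters are compared via JSON.stringify, so key order matters.
function isRepeating(calls: readonly ToolCall[], threshold = 3): boolean {
  if (threshold < 2 || calls.length < threshold) return false;
  const key = (c: ToolCall): string => `${c.tool}:${JSON.stringify(c.params)}`;
  const keys = calls.slice(-threshold).map(key);
  return keys.every((k) => k === keys[0]);
}
```

A supervisor could call `isRepeating` after each recorded tool call and abort the run, or log a warning, as soon as it returns true.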

Retry strategies

When an agent fails, the right retry strategy depends on the error type.
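As a rough map of that decision, the sketch below routes a failure category to one of the strategies described in this section. The `FailureKind` and `RetryStrategy` names are hypothetical, and how you classify a real error into a category is left to your application. Rate limits are routed to backoff here, following the summary table at the end of this guide.

```typescript
type FailureKind = "network" | "timeout" | "rate_limit" | "bad_result" | "max_turns";
type RetryStrategy = "simple_retry" | "backoff" | "rephrase" | "raise_budget";

// Pick a retry strategy from the failure category.
function chooseStrategy(kind: FailureKind): RetryStrategy {
  switch (kind) {
    case "network":
    case "timeout":
      return "simple_retry"; // transient: try again after a short delay
    case "rate_limit":
      return "backoff"; // space attempts out exponentially
    case "bad_result":
      return "rephrase"; // the prompt, not the transport, needs to change
    case "max_turns":
      return "raise_budget"; // retrying with the same budget would fail the same way
  }
}
```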

Simple retry (transient errors)

For network errors, rate limits, or one-off timeouts.

async function withRetry<T>(
  fn: () => Promise<T>,
  maxRetries: number = 3,
  delayMs: number = 2000,
): Promise<T> {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await fn();
    } catch (error) {
      if (attempt === maxRetries) throw error;
      console.log(`Attempt ${attempt}/${maxRetries} failed, retrying...`);
      // Linear delay: 2s, then 4s, ...
      await new Promise((r) => setTimeout(r, delayMs * attempt));
    }
  }
  throw new Error("Unreachable");
}

// Usage
const result = await withRetry(() =>
  claude({
    prompt: "Analyze the logs from the last 24 hours",
    options: { maxTurns: 10 },
  })
);

Retry with rephrasing (comprehension errors)

If the agent doesn't understand the task or produces a bad result, rephrase the prompt.

// isValidResult and identifyGaps are application-specific helpers you provide
async function smartRetry(originalPrompt: string): Promise<string> {
  // First attempt with the original prompt
  const first = await claude({
    prompt: originalPrompt,
    options: { maxTurns: 10 },
  });

  // Check the result
  if (isValidResult(first.text)) {
    return first.text;
  }

  // Second attempt with a more detailed prompt
  const second = await claude({
    prompt: `The previous result was unsatisfactory.
Here's what was missing: ${identifyGaps(first.text)}.
Start over from scratch with more rigor.
Original mission: ${originalPrompt}`,
    options: { maxTurns: 15 },
  });
  return second.text;
}

Exponential backoff (rate limits)

For API rate limits, space out attempts exponentially.

import asyncio
import random

# Import names as given in this guide; verify them against your installed SDK version
from claude_code_sdk import claude, ClaudeOptions


async def with_backoff(prompt: str, max_retries: int = 5) -> str:
    """Retry with exponential backoff and jitter."""
    for attempt in range(max_retries):
        try:
            result = await claude(
                prompt=prompt,
                options=ClaudeOptions(max_turns=10),
            )
            return result.text
        except Exception as e:
            # Re-raise anything that is not a rate limit, and the final failure
            if "rate_limit" not in str(e) or attempt == max_retries - 1:
                raise
            delay = (2 ** attempt) + random.uniform(0, 1)
            print(f"Rate limit, retrying in {delay:.1f}s...")
            await asyncio.sleep(delay)
    raise RuntimeError("Maximum retry attempts reached")
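When choosing the retry count, it helps to know the worst-case wait. The sketch below lists the delays produced by a 2^attempt schedule like the one above, with jitter left out. The `capSeconds` ceiling is an addition for illustration and is not part of the example above.

```typescript
// Delays in seconds for a 2^attempt backoff, capped at `capSeconds`, jitter excluded.
function backoffSchedule(maxRetries: number, capSeconds = 60): number[] {
  // The final attempt re-raises instead of sleeping, so there are maxRetries - 1 waits.
  const waits = Math.max(0, maxRetries - 1);
  return Array.from({ length: waits }, (_, attempt) =>
    Math.min(2 ** attempt, capSeconds),
  );
}

const delays = backoffSchedule(5);
console.log(delays.join(", "));                 // 1, 2, 4, 8
console.log(delays.reduce((a, b) => a + b, 0)); // 15
```

With five attempts the caller waits about 15 seconds in total before giving up, plus up to one second of jitter per wait.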

Production best practices

1. Rate limiting: control the throughput

If you launch agents in parallel (bug triage, multi-server monitoring), limit the number of simultaneous agents.

import pLimit from "p-limit";

// Maximum 3 agents in parallel
const limit = pLimit(3);
const issues = await getOpenIssues();
const results = await Promise.all(
  issues.map((issue) =>
    limit(() =>
      claude({
        prompt: `Triage issue #${issue.number}: ${issue.title}`,
        options: { maxTurns: 5 },
      })
    )
  )
);

2. Budgets: set cost limits

Set a maximum budget per agent and per day to avoid surprises.

// Budget per agent execution
const MAX_TOKENS_PER_AGENT = 100_000; // ~$0.50 with Sonnet
// Daily budget
const DAILY_TOKEN_BUDGET = 1_000_000; // ~$5.00 with Sonnet
let dailyTokensUsed = 0;

async function budgetedAgent(prompt: string): Promise<string> {
  if (dailyTokensUsed >= DAILY_TOKEN_BUDGET) {
    throw new Error("Daily budget exhausted");
  }
  const result = await claude({
    prompt,
    options: { maxTurns: 10 },
  });
  const tokensUsed = result.tokensUsed ?? 0;
  dailyTokensUsed += tokensUsed;
  // The per-agent cap can only be checked after the run; maxTurns is the up-front limit
  if (tokensUsed > MAX_TOKENS_PER_AGENT) {
    console.warn(`Agent exceeded its budget: ${tokensUsed} tokens`);
  }
  return result.text;
}
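The daily counter above never resets, so a long-running process would stay blocked once the budget is spent. Here is one possible sketch of a tracker that rolls over at the UTC date change. `BudgetTracker` is a hypothetical in-memory helper, and a multi-process deployment would need shared storage instead.

```typescript
class BudgetTracker {
  private day = "";
  private used = 0;
  private readonly dailyLimit: number;
  private readonly perRunLimit: number;

  constructor(dailyLimit: number, perRunLimit: number) {
    this.dailyLimit = dailyLimit;
    this.perRunLimit = perRunLimit;
  }

  // Reset the counter when the UTC date changes.
  private rollover(now: Date): void {
    const today = now.toISOString().slice(0, 10); // "YYYY-MM-DD"
    if (today !== this.day) {
      this.day = today;
      this.used = 0;
    }
  }

  // True while today's usage is still under the daily limit.
  canRun(now: Date = new Date()): boolean {
    this.rollover(now);
    return this.used < this.dailyLimit;
  }

  // Record a finished run; returns true if that run exceeded the per-run cap.
  record(tokens: number, now: Date = new Date()): boolean {
    this.rollover(now);
    this.used += tokens;
    return tokens > this.perRunLimit;
  }
}
```

Call `canRun()` before launching an agent and `record()` with the run's token count afterwards, using whatever usage figure your integration reports.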

3. Logs: trace everything

In production, every agent execution should be traced for debugging and auditing.

interface AgentLog {
  readonly id: string;
  readonly prompt: string;
  readonly startedAt: Date;
  readonly completedAt: Date;
  readonly tokensUsed: number;
  readonly turnsUsed: number;
  readonly result: "success" | "error" | "timeout";
  readonly output: string;
}

async function loggedAgent(prompt: string): Promise<string> {
  const startedAt = new Date();
  const id = crypto.randomUUID();
  try {
    const result = await claude({
      prompt,
      options: { maxTurns: 15 },
    });
    const log: AgentLog = {
      id,
      prompt,
      startedAt,
      completedAt: new Date(),
      tokensUsed: result.tokensUsed ?? 0,
      turnsUsed: result.turnsUsed ?? 0,
      result: "success",
      output: result.text,
    };
    await saveLog(log); // Your logging system
    return result.text;
  } catch (error) {
    const log: AgentLog = {
      id,
      prompt,
      startedAt,
      completedAt: new Date(),
      tokensUsed: 0,
      turnsUsed: 0,
      result: "error",
      output: String(error),
    };
    await saveLog(log);
    throw error;
  }
}

4. Alerts: react to anomalies

Set up alerts when an agent exceeds a cost or error threshold.

// Alert if an agent uses more than 200K tokens (~$1.00 with Sonnet at the rate assumed above)
if ((result.tokensUsed ?? 0) > 200_000) {
  await notifySlack(
    "#alerts",
    `Expensive agent detected: ${result.tokensUsed} tokens `
      + `for "${prompt.substring(0, 50)}..."`
  );
}

// Alert if more than 3 failures in 1 hour
const recentErrors = await getRecentErrors(60 * 60 * 1000);
if (recentErrors.length > 3) {
  await notifySlack(
    "#alerts",
    `${recentErrors.length} agent errors in 1h. Check the logs.`
  );
}
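The `getRecentErrors` call above is left to your logging system. If you only need an in-process counter, a sliding window is enough. The `ErrorWindow` class below is a hypothetical stand-in, kept in memory and therefore lost on restart.

```typescript
class ErrorWindow {
  private timestamps: number[] = [];
  private readonly windowMs: number;
  private readonly threshold: number;

  constructor(windowMs: number, threshold: number) {
    this.windowMs = windowMs;
    this.threshold = threshold;
  }

  // Record one failure at `nowMs`; returns true when the window holds more
  // than `threshold` failures, i.e. when an alert should fire.
  recordError(nowMs: number = Date.now()): boolean {
    this.timestamps.push(nowMs);
    // Drop failures that have fallen out of the window.
    this.timestamps = this.timestamps.filter((t) => nowMs - t <= this.windowMs);
    return this.timestamps.length > this.threshold;
  }
}

// More than 3 failures within 1 hour triggers an alert.
const errorWindow = new ErrorWindow(60 * 60 * 1000, 3);
```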

Recommendations summary

| Aspect | Recommendation |
| --- | --- |
| maxTurns | Start at 10, increase as needed |
| Recursion depth | 2 levels max (main + sub-agents) |
| Model | Haiku for simple tasks, Sonnet for dev, Opus for architecture |
| Retry | Exponential backoff for rate limits, rephrasing for logic errors |
| Budget | Set a cap per agent and per day |
| Logs | Trace every execution (prompt, tokens, result) |
| Parallel agents | Maximum 3 to 5 simultaneous depending on your plan |
| Timeouts | Adjust based on the expected task duration |

Next steps