Token optimization and cost reduction for Claude Code

Guide
Performance

Lower your Anthropic bill without sacrificing quality. Concrete patterns: native prompt caching, caveman skill, claude-code-router, /compact, model selection. Comparison and combos.

Published May 11, 2026

TL;DR

Claude Code can get expensive fast: a 4-hour session on a large project costs $1-5 on pay-as-you-go API billing, and much more if Opus 4.7 is used for everything.
Five concrete levers reduce the bill: native prompt caching (up to 90% savings on a cache hit), the /compact command (shrinks the conversation), the caveman skill (author-reported 65% fewer output tokens), claude-code-router (switches between Haiku, Sonnet, and Opus by context), and a Claude Max subscription (flat rate instead of pay-as-you-go).
The biggest gain is often picking the right model: Haiku 4.5 for 80% of tasks, Sonnet 4.6 for daily complex work, Opus 4.7 only for architecture and tough debugging.

Understanding where the money goes

Before optimizing, know what costs. On a typical Claude Code session:

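System prompt + CLAUDE.md: ~5K to 20K tokens (input), paid in full on the first request and re-sent with every later request (at the cached rate when prompt caching hits)
File reading: every Read injects the file's content as input tokens
Model responses: output tokens, priced at roughly 5× the input rate on Anthropic's price list
Tool calls + results: invisible in the UI, but they still consume tokens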
Parallel sessions / delegated agents: each has its own context
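
To make the breakdown above concrete, here is a minimal sketch of the arithmetic behind a session's cost. The `Rates` class, the `session_cost` function, and the prices are hypothetical placeholders chosen for illustration, not Anthropic's actual price list; plug in the current per-million-token rates for your model.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Hypothetical USD prices per million tokens (check the current price list)."""
    input_per_mtok: float
    output_per_mtok: float


def session_cost(input_tokens: int, output_tokens: int, rates: Rates) -> float:
    """Return the cost of a session in USD, ignoring prompt caching."""
    total = input_tokens * rates.input_per_mtok + output_tokens * rates.output_per_mtok
    return total / 1_000_000


# Placeholder rates, with output priced at 5x input.
example_rates = Rates(input_per_mtok=3.0, output_per_mtok=15.0)

# A session dominated by input (file reads, tool results) with modest output.
cost = session_cost(input_tokens=800_000, output_tokens=60_000, rates=example_rates)
print(f"${cost:.2f}")
```

Even with output priced several times higher per token, input can dominate the total because every file read and tool result is counted as input.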

Lever 1: Native prompt caching (up to 90% savings)

Anthropic has offered prompt caching since late 2024. When an identical prefix reappears across successive requests (typically system prompt + CLAUDE.md + first files), those prefix tokens are billed at the cache-hit rate (about 10% of the normal input price).

Claude Code uses it automatically. To maximize hit rate:

Keep the system prompt stable between requests (avoid editing CLAUDE.md mid-session)
Put large reference files in CLAUDE.md instead of Read-ing them each time
Avoid switching between models in the same session (each model has its own cache)
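
The effect of a stable prefix can be estimated with a short calculation. This is a sketch under stated assumptions: the `input_cost` helper is hypothetical, the 0.10 read multiplier mirrors the ~10% figure above, the 1.25 write multiplier (a surcharge on the request that first populates the cache) and the $3 base rate are assumptions to verify against the current pricing page, and every request after the first is assumed to hit the cache.

```python
def input_cost(prefix_tokens: int, fresh_tokens: int, requests: int,
               base_rate_per_mtok: float, cached: bool,
               read_multiplier: float = 0.10,
               write_multiplier: float = 1.25) -> float:
    """Return total input cost in USD over a run of requests.

    Assumes every request after the first hits the cache, which only holds
    if the prefix stays identical and requests arrive before the cache expires.
    """
    if cached:
        # One cache write, then (requests - 1) cache reads of the same prefix.
        prefix_units = prefix_tokens * (write_multiplier + (requests - 1) * read_multiplier)
    else:
        prefix_units = prefix_tokens * requests
    # Fresh (non-prefix) tokens are always billed at the base rate.
    total_units = prefix_units + fresh_tokens * requests
    return total_units * base_rate_per_mtok / 1_000_000


without = input_cost(20_000, 2_000, 50, 3.0, cached=False)
with_cache = input_cost(20_000, 2_000, 50, 3.0, cached=True)
print(f"without cache: ${without:.2f}, with cache: ${with_cache:.2f}")
print(f"input savings: {1 - with_cache / without:.0%}")
```

In this example the overall input savings land near 80% rather than 90%, because the fresh tokens and the initial cache write are still billed at or above the base rate.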

See docs.anthropic.com/en/docs/build-with-claude/prompt-caching for the full docs.

Lever 2: `/compact` when the session drags on

When a session runs longer than 2 hours or passes 100K tokens, Claude starts losing track of early context (context rot). The /compact command asks Claude to compress the conversation into a structured summary, so the session continues at ~10K tokens instead of 150K+.

When to use it:

After resolving a big step (deploy, refactor, etc.)
Before starting a new sub-task that doesn't need full history
When you see context degrading (Claude forgets decisions taken 1h earlier)
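
The triggers above can be written down as a simple rule of thumb. This is an illustrative sketch only: `should_compact` is a hypothetical helper, not part of Claude Code, and the default thresholds simply restate the 100K-token and 2-hour figures from this section.

```python
def should_compact(context_tokens: int, session_hours: float,
                   finished_big_step: bool = False,
                   token_limit: int = 100_000,
                   hour_limit: float = 2.0) -> bool:
    """Return True when any of the compaction triggers applies."""
    return (
        finished_big_step
        or context_tokens > token_limit
        or session_hours > hour_limit
    )


print(should_compact(context_tokens=150_000, session_hours=0.5))
print(should_compact(context_tokens=40_000, session_hours=1.0))
```

Tune the thresholds to your own projects; the point is to compact on a deliberate signal rather than waiting for answers to degrade.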

See /prompting/context-management for details.

Lever 3: `caveman` skill (–65% output tokens)

JuliusBrussee/caveman (58,284 ★ as of 2026-05-11). A skill that forces Claude to answer in a "caveman" style: short sentences, minimal vocabulary, no ornament.

Before: "I will now proceed to carry out an in-depth analysis of this function to identify any potential performance or security issues that might arise."

After (caveman): "Analyze function. Look for perf and security bugs."

Author-reported: 65% reduction in output tokens on common tasks, with minor impact on technical accuracy. Test on your project: some workflows (architecture, pedagogical explanation) suffer from the style.

When to enable: repetitive tasks where conciseness matters more than pedagogy. File editing, running tests, factual debugging.

When to avoid: code reviews meant to teach a junior developer, documentation, pair-programming sessions where nuance matters.
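
A 65% cut in output tokens is not a 65% cut in the bill, since input tokens are unaffected. The sketch below shows the arithmetic; `bill_after_output_cut` is a hypothetical helper and the dollar split between input and output is an invented example, so measure your own split before drawing conclusions.

```python
def bill_after_output_cut(input_cost: float, output_cost: float,
                          output_reduction: float = 0.65) -> float:
    """Return the new total in USD when only the output cost shrinks."""
    return input_cost + output_cost * (1.0 - output_reduction)


# Hypothetical session: $2.40 of input, $0.90 of output.
before = 2.40 + 0.90
after = bill_after_output_cut(input_cost=2.40, output_cost=0.90)
print(f"before: ${before:.2f}, after: ${after:.2f}")
print(f"bill drops by {1 - after / before:.0%}")
```

The more a workflow is dominated by output (long explanations, generated files), the closer the bill reduction gets to the author-reported token reduction.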

Lever 4: `claude-code-router` (right model at the right time)

musistudio/claude-code-router (33,826 ★ as of 2026-05-11). A router that decides which Anthropic model handles each request based on its context.

Typical logic:

Haiku 4.5: file reads, running tests, simple git operations
Sonnet 4.6: code edits, refactoring, standard debugging
Opus 4.7: architecture, complex debugging, algorithm design

Author-measured savings: 30 to 50% off the monthly bill of a developer who previously used Opus everywhere. Latency also drops (Haiku replies in ~2s vs ~8s for Opus).

Risk: if the router misroutes a complex task to Haiku, quality drops. Calibrate on your real workflow for 1-2 weeks before trusting it blindly.
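
The routing idea can be illustrated with a small lookup table. This is a conceptual sketch, not claude-code-router's actual configuration format or code: the `Tier` enum, the task labels, and `pick_tier` are all hypothetical names chosen to mirror the typical logic listed above.

```python
from enum import Enum


class Tier(Enum):
    HAIKU = "haiku"
    SONNET = "sonnet"
    OPUS = "opus"


# Hypothetical task labels mapped to the tiers described in the text.
ROUTES = {
    "file_read": Tier.HAIKU,
    "run_tests": Tier.HAIKU,
    "git": Tier.HAIKU,
    "code_edit": Tier.SONNET,
    "refactor": Tier.SONNET,
    "debug": Tier.SONNET,
    "architecture": Tier.OPUS,
    "complex_debug": Tier.OPUS,
    "algorithm_design": Tier.OPUS,
}


def pick_tier(task_kind: str) -> Tier:
    """Return the tier for a task, defaulting to the middle tier.

    Unknown tasks fall back to Sonnet rather than Haiku, to limit the
    quality drop when a complex task is not recognized.
    """
    return ROUTES.get(task_kind, Tier.SONNET)


print(pick_tier("run_tests").value)
print(pick_tier("something_new").value)
```

Defaulting unknown work to the middle tier is one way to trade a little cost for protection against the misrouting risk described above.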

Lever 5: Claude Max subscription instead of API key

For a developer using Claude Code more than 2-3 hours per day, the Claude Max subscription (from ~$100/month, verify on claude.com/pricing) becomes more cost-effective than pay-as-you-go API billing.

Profile	API key (pay-as-you-go)	Claude Max
Occasional dev (under 5h/week)	$10-30/month	$100/month minimum (overkill)
Daily dev (3-4h/day)	$100-300/month	$100/month (net win)
Heavy dev + agents (8h+)	$300-1000/month	$100-500/month per tier

See /getting-started/installation for Max auth procedure.
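
The comparison in the table comes down to a break-even calculation. The sketch below assumes a flat subscription price, a hypothetical average API cost per hour of use, and that your usage fits within the plan's limits; `break_even_hours_per_day` and `cheaper_option` are illustrative helpers, and all figures are placeholders to replace with your own billing data.

```python
def break_even_hours_per_day(subscription_price: float,
                             api_cost_per_hour: float,
                             working_days: int = 22) -> float:
    """Return the daily hours of use at which both options cost the same."""
    return subscription_price / (api_cost_per_hour * working_days)


def cheaper_option(monthly_api_spend: float, subscription_price: float) -> str:
    """Return which billing mode costs less for a given monthly API spend."""
    return "subscription" if monthly_api_spend > subscription_price else "api"


# Hypothetical: $100/month plan, $1.50 of API usage per hour, 22 working days.
hours = break_even_hours_per_day(subscription_price=100.0, api_cost_per_hour=1.50)
print(f"break-even at about {hours:.1f} hours per day")
print(cheaper_option(monthly_api_spend=20.0, subscription_price=100.0))
print(cheaper_option(monthly_api_spend=200.0, subscription_price=100.0))
```

Pull the real monthly figure from your API usage dashboard before switching; the hourly cost varies widely with model choice and project size.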

Recommended starter combo

A 3-step recipe to cut your bill 40-60% without rewriting your workflow:

Pick Claude Max if you use Claude Code > 2h/day. Otherwise stay on API key.
Install claude-code-router and let it run for 2 weeks in passive observation. Adjust thresholds if you see Haiku struggling too often or Opus over-engineering too often.
Run /compact after every big step (it becomes a reflex within 1-2 weeks).

What NOT to do on day 1:

Install caveman everywhere (test first on a non-critical workflow)
Disable native prompt caching (rarely worth it)
Force Haiku for everything (quality drops sharply on complex tasks)

Next steps

Dive deeper into prompt caching: official doc docs.anthropic.com/.../prompt-caching.
Master context management: /prompting/context-management.
Detailed multi-model approach: /advanced/cross-model-workflow.
Compare optimization skills: /ecosystem/awesome-skills.
Keep memory across sessions to avoid re-paying context: /advanced/memoire-persistante.
