The art of multi-agent orchestration
Multi-agent orchestration is about combining multiple agents to accomplish complex tasks that no single agent could handle efficiently. This is the advanced level of Claude Code usage, the one that turns an intelligent assistant into a true automated development team.
The 4 orchestration patterns
1. Sequential pattern
The simplest pattern: agents execute one after another, each using the previous one's result as input.
# Sequential: each agent depends on the previous one
> Step 1: Use the planner agent to plan the refactoring
> Step 2: Use the tdd-guide agent to implement according to the plan
> Step 3: Use the code-reviewer agent to validate the code
> Step 4: Use the doc-updater agent to update the docs
When to use sequential?
Use this pattern when each step depends on the previous one's result. This is the typical development pipeline: plan → code → review → document. Simple, predictable, easy to debug.
Advantages:
- Easy to understand and debug
- Each step has clear context
- Errors are easily traceable
Disadvantages:
- Slow: steps cannot be parallelized
- If one step fails, the entire pipeline stops
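As a rough sketch of the sequential pattern in TypeScript: each step receives the previous step's output, and the chain stops at the first failure. The names `Step`, `runSequential`, and the stub steps are hypothetical and only illustrate the control flow; a real step would call an agent instead of returning a canned string.

```typescript
// Hypothetical model of a sequential pipeline; the stub steps stand in for agents.
type StepResult = { ok: boolean; output: string };
type Step = { name: string; run: (input: string) => StepResult };
type SequentialRun = { completed: string[]; failedAt: string | null; output: string };

function runSequential(steps: Step[], initialInput: string): SequentialRun {
  const completed: string[] = [];
  let input = initialInput;
  for (const step of steps) {
    const result = step.run(input);
    if (!result.ok) {
      // One failing step stops the whole pipeline
      return { completed, failedAt: step.name, output: input };
    }
    completed.push(step.name);
    input = result.output; // the next step consumes this step's result
  }
  return { completed, failedAt: null, output: input };
}

const demoSteps: Step[] = [
  { name: "planner", run: (task) => ({ ok: true, output: `plan(${task})` }) },
  { name: "tdd-guide", run: (plan) => ({ ok: true, output: `code(${plan})` }) },
  { name: "code-reviewer", run: (code) => ({ ok: true, output: `review(${code})` }) },
];

const sequentialRun = runSequential(demoSteps, "refactor auth");
console.log(sequentialRun.output); // review(code(plan(refactor auth)))
```

Because `failedAt` names the step that stopped the chain, a failure is easy to trace back, which is the main debugging advantage listed above.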
2. Parallel pattern
Multiple agents work simultaneously on independent tasks, then their results are merged.
# Parallel: agents work at the same time
> Launch in parallel:
> - Agent security-reviewer: audit the auth module
> - Agent code-reviewer: review the API module
> - Agent e2e-runner: test the user journey
> Then synthesize the results from all three agents.
This pattern is ideal when tasks are independent. Claude Code can launch sub-agents simultaneously using the run in background feature.
# Conceptually, Claude Code does:
# 1. Launches 3 sub-agents in parallel (run_in_background: true)
# 2. Waits for all 3 to finish
# 3. Consolidates results into a single report
Advantages:
- Much faster than sequential
- Uses resources efficiently
Disadvantages:
- Agents cannot depend on each other's results
- Risk of conflicts if agents modify the same files
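The launch-then-wait behavior can be approximated in plain TypeScript with `Promise.all`. This is only a sketch: `fakeAgent` is a hypothetical stand-in that simulates latency with a timer, and says nothing about how Claude Code schedules sub-agents internally.

```typescript
// Hypothetical stand-in for launching a sub-agent; it only simulates latency.
function fakeAgent(agentName: string, delayMs: number): Promise<string> {
  return new Promise<string>((resolve) => {
    setTimeout(() => resolve(`${agentName}: done`), delayMs);
  });
}

// Start every task first, then wait for all of them together.
async function runParallel(tasks: Array<() => Promise<string>>): Promise<string[]> {
  const pending = tasks.map((task) => task()); // all tasks are now in flight
  return Promise.all(pending); // results come back in task order, not finish order
}

const parallelDemo = runParallel([
  () => fakeAgent("security-reviewer", 30),
  () => fakeAgent("code-reviewer", 10),
  () => fakeAgent("e2e-runner", 20),
]);

parallelDemo.then((reports) => console.log(reports.join("\n")));
```

`Promise.all` rejects as soon as one task rejects. If the other reports should still be collected when one agent fails, `Promise.allSettled` is the usual alternative.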
3. Pipeline pattern
A pipeline combines sequential and parallel: some steps are parallelized, others are sequential.
# Complete release pipeline
> Execute this pipeline:
>
> Phase 1 (parallel):
> - Agent tdd-guide: verify all tests pass
> - Agent security-reviewer: security audit
>
> Phase 2 (sequential, after Phase 1):
> - Agent code-reviewer: final code review
>
> Phase 3 (parallel, after Phase 2):
> - Agent doc-updater: documentation update
> - Agent e2e-runner: end-to-end tests
>
> Phase 4 (sequential, after Phase 3):
> - Prepare the release tag and changelog
Phase 1: Parallel checks
Tests and the security audit are independent and can run in parallel. If either fails, the pipeline stops.
Phase 2: Sequential review
The review can only start once tests and security are validated. The reviewer needs to know the code is functional and safe.
Phase 3: Documentation and E2E
Documentation and E2E tests are independent. They can run in parallel after the review.
Phase 4: Release
Release preparation is only triggered if all previous steps are green.
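The gating between phases can be modeled in a few lines. The sketch below is a simplification under stated assumptions: `Phase` and `runPhases` are hypothetical names, each check is a synchronous stub returning pass or fail, and the checks inside a phase are evaluated one after another here, whereas a real orchestrator would run them concurrently.

```typescript
// Hypothetical gating model: each phase is a group of checks that must all pass.
type Phase = { label: string; checks: Array<() => boolean> };
type PhaseRun = { ran: string[]; stoppedAt: string | null };

function runPhases(phases: Phase[]): PhaseRun {
  const ran: string[] = [];
  for (const phase of phases) {
    ran.push(phase.label);
    // Evaluate every check in the phase before deciding whether to go on
    const outcomes = phase.checks.map((check) => check());
    if (outcomes.some((passed) => !passed)) {
      return { ran, stoppedAt: phase.label }; // later phases never start
    }
  }
  return { ran, stoppedAt: null };
}

const releasePhases: Phase[] = [
  { label: "checks", checks: [() => true, () => true] }, // tests + security audit
  { label: "review", checks: [() => false] }, // the reviewer rejects the code
  { label: "docs-and-e2e", checks: [() => true, () => true] },
  { label: "release", checks: [() => true] },
];

const phaseRun = runPhases(releasePhases);
console.log(phaseRun.ran, phaseRun.stoppedAt);
```

In this run the review fails, so neither the documentation phase nor the release phase starts.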
4. Split-role pattern (multi-perspective)
Multiple agents analyze the same subject from different angles, then a synthesizer agent combines the perspectives.
# Split-role: multiple perspectives on the same problem
> Analyze this PR from 4 different angles:
>
> Agent 1 (factual): Verify the code does what the PR says
> Agent 2 (senior): Evaluate quality and maintainability
> Agent 3 (security): Look for security flaws
> Agent 4 (consistency): Check consistency with the rest of the codebase
>
> Then synthesize the 4 analyses into a consolidated report.
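The synthesis step is where the split-role pattern pays off: the same issue raised from two angles should appear once, with the highest severity anyone gave it. Here is one possible way to express that merge. The `Finding` shape and the three severity levels are assumptions made for this illustration, not a format Claude Code produces.

```typescript
type Severity = "CRITICAL" | "MEDIUM" | "LOW";
type Finding = { perspective: string; severity: Severity; issue: string };
type MergedFinding = { issue: string; severity: Severity; raisedBy: string[] };

const severityRank: Record<Severity, number> = { CRITICAL: 0, MEDIUM: 1, LOW: 2 };

// Group identical issues, keep the highest severity, then sort most severe first.
function synthesize(findings: Finding[]): MergedFinding[] {
  const merged = new Map<string, MergedFinding>();
  for (const finding of findings) {
    const existing = merged.get(finding.issue);
    if (existing === undefined) {
      merged.set(finding.issue, {
        issue: finding.issue,
        severity: finding.severity,
        raisedBy: [finding.perspective],
      });
    } else {
      existing.raisedBy.push(finding.perspective);
      if (severityRank[finding.severity] < severityRank[existing.severity]) {
        existing.severity = finding.severity;
      }
    }
  }
  return Array.from(merged.values()).sort(
    (a, b) => severityRank[a.severity] - severityRank[b.severity]
  );
}

const consolidated = synthesize([
  { perspective: "consistency", severity: "LOW", issue: "naming differs from codebase" },
  { perspective: "senior", severity: "MEDIUM", issue: "no input validation" },
  { perspective: "security", severity: "CRITICAL", issue: "no input validation" },
]);
console.log(consolidated);
```

Matching issues by exact text is a deliberate simplification. In practice the synthesizer agent does this grouping in natural language.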
Context management between agents
One of the major challenges of orchestration is context management. Each agent has its own context window, and information is not automatically shared.
Context passing strategies
# Strategy 1: Via files
Agent A writes its results to a file.
Agent B reads that file at the start of its mission.

# Strategy 2: Via the prompt
The orchestrator agent summarizes Agent A's result
and includes it in Agent B's prompt.

# Strategy 3: Via Git
Agent A commits its changes.
Agent B works on the same branch and sees the modifications.
Watch out for context overflow
Each sub-agent's result is returned into the main agent's context. If you launch too many sub-agents, or their reports are too verbose, the main agent can hit its context window limit. Ask for concise, structured results.
Worktrees for isolation
Git worktrees are essential for multi-agent orchestration. They let each agent work in an isolated copy of the code without risk of conflict.
# Conceptually, Claude Code creates isolated worktrees.
# Each worktree gets its own branch (example names below), because Git
# refuses to check out the same branch in two worktrees at once.

# Agent 1 works in /tmp/worktree-security
git worktree add -b agent/security /tmp/worktree-security main

# Agent 2 works in /tmp/worktree-tests
git worktree add -b agent/tests /tmp/worktree-tests main

# Agent 3 works in /tmp/worktree-docs
git worktree add -b agent/docs /tmp/worktree-docs main

# Each agent modifies its files without affecting the others
# At the end, changes are merged
When to use worktrees?
| Situation | Worktree? | Reason |
|---|---|---|
| Agents that only read | No | No risk of conflict |
| Agents modifying different files | Optional | Low risk of conflict |
| Agents modifying the same files | Yes | High risk of conflict |
| Agents in parallel | Recommended | Guaranteed isolation |
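The first three rows of the table reduce to one question: do two agents plan to write the same file? A small sketch of that check follows. `AgentPlan` and `worktreeAdvice` are hypothetical names, and the list of files each agent intends to modify is assumed to be known up front, which is not always the case.

```typescript
type WorktreeAdvice = "no" | "optional" | "yes";
type AgentPlan = { agent: string; writes: string[] }; // files the agent intends to modify

function worktreeAdvice(plans: AgentPlan[]): WorktreeAdvice {
  const owners = new Map<string, string>(); // file path -> first agent that writes it
  let anyWrites = false;
  for (const plan of plans) {
    for (const file of plan.writes) {
      anyWrites = true;
      const owner = owners.get(file);
      if (owner !== undefined && owner !== plan.agent) {
        return "yes"; // two different agents touch the same file
      }
      owners.set(file, plan.agent);
    }
  }
  // Read-only agents cannot conflict; disjoint writers carry a low risk
  return anyWrites ? "optional" : "no";
}

console.log(
  worktreeAdvice([
    { agent: "backend", writes: ["src/api/export.ts"] },
    { agent: "tests", writes: ["src/api/export.ts", "e2e/export.spec.ts"] },
  ])
); // yes
```

For agents running in parallel, the table's last row still applies: a worktree is recommended even when this check returns "optional".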
Run in background
The run in background feature lets you launch sub-agents without blocking the main agent. This is essential for parallelization.
# Without background: forced sequential
# Agent A works... (60 seconds)
# Agent B works... (60 seconds)
# Total: 120 seconds

# With background: parallel
# Agent A works in background... (60 seconds)
# Agent B works in background... (60 seconds)
# Total: 60 seconds (both in parallel)
The main agent launches sub-agents in the background, continues its work, then retrieves results when they're ready.
Best practices
1. Avoid context overflow
The golden rule: never use more than 80% of the context window for multi-agent operations. Keep a margin for corrections and adjustments.
# GOOD: Concise results
"The security audit found 3 issues:
1 CRITICAL (missing CSRF), 2 MEDIUM (rate limiting)."

# BAD: Verbose results
"I analyzed each file one by one. First auth.ts,
which contains 342 lines of code. Line 42 is
interesting because..." (500-line report)
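The 80% rule is simple arithmetic, sketched below. The token figures are invented for the example (including the 200,000-token window), and `fitsBudget` is a hypothetical helper. Real token counts would have to come from your own estimates or tooling.

```typescript
// Hypothetical budget check: does the main context stay under the chosen share
// of the window once the expected sub-agent reports are added?
function fitsBudget(
  windowTokens: number,
  usedTokens: number,
  incomingTokens: number[],
  maxShare: number = 0.8
): boolean {
  const total = incomingTokens.reduce((sum, n) => sum + n, usedTokens);
  return total <= windowTokens * maxShare;
}

// Example figures only: 90k already used, three reports expected
console.log(fitsBudget(200_000, 90_000, [20_000, 30_000, 15_000])); // true  (155k of a 160k budget)
console.log(fitsBudget(200_000, 90_000, [20_000, 30_000, 15_000, 10_000])); // false (165k of a 160k budget)
```

When the check fails, the two levers are the ones described above: launch fewer sub-agents at once, or ask each one for a shorter report.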
2. Avoid duplicate work
Clearly define each agent's responsibilities to prevent two agents from doing the same work.
# BAD: Overlap
Agent 1: "Review the code and check security"
Agent 2: "Check security and code quality"
# → Both do security = duplication

# GOOD: Distinct responsibilities
Agent 1: "Review code quality (readability, patterns, tests)"
Agent 2: "Security audit only (injection, XSS, secrets)"
# → Each in its own domain, no overlap
3. Define success criteria
Each agent must know when its task is successfully completed.
## Success criteria for the testing agent
- All tests pass (exit code 0)
- Code coverage > 80%
- No flaky tests (rerun 3 times if a test fails)
- Coverage report generated in /coverage
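Criteria like these are easiest to enforce when they can be checked mechanically. The sketch below assumes the testing agent's outcome has already been collected into a `TestRun` object. That shape and the `unmetCriteria` helper are hypothetical, chosen to mirror the four criteria above.

```typescript
type TestRun = {
  exitCode: number;
  coveragePercent: number;
  flakyTests: string[];
  reportPath: string | null;
};

// Return the list of criteria that are not met; an empty list means success.
function unmetCriteria(run: TestRun): string[] {
  const unmet: string[] = [];
  if (run.exitCode !== 0) unmet.push("tests did not pass");
  if (run.coveragePercent <= 80) unmet.push("coverage is not above 80%");
  if (run.flakyTests.length > 0) unmet.push("flaky tests detected");
  if (run.reportPath === null) unmet.push("coverage report missing");
  return unmet;
}

const goodRun: TestRun = { exitCode: 0, coveragePercent: 84.2, flakyTests: [], reportPath: "/coverage" };
const badRun: TestRun = { exitCode: 1, coveragePercent: 80, flakyTests: ["login.spec"], reportPath: null };

console.log(unmetCriteria(goodRun)); // []
console.log(unmetCriteria(badRun).length); // 4
```

Returning the unmet criteria, rather than a single boolean, gives the orchestrator something concrete to report back.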
4. Plan for error handling
What happens if an agent fails? Define a fallback plan.
# Fallback plan
If the security-reviewer agent finds a CRITICAL issue:
→ Stop the pipeline
→ Notify the developer with the issue details
→ Do NOT continue to review or release

If the e2e-runner agent fails on a test:
→ Rerun the test 2 times (might be a flaky test)
→ If still failing, flag it and continue
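The fallback plan above is a small decision procedure, and writing it out makes the retry count explicit. In this sketch, `passesWithReruns` and `decide` are hypothetical helpers, and the E2E test is represented by a function returning pass or fail.

```typescript
type Decision = "continue" | "flag-and-continue" | "stop";

// Try once, then rerun up to maxReruns more times; true as soon as one attempt passes.
function passesWithReruns(attempt: () => boolean, maxReruns: number): boolean {
  for (let i = 0; i <= maxReruns; i++) {
    if (attempt()) return true;
  }
  return false;
}

function decide(securityHasCritical: boolean, e2eAttempt: () => boolean): Decision {
  if (securityHasCritical) return "stop"; // never continue to review or release
  return passesWithReruns(e2eAttempt, 2) ? "continue" : "flag-and-continue";
}

// A flaky test: fails on the first call, passes on the second
let flakyCalls = 0;
const flakyTest = (): boolean => {
  flakyCalls += 1;
  return flakyCalls >= 2;
};

console.log(decide(false, flakyTest)); // continue
console.log(decide(false, () => false)); // flag-and-continue
console.log(decide(true, () => true)); // stop
```

The security check is evaluated first on purpose: when a CRITICAL issue is present, the E2E test is not even attempted.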
Full example: release pipeline
Here's a prompt that orchestrates a complete release pipeline using all the patterns.
> Execute a release pipeline for version 2.3.0:
>
> 1. PLANNING (sequential)
>    - Use the planner agent to list all changes
>      since the last tag
>
> 2. CHECKS (parallel)
>    - Agent tdd-guide: all tests pass, coverage 80%+
>    - Agent security-reviewer: full security audit
>    - Agent refactor-cleaner: no dead code introduced
>
> 3. REVIEW (split-role)
>    - Quality perspective: clean and maintainable code
>    - Performance perspective: no regressions
>    - Consistency perspective: coherent with the codebase
>
> 4. DOCUMENTATION (parallel)
>    - Agent doc-updater: update technical docs
>    - Generate the changelog since the last tag
>
> 5. RELEASE (sequential)
>    - If everything is green: create the v2.3.0 tag
>    - Generate the release notes
>
> If a CRITICAL step fails, stop everything and give me
> a detailed report of the problem.
This pipeline combines all 4 orchestration patterns for a robust and automated release process.
Comparison with other multi-agent tools
Claude Code isn't the only tool offering agents. Here's how it compares to the main alternatives.
Claude Code vs Devin
Devin (Cognition AI) is an autonomous development agent that runs in a complete cloud environment (browser, terminal, editor).
| Criterion | Claude Code | Devin |
|---|---|---|
| Environment | Your local terminal | Cloud (dedicated VM) |
| Control | Full, you see every action | Autonomous, final result |
| Cost | Pay-as-you-go (tokens) | Monthly subscription |
| Customization | Custom agents, MCP, Skills | Limited to built-in capabilities |
| Collaboration | You stay in the loop | The agent works alone |
| Integration | Terminal, SDK, CI/CD | Web interface + GitHub PRs |
Claude Code favors control and customization. Devin favors full autonomy. For well-defined and repetitive tasks, Devin may be more practical. For day-to-day development with fine-grained control, Claude Code has the edge.
Claude Code vs Aider
Aider is an open-source pair-programming tool with LLMs, compatible with multiple models (GPT-4, Claude, etc.).
| Criterion | Claude Code | Aider |
|---|---|---|
| Models | Claude only (Haiku, Sonnet, Opus) | Multi-model (GPT-4, Claude, Gemini...) |
| Agents | Sub-agents, orchestration, SDK | No agent system |
| Ecosystem | MCP, Skills, Plugins | Limited to code editing |
| Mode | Interactive terminal + headless | Interactive terminal |
| Pricing | Included in Max/Pro subscription or API | Free (you pay for the API) |
Aider is excellent for simple pair-programming (editing code file by file). Claude Code goes further with multi-agent orchestration, MCPs for connecting external services, and the SDK for automation.
Claude Code vs CrewAI
CrewAI is a Python framework for orchestrating specialized AI agents.
| Criterion | Claude Code | CrewAI |
|---|---|---|
| Nature | Complete tool (terminal + SDK) | Python code framework |
| Agents | Built-in, ready to use | Must be built entirely |
| Models | Claude (optimized) | Multi-model |
| Setup | npm install and you're ready | Python project, code to write |
| Tools | Bash, Read, Edit, Grep, MCP... | Must integrate manually |
| Use cases | Software development | Any type of agent (marketing, research...) |
CrewAI offers more flexibility for building custom multi-agent systems in any domain. Claude Code is optimized for software development with ready-to-use tools. If your need is 100% development, Claude Code is more productive. If you're building agents outside of development, CrewAI offers more freedom.
Multi-agent architectures
Beyond orchestration patterns, two major architectures structure multi-agent systems.
Leader/worker architecture
A main agent (leader) coordinates multiple specialized agents (workers). The leader receives the request, breaks it into sub-tasks, and distributes them.
# Leader: the orchestrator agent
> You coordinate 3 workers for the "CSV export" feature.
> Break down the task and assign each part.

# Worker 1: Backend
# → Implement the /api/export endpoint

# Worker 2: Frontend
# → Add the export button to the UI

# Worker 3: Tests
# → Write E2E tests for the export flow
This is the default architecture in Claude Code when it uses sub-agents: the main agent is the leader, the sub-agents are the workers.
Strengths: centralized coordination, clear global view, easy to debug. Weaknesses: the leader is a single point of failure, it consumes a lot of context.
Peer-to-peer architecture
Agents communicate directly with each other without a central coordinator. Each agent knows its role and knows when to hand off.
# Agent Teams in peer-to-peer mode
# Each agent works and signals when done
# Other agents react to changes

# Developer agent: codes → signals "code ready"
# Tester agent: detects "code ready" → writes tests
# Reviewer agent: detects "tests written" → reviews everything
This architecture corresponds to Claude Code's Agent Teams mode (see Agent Teams). Each agent has its own session and communicates via files and Git state.
Strengths: no central bottleneck, more resilient. Weaknesses: more complex coordination, risk of conflicts, harder debugging.
CI/CD integration with agents
Agents integrate into your CI/CD pipelines to automate pre-merge checks.
GitHub Actions
# .github/workflows/agent-review.yml
name: Agent Review
on:
  pull_request:
    types: [opened, synchronize]
jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Setup Claude Code
        run: npm install -g @anthropic-ai/claude-code
      - name: Agent Review
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          claude --print --max-turns 15 \
            "Do a complete review of this PR.
            Analyze the diff with git diff origin/main...HEAD.
            Produce a report with issues by severity.
            If you find a CRITICAL, end with EXIT_CODE=1." | tee review.md
          # EXIT_CODE=1 is only text in the report, so fail the job explicitly
          if grep -q "EXIT_CODE=1" review.md; then exit 1; fi
GitLab CI
# .gitlab-ci.yml
agent-security-audit:
  stage: review
  image: node:20-alpine
  before_script:
    - npm install -g @anthropic-ai/claude-code
  script:
    - |
      claude --print --max-turns 10 \
        "Security audit on the diff for this MR.
        Look for: SQL injections, XSS, hardcoded secrets,
        vulnerable dependencies. JSON format."
  rules:
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
Full pipeline with the SDK
For finer control, use the SDK in a Node.js script called by your CI.
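// scripts/ci-review.ts
// Assumes the Claude Agent SDK package (@anthropic-ai/claude-agent-sdk), whose
// query() function streams messages. Check the SDK reference for your version.
import { query } from "@anthropic-ai/claude-agent-sdk";

// Run one agent and return the text of its final result message
async function runAgent(prompt: string, maxTurns: number, allowedTools: string[]): Promise<string> {
  let text = "";
  for await (const message of query({ prompt, options: { maxTurns, allowedTools } })) {
    if (message.type === "result" && message.subtype === "success") {
      text = message.result;
    }
  }
  return text;
}

async function ciReview() {
  // Phase 1: Security
  const security = await runAgent("Security audit of the diff against main", 10, ["Bash", "Read", "Grep"]);

  // Phase 2: Tests
  const tests = await runAgent("Verify that test coverage is > 80%", 8, ["Bash", "Read"]);

  // Consolidated result (plain string matching on the agents' reports)
  const hasCritical = security.includes("CRITICAL");
  const lowCoverage = tests.includes("< 80%");
  if (hasCritical || lowCoverage) {
    console.error("Review failed:");
    if (hasCritical) console.error("- CRITICAL security issue");
    if (lowCoverage) console.error("- Insufficient coverage");
    process.exit(1);
  }
  console.log("Review OK");
}

// Fail the CI job if the script itself throws
ciReview().catch((err) => {
  console.error(err);
  process.exit(1);
});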
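The substring check `includes("< 80%")` in the script above only works if the agent phrases its answer exactly that way. A slightly more robust option, sketched here, is to extract the number and compare it yourself. `parseCoverage` is a hypothetical helper, and it assumes the report mentions the word "coverage" followed by a percentage, so the prompt should ask for that format explicitly.

```typescript
// Extract the first percentage that follows the word "coverage" (case-insensitive).
// Returns null when the report contains no such figure.
function parseCoverage(report: string): number | null {
  const match = /coverage[^0-9]*(\d+(?:\.\d+)?)\s*%/i.exec(report);
  return match ? Number(match[1]) : null;
}

function coverageIsSufficient(report: string, threshold: number = 80): boolean {
  const coverage = parseCoverage(report);
  // Treat a missing figure as a failure rather than silently passing
  return coverage !== null && coverage > threshold;
}

console.log(parseCoverage("Line coverage: 76.4% (target 80%)")); // 76.4
console.log(coverageIsSufficient("Total coverage is 85 %")); // true
console.log(coverageIsSufficient("All tests pass.")); // false
```

The regex takes the first number after "coverage", so a report that restates the target before the measured value would be misread. Asking the agent for a fixed line such as `Coverage: <number>%` avoids that ambiguity.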
Command, Agent, Skill: which one to use?
Claude Code offers three complementary mechanisms to structure your workflows. They don't do the same thing, and telling them apart will save you from building an Agent when a Skill would have been enough.
The comparison table
| Criterion | Skill (.claude/skills/) | Agent (.claude/agents/) | Command (.claude/commands/) |
|---|---|---|---|
| Trigger | Manual slash command (/skill) | Auto (Claude decides) or via Agent tool | Manual slash command (/project:cmd) |
| Context | Shared with the main session | Isolated (its own window) | Shared with the main session |
| Autonomy | Instructions followed by the orchestrator | Autonomous sub-agent, makes its own decisions | Instructions followed by the orchestrator |
| Persistence | No, loaded on demand | Memory possible (files, notes) | No, loaded on demand |
| Use case | Repetitive workflows, work recipes | Complex tasks that shouldn't pollute the main context | Project scripts shared across the team |
The decision tree
Before choosing, ask yourself these three questions in order.
I want to automate something. What exactly is it?
1. Is it a task I trigger myself, repeatedly?
└─ Yes → Skill or Command (depending on whether it's personal or shared)
└─ No → next question
2. Is the task complex, and I want it to run without polluting
my main context?
└─ Yes → Agent (isolated context, autonomous)
└─ No → next question
3. Do I just want to give Claude permanent knowledge
about this project?
└─ Yes → CLAUDE.md (not an agent, not a skill: just context)
In practice, the guiding principle is simple: prefer the lightest mechanism that fits. A Skill covers 80% of use cases. An Agent is the right choice when you need real isolation or autonomy.
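For readers who prefer code to diagrams, the three questions translate directly into an ordered chain of checks. The `Need` fields and `chooseMechanism` are hypothetical names, and the final fallback value is an addition for the case where none of the three questions applies.

```typescript
type Mechanism = "Skill or Command" | "Agent" | "CLAUDE.md" | "none of these";

type Need = {
  triggeredManuallyAndRepeatedly: boolean;
  complexAndNeedsIsolation: boolean;
  permanentProjectKnowledge: boolean;
};

// The questions are asked in order; the first "yes" decides.
function chooseMechanism(need: Need): Mechanism {
  if (need.triggeredManuallyAndRepeatedly) return "Skill or Command";
  if (need.complexAndNeedsIsolation) return "Agent";
  if (need.permanentProjectKnowledge) return "CLAUDE.md";
  return "none of these";
}

console.log(
  chooseMechanism({
    triggeredManuallyAndRepeatedly: false,
    complexAndNeedsIsolation: true,
    permanentProjectKnowledge: true,
  })
); // Agent
```

The order matters: a need that is both repetitive and complex resolves to the lighter mechanism first, consistent with the principle of preferring the lightest option that fits.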
The Command + Agent + Skill pattern
These three mechanisms work together. The most powerful combination looks like this.
User
│
└─ /project:pre-commit ← Command (manual trigger)
│
├─ Agent code-reviewer ← Agent (isolated context, autonomous)
│ │
│ └─ Skill tdd-guide ← Preloaded skill (domain knowledge)
│
└─ Skill changelog-format ← Skill invoked inline to format output
The Command is the entry point: the user triggers it, it orchestrates the rest. The Agent handles the complex part in its own context. The Skill brings the specialized knowledge the agent needs.
The same need, three approaches
Let's take a concrete example: checking code quality before a commit. You can solve this with each of the three mechanisms. Here's how, and more importantly, when you'd pick one over the other.
Approach 1: a Skill /pre-commit
The simplest solution. A Markdown file in ~/.claude/skills/ that describes the steps to follow.
# Pre-commit Quality Check
You are a thorough code reviewer. Before every commit, check the following.
## Steps
1. Run tests: `npm test`
   - If a test fails, stop and explain the problem
2. Run lint: `npm run lint`
   - List errors by file and severity
3. Check TypeScript types: `npm run type-check`
4. Analyze the diff (`git diff --staged`) and look for:
   - Hardcoded secrets or tokens
   - Forgotten `console.log` calls
   - Unused imports
## Output format
For each issue found:
- **File**: path
- **Type**: TEST / LINT / TYPE / SECURITY
- **Severity**: BLOCKING / WARNING
- **Description**: what's wrong
If everything is clean: "Ready to commit."
# Usage
/user:pre-commit
When to choose this approach: for personal use, across any project. The Skill runs in your main context, you see every step in real time. Fast to create, easy to iterate on.
Limitation: if the check takes time or produces a lot of text, it fills up your context window.
Approach 2: an Agent code-reviewer
Same goal, but the work happens in an isolated context. The agent is more autonomous: it can re-run commands, fix minor issues on its own, and only presents you with a final report.
# Code Reviewer Agent
## Role
You are a senior code reviewer. You work autonomously to verify code
quality before a commit. You can fix minor issues (auto-fixable lint,
unused imports) without asking for confirmation.
## Available tools
- Bash (run commands)
- Read / Edit (read and fix files)
- Grep (search in code)
## Instructions
1. Get the staged diff: `git diff --staged`
2. Run tests: `npm test -- --passWithNoTests`
3. Run lint with auto-fix: `npm run lint -- --fix`
4. Check types: `npm run type-check`
5. Look for problematic patterns in modified files:
   - Secret regex: `(api_key|password|token)\s*=\s*['"][^'"]+['"]`
   - `console\.log` in non-test files
6. If BLOCKING issues remain, list them with suggested fixes.
   Otherwise, confirm the code is ready.
## Constraints
- Never commit yourself
- Only modify files already in the staged diff
- Keep the report concise: one line per issue maximum
# The agent is invoked automatically by Claude when the context calls for it,
# or explicitly:
> Use the code-reviewer agent on the current diff
When to choose this approach: when you want to fully delegate the check. The agent works in its own context while you keep working on something else. Ideal for large codebases where checks generate a lot of output.
Limitation: more setup time, less visibility into what's happening along the way.
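To see what the two patterns in the agent's instructions actually catch, here is a small scanner that applies them line by line. The regexes are copied from the agent definition. `scanFile`, the `Hit` shape, and the rule that `.test.` or `.spec.` in the filename marks a test file are assumptions made for the example.

```typescript
const secretPattern = /(api_key|password|token)\s*=\s*['"][^'"]+['"]/;
const consoleLogPattern = /console\.log/;

type Hit = { file: string; line: number; kind: "SECRET" | "CONSOLE_LOG" };

function scanFile(file: string, content: string): Hit[] {
  // Assumption: files named *.test.* or *.spec.* are test files
  const isTestFile = /\.(test|spec)\.[jt]sx?$/.test(file);
  const hits: Hit[] = [];
  content.split("\n").forEach((text, index) => {
    if (secretPattern.test(text)) {
      hits.push({ file, line: index + 1, kind: "SECRET" });
    }
    if (!isTestFile && consoleLogPattern.test(text)) {
      hits.push({ file, line: index + 1, kind: "CONSOLE_LOG" });
    }
  });
  return hits;
}

const sampleSource = [
  'const api_key = "sk-123";', // hardcoded value: flagged
  "const password = process.env.DB_PASSWORD;", // read from the environment: not flagged
  'console.log("debug");', // flagged outside test files only
].join("\n");

console.log(scanFile("src/app.ts", sampleSource));
console.log(scanFile("src/app.test.ts", sampleSource));
```

The secret pattern only matches assignments written with `=`. An object literal such as `{ token: "abc" }` slips through, so treat this regex as a first filter and not as a complete secret scanner.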
Approach 3: a project Skill code-quality
Here the goal is not to run commands but to provide knowledge about the project's quality standards. This Skill will be read by other agents or invoked directly to get a contextual opinion.
# Project Quality Standards
## TypeScript rules
- No explicit `any`. Use `unknown` if the type is genuinely unknown.
- Interfaces prefixed with `I` are forbidden (convention: `type` or interface without prefix).
- Every public function must have a minimal JSDoc (description + `@param` + `@returns`).
## Testing rules
- Minimum coverage: 80% on branches.
- A file `utils/format.ts` must have a corresponding `utils/format.test.ts`.
- Mocks go in `__mocks__/`, never inline inside test files.
## Security
- No API keys in the code (use environment variables).
- API endpoints must validate inputs with Zod before processing.
- No `dangerouslySetInnerHTML` without an explicit review.
## Commit message
Format: `type(scope): description` (Conventional Commits).
Valid types: feat, fix, docs, chore, refactor, test, perf.
# Direct invocation for an opinion
/project:code-quality

# Or used as a reference in an agent prompt
> Following the standards defined in the code-quality skill,
> check this file: src/api/users.ts
When to choose this approach: when you want to centralize project rules and make them accessible to everyone (agents, developers, reviews). This Skill does nothing on its own — it's a source of truth. It pairs naturally with the code-reviewer agent, which can read it before starting work.
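Rules written this precisely can also be checked by code, which is one reason to keep them in a single place. As an illustration, the commit message rule from the standards above fits in one regular expression. `isValidCommitMessage` is a hypothetical helper, and it treats the scope as mandatory because the format line shows it that way, although Conventional Commits itself makes the scope optional.

```typescript
const commitTypes = ["feat", "fix", "docs", "chore", "refactor", "test", "perf"];

// type(scope): description, with a non-empty scope and description
const commitPattern = new RegExp(`^(${commitTypes.join("|")})\\(([^)]+)\\): (.+)$`);

function isValidCommitMessage(subject: string): boolean {
  return commitPattern.test(subject);
}

console.log(isValidCommitMessage("feat(api): add CSV export")); // true
console.log(isValidCommitMessage("feature(api): add CSV export")); // false (unknown type)
console.log(isValidCommitMessage("fix: typo")); // false (no scope)
```

A check like this could run in a Git hook or in CI, while the Skill remains the human-readable source of truth.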
Summary of the three approaches
| | Skill /pre-commit | Agent code-reviewer | Skill code-quality |
|---|---|---|---|
| What it does | Runs the check | Runs and fixes autonomously | Documents the standards |
| Context | Main (visible) | Isolated (runs out of view) | Main or reference |
| Auto-fix | No | Yes (minor issues) | Not applicable |
| Setup time | 5 minutes | 15 minutes | 10 minutes |
| Combinable | Standalone or with others | Can read the code-quality Skill | Referenced by others |
| Best for | Quick personal use | Full delegation | Team standardization |
Next steps
You now master multi-agent orchestration. Continue learning with these related resources.
- Claude Agent SDK: Create programmatic agents in TypeScript and Python
- Performance and limits: Costs, recursion depth, and best practices
- Understanding agents: Back to fundamentals
- Create a sub-agent: Build custom agents for your needs
- Headless mode and CI/CD: Integrate Claude Code into your pipelines