A 6-layer agent-system design canvas and 8+4 core patterns from the Claude Code source leak (512K lines of TypeScript): from the 3-line Circuit Breaker that stopped 250K wasted API calls per day to the 23-step bash AST security analysis.
On March 31, 2026, Anthropic accidentally shipped a sourcemap in their npm package, exposing Claude Code’s entire source — roughly 512,000 lines of TypeScript across 1,884 files. A packaging mistake, but inside was a definitive answer to the question: how is a production agent system actually designed?
This post distills the public analyses of that codebase into a 6-layer design framework and 12 core patterns (8 design + 4 infrastructure). They are not theoretical: they were validated by a system processing hundreds of millions of real requests.
Why “Agent Design” Is Its Own Discipline
| Dimension | Web App | Agent System |
|---|---|---|
| Execution | Request then Response (single) | Request then AI decision then Tool then re-evaluation (loop) |
| State | DB + Session | Context window + Memory + Tool state + Sub-agents |
| Safety | Input validation | Input + AI output + Tool execution validation |
| Cost | Compute | Compute + token cost (exponential on failure) |
How Claude Code handles these differences is the subject of this post.
The 6-Layer Design Canvas
Six layers for designing agent systems. Each is mapped to Claude Code’s actual implementation as a production baseline.
graph TB
subgraph "Agent System Design Canvas"
direction TB
T["Tool Layer\n52 tools, Zod validation, size limits"]
S["Safety Layer\n5-stage permissions, 23-step AST analysis"]
M["Memory Layer\nAuto-Compact, 4-type persistent memory"]
E["Execution Layer\nParallel/sequential split, 7 modes"]
R["Resilience Layer\nWithhold and Recover, Circuit Breaker"]
X["Extensibility Layer\nMCP 5 transports, 44 feature flags"]
end
T --- S
S --- M
M --- E
E --- R
R --- X
Layer 1: Tool
Every means by which the agent interacts with the outside world.
Claude Code’s scale: 52 built-in tools (41 active + 11 feature-gated)
The tool registration pipeline is the most revealing: getAllBaseTools() then Feature gate filter then Deny rules filter then Mode filter then alphabetical sort then assembleToolPool(). The alphabetical sort is critical — reordering tools breaks API prompt caching, causing cost spikes.
Every tool conforms to a common interface:
| Property | Purpose |
|---|---|
| inputSchema (Zod) | Schema-validates LLM-generated parameters — defends against hallucination |
| isConcurrencySafe() | Parallel execution safety — Read returns true, Edit returns false |
| isReadOnly() | Read-only flag — used by permission modes |
| maxResultSizeChars | Result size cap — overflow saves to disk, returns reference only |
This interface lets all 52 tools pass through the same 10-step execution pipeline: name lookup, interrupt check, input validation, PreToolUse hook, permission check, execution, result mapping, size check, PostToolUse hook, telemetry.
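A minimal TypeScript sketch of that common interface, assuming plain shapes: the real code uses Zod schemas, so validateInput here stands in for Zod's safeParse, and the run signature is a guess.

```typescript
// Hedged sketch of the common tool interface; property names follow the
// table above, everything else is an assumption.
interface AgentTool<I> {
  name: string;
  validateInput(raw: unknown): I | null; // Zod safeParse in the real code
  isConcurrencySafe(): boolean;          // true: eligible for a parallel batch
  isReadOnly(): boolean;                 // consulted by permission modes
  maxResultSizeChars: number;            // overflow spills to disk
  run(input: I): Promise<string>;
}

const readTool: AgentTool<{ path: string }> = {
  name: "Read",
  validateInput: (raw) => {
    const r = raw as { path?: unknown } | null;
    return r && typeof r.path === "string" ? { path: r.path } : null;
  },
  isConcurrencySafe: () => true,  // Read is safe to parallelize
  isReadOnly: () => true,
  maxResultSizeChars: 100_000,
  run: async ({ path }) => `contents of ${path}`,
};

// Pipeline step 3 (input validation): hallucinated parameters are rejected
// before the tool body ever executes.
const ok = readTool.validateInput({ path: "src/index.ts" }); // parsed input
const bad = readTool.validateInput({ path: 42 });            // null: rejected
```

Because every tool exposes the same surface, the 10-step pipeline never needs tool-specific branches: it asks the interface.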
Layer 2: Safety
The most underestimated layer in agent systems.
Claude Code’s BashTool security uses a Tree-sitter AST parser to decompose shell commands into syntax trees, then runs 23 analysis steps. This security module alone is 888KB across 18 files.
Checks cover command substitution ($()), variable expansion (${}), redirections, pipelines, Zsh-specific bypass vectors (=curl form), and Unicode injection (zero-width characters, null bytes, homoglyphs).
Core principle: fail-closed. If parsing itself fails, execution is blocked, not allowed.
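A toy illustration of the fail-closed principle, with a stand-in parseShell instead of the real Tree-sitter parser; the specific checks are invented for the example, only the block-on-parse-failure behavior is the point.

```typescript
// Fail-closed sketch: an unparseable command is blocked, never allowed.
type Verdict = "allow" | "block";

// Stand-in for the Tree-sitter parse: treats unbalanced quotes as a
// parse failure and just tokenizes on whitespace otherwise.
function parseShell(cmd: string): string[] {
  if ((cmd.match(/"/g)?.length ?? 0) % 2 !== 0) {
    throw new Error("unbalanced quote");
  }
  return cmd.split(/\s+/);
}

function checkBashCommand(cmd: string): Verdict {
  let tokens: string[];
  try {
    tokens = parseShell(cmd);
  } catch {
    return "block"; // fail-closed: if we cannot parse it, we do not run it
  }
  // One example check: command substitution and variable expansion.
  const dangerous = tokens.some((t) => t.includes("$(") || t.includes("${"));
  return dangerous ? "block" : "allow";
}
```

The inverse default (fail-open) is the classic agent-security bug: an attacker only needs to find one input the parser chokes on.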
The permission system is a 5-stage pipeline:
flowchart LR
A["validateInput\nZod schema"] --> B["checkPermissions\ntool-level auth"]
B --> C["PreToolUse hook\nexternal block"]
C --> D["Rule matching\nallow/deny/ask"]
D --> E["Permission mode\nDefault/Auto/Plan/Bypass"]
Rule priority: Local > Project > User > Flags > Policy. The closest setting wins.
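A sketch of closest-setting-wins resolution: the five source names come from the post, while the RuleSet shape and the sample rules are assumptions.

```typescript
// Scan rule sources from most local to least local; first match wins.
type Action = "allow" | "deny" | "ask";
type RuleSet = Record<string, Action>; // tool name -> action

const sources: RuleSet[] = [
  /* Local   */ { Bash: "ask" },
  /* Project */ { Bash: "deny", Edit: "allow" },
  /* User    */ {},
  /* Flags   */ {},
  /* Policy  */ { Edit: "deny" },
];

function resolve(tool: string, fallback: Action = "ask"): Action {
  for (const rules of sources) {
    if (tool in rules) return rules[tool]; // closest setting wins
  }
  return fallback;
}

// Bash: Local "ask" shadows the Project-level "deny".
// Edit: Project "allow" shadows the Policy-level "deny".
```

The priority order means a developer's local override is always honored over organization-wide policy for that session, which is the documented intent of the hierarchy.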
Layer 3: Memory
The hidden cost center of agent systems. Real data from Claude Code proves it.
Before the Auto-Compact Circuit Breaker:
- 1,279 sessions had 50+ consecutive compression failures
- Worst case: 3,272 consecutive failures in a single session
- Approximately 250,000 API calls wasted per day in failure loops
The fix was 3 lines of code: MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3.
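A plausible reconstruction of the breaker around the compaction loop: only the constant name is from the source, the loop itself is an assumption illustrating why the cap matters.

```typescript
// Circuit-breaker sketch: count consecutive compaction failures and stop.
const MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3;

// tryCompact is a stand-in for the real compaction attempt; returns
// true on success. Returns the number of attempts made.
function runAutoCompact(tryCompact: () => boolean): number {
  let consecutiveFailures = 0;
  let attempts = 0;
  while (consecutiveFailures < MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES) {
    attempts++;
    if (tryCompact()) {
      consecutiveFailures = 0; // success resets the counter
      break;
    }
    consecutiveFailures++; // without the cap, this loop never terminates
  }
  return attempts;
}
```

Before the cap, a session whose context could not be compressed retried forever, burning an API call per attempt: hence the 3,272-failure worst case.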
The compression strategy is not simple truncation:
- Remove images first (the highest token consumers)
- Group by API round
- Fork a sub-agent to generate summaries (independent of main context)
- Replace old messages with summaries
- Restore top 5 referenced files (50K token budget)
- Re-inject skills (25K budget, 5K per skill)
After summarization, most-referenced files and active skills are restored. Context is shed, but essentials are preserved.
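The budgeted restoration step might look like this greedy sketch: the 50K budget and top-5 limit are from the post, while the token accounting and the greedy policy are assumptions.

```typescript
// Greedy, budgeted restoration of the most-referenced files after compaction.
interface FileRef {
  path: string;
  refCount: number; // how often the session referenced this file
  tokens: number;   // estimated token cost to restore it
}

function restoreFiles(files: FileRef[], budget = 50_000, maxFiles = 5): string[] {
  const restored: string[] = [];
  const byRefs = [...files].sort((a, b) => b.refCount - a.refCount);
  let remaining = budget;
  for (const f of byRefs) {
    if (restored.length === maxFiles) break;
    if (f.tokens <= remaining) { // skip files that would blow the budget
      restored.push(f.path);
      remaining -= f.tokens;
    }
  }
  return restored;
}
```

The same budget discipline applies to skill re-injection (25K total, 5K per skill): every restoration competes for a fixed token allowance rather than growing the context back unboundedly.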
Layer 4: Execution
Claude Code supports 7 execution modes (REPL, Headless, Coordinator, Bridge, Kairos, Daemon, Viewer), but all of them share a single core query() loop; mode differences are handled by Dependency Injection.
The concurrency model is the cleverest part. Safe tools (Read, Grep, Glob) run up to 10 in parallel. When an unsafe tool (Edit, Bash, Write) appears, a new batch starts for solo sequential execution. The StreamingToolExecutor overlaps API streaming with tool execution: while the AI is still generating text, completed tool_use blocks execute immediately.
Layer 5: Resilience
Withhold and Recover pattern in action:
| Error | Step 1 | Step 2 | Step 3 | Final |
|---|---|---|---|---|
| 413 Prompt Too Long | Collapse drain | Reactive compact | — | Show to user |
| Max Output Tokens | 8K to 64K escalation | Retry (max 3) | — | Show truncation |
| 429 Rate Limit | Check retry-after | Fast mode off | Standard model fallback | — |
| 529 Overloaded | Retry | Retry | Alternate model | Non-foreground: give up |
In Persistent mode (Anthropic internal unattended sessions), retries continue with exponential backoff for up to 6 hours.
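A sketch of that backoff, assuming a simple exponential schedule; the base delay, the per-attempt cap, and the absence of jitter are all guesses, and only the 6-hour wall-clock limit comes from the post.

```typescript
// Exponential backoff bounded by a 6-hour wall-clock budget.
const SIX_HOURS_MS = 6 * 60 * 60 * 1000;

// Delay for the Nth attempt: doubles each time, capped per attempt.
function nextDelayMs(attempt: number, baseMs = 1_000, capMs = 10 * 60 * 1000): number {
  return Math.min(baseMs * 2 ** attempt, capMs);
}

// The loop keeps retrying only while total elapsed time is under the cap.
function shouldKeepRetrying(startedAtMs: number, nowMs: number): boolean {
  return nowMs - startedAtMs < SIX_HOURS_MS;
}
```

The key design choice is bounding by wall-clock time rather than attempt count: an unattended session should survive a multi-hour provider outage, not give up after N tries.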
Layer 6: Extensibility
| Tier | Unit | Example |
|---|---|---|
| Tool | Single tool | FileRead, Bash, WebFetch |
| Skill | Markdown-based workflow | /commit, /review, /security-review |
| Plugin | Skill + Hook + MCP bundle | GitHub integration = PR review + autofix + GitHub MCP |
MCP supports 5 transports (Stdio, SSE/HTTP, WebSocket, SDK, Claude.ai). There are 44 feature flags, 20 of them externally inactive: nearly half the roadmap is still experimental.
8 Core Design Patterns
| # | Pattern | Core Idea | CC Evidence |
|---|---|---|---|
| 1 | Generator Streaming | query() yield + overlap tool exec | SSE streaming, Int32Array double-buffering (50x perf) |
| 2 | Feature Gate Dead Code Removal | Build-time removal of inactive code | 44 flags, bun:bundle Tree-shaking |
| 3 | Memoized Context | Session-invariant context computed once | 14 cache-break vectors tracked, 65ms saved at startup |
| 4 | Withhold and Recover | Auto-heal recoverable errors before display | 413 triggers compact, 8K to 64K escalation |
| 5 | Lazy Import | Load modules only when called | 800KB build, conditional feature gate imports |
| 6 | Immutable State | DeepImmutable + Zustand for state safety | Global AppState immutable, auto side-effect chains |
| 7 | Interruption Resilience | Disk save before every API call | /resume, SIGTERM then 30s grace then SIGKILL |
| 8 | Dependency Injection | query(deps) for mode/test switching | 7 modes share one core |
Generator Streaming
Not just “showing tokens in real-time.” The StreamingToolExecutor overlaps response streaming with tool execution. Rendering uses a custom React reconciler + Yoga + double-buffering + CharPool/StylePool interning. Int32Array-based ASCII buffers achieve 50x cache performance improvement.
Memoized Context
getSystemContext() and getUserContext() are computed once per session. Values stay identical across turns, maximizing API-side prompt cache hits (1-hour cache). promptCacheBreakDetection.ts tracks 14 cache-break vectors.
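A minimal sketch of per-session memoization. The point is identity stability: the same object is returned on every turn, so the serialized prompt never changes and the API-side cache keeps hitting. The helper and its contents are illustrative, not the leaked implementation.

```typescript
// Compute once per session, return the identical value forever after.
function memoizePerSession<T>(compute: () => T): () => T {
  let cached: T | undefined;
  let filled = false;
  return () => {
    if (!filled) {
      cached = compute();
      filled = true;
    }
    return cached as T; // same reference every turn: byte-stable prompt
  };
}

let computeCount = 0;
const getSystemContext = memoizePerSession(() => {
  computeCount++;
  return { os: "linux", cwd: "/repo" }; // stand-ins for real environment probes
});
```

Anything that recomputes (a timestamp, a re-read config file, a re-sorted list) is a potential cache-break vector, which is presumably why 14 of them are tracked explicitly.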
Dependency Injection
query(messages, deps) — one signature. REPL injects a React renderer, Headless injects NDJSON output, Bridge injects a remote renderer. Core logic identical, only deps differ.
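A sketch of the pattern with an assumed Deps shape; the idea is that the loop body never branches on mode, only the injected functions differ.

```typescript
// One core loop; all mode-specific behavior arrives through deps.
interface Deps {
  render(text: string): void;
}

function query(messages: string[], deps: Deps): void {
  // Core logic is identical for every mode.
  for (const m of messages) deps.render(m);
}

// Headless mode might inject an NDJSON emitter; REPL would inject a
// terminal renderer; tests inject a collector like this one.
const lines: string[] = [];
const headlessDeps: Deps = {
  render: (t) => { lines.push(JSON.stringify({ text: t })); },
};
```

This is also why the system is testable: a test run injects fakes into the same query() that production uses, instead of maintaining a parallel code path.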
4 Infrastructure Patterns
Tool Concurrency Model
flowchart LR
subgraph "Batch 1 parallel"
R1[Read] & G1[Grep] & G2[Glob]
end
subgraph "Batch 2 sequential"
E1[Edit]
end
subgraph "Batch 3 parallel"
R2[Read] & R3[Read]
end
subgraph "Batch 4 sequential"
B1[Bash]
end
R1 & G1 & G2 --> E1 --> R2 & R3 --> B1
Consecutive tools with isConcurrencySafe()=true are batched for parallel execution (max 10). Unsafe tools start a new batch. If one tool in a batch fails, sibling execution stops.
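The batching rule can be sketched as a pure function over the tool-call stream; the ToolCall shape is an assumption, the rule itself follows the description above.

```typescript
// Group consecutive safe calls into parallel batches (max 10);
// any unsafe call gets a batch of its own.
interface ToolCall {
  name: string;
  safe: boolean; // isConcurrencySafe() in the real interface
}

function batchCalls(calls: ToolCall[], maxParallel = 10): ToolCall[][] {
  const batches: ToolCall[][] = [];
  let current: ToolCall[] = [];
  for (const call of calls) {
    const mustFlush =
      current.length > 0 &&
      (!call.safe ||                       // unsafe call starts a new batch
        !current[0].safe ||                // previous batch held an unsafe call
        current.length === maxParallel);   // parallel batch is full
    if (mustFlush) {
      batches.push(current);
      current = [];
    }
    current.push(call);
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```

Run against the diagram's sequence (Read, Grep, Glob, Edit, Read, Read, Bash), this yields batch sizes 3, 1, 2, 1, matching the four batches shown.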
Auto-Compact Strategy
Triggers when token count exceeds context_window minus 13,000. Sub-agent summarization + reference file restoration + Circuit Breaker (stop after 3 failures). Those 3 lines eliminated 250,000 wasted API calls per day.
Permission Pipeline
5-stage validation (input, tool, hook, rules, mode) + 4 modes (Default/Auto/Plan/Bypass) + hierarchical priority (Local then Project then User then Flags then Policy). BashTool’s 888KB AST security provides the deep defense.
API Retry Strategy
Different strategy per error type: 429 waits or falls back to a cheaper model, 529 switches to alternate model after 3 attempts, 401 refreshes tokens. Non-foreground tasks give up immediately on 529. Persistent mode retries up to 6 hours.
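That per-error dispatch can be sketched as a pure policy function: the status-to-action mapping follows the table in Layer 5, while the Policy type and the attempt thresholds are assumptions.

```typescript
// Per-error retry policy; returns what the caller should do next.
type Policy =
  | { kind: "wait"; source: "retry-after" }
  | { kind: "fallback-model" }
  | { kind: "refresh-auth" }
  | { kind: "retry" }
  | { kind: "give-up" };

function retryPolicy(status: number, attempt: number, foreground: boolean): Policy {
  switch (status) {
    case 429: // rate limit: honor retry-after first, then fall back
      return attempt === 0
        ? { kind: "wait", source: "retry-after" }
        : { kind: "fallback-model" };
    case 529: // overloaded: background tasks bail out immediately
      if (!foreground) return { kind: "give-up" };
      return attempt < 3 ? { kind: "retry" } : { kind: "fallback-model" };
    case 401: // auth: refresh the token and try again
      return { kind: "refresh-auth" };
    default:
      return attempt < 3 ? { kind: "retry" } : { kind: "give-up" };
  }
}
```

Centralizing the policy in one place like this is what makes the Persistent-mode variant (keep retrying for 6 hours) a small override rather than a fork of the retry logic.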
Key Numbers
| Number | Context |
|---|---|
| 512,000 lines / 1,884 files | CC TypeScript source size |
| 52 (11 gated) | Built-in tools |
| 888KB / 23 steps | Bash Security AST analysis |
| 250,000 per day | Wasted API calls before Circuit Breaker |
| 3 lines | Code that fixed the 250K/day waste |
| context_window minus 13,000 | Auto-Compact threshold |
| 50K / 25K | Token budgets for file restoration / skill re-injection |
| 10 max | Parallel safe tool executions |
| 14 vectors | Prompt cache break tracking |
| 50x | Int32Array buffer rendering performance gain |
| 44 flags (20 inactive) | Feature flags |
| 6 hours | Persistent mode max backoff |
Closing: Three Design Principles
All design decisions follow three principles: safety (blocking dangerous operations), performance (streaming, parallel execution, caching), and extensibility (tools, skills, plugins, MCP).
These three principles run through all six layers and twelve patterns. When designing an agent system, neglecting any one of them eventually surfaces as a production problem.
Sources & Limitations
This series synthesizes the following publicly available analyses and does not directly contain leaked source code.
| Source | URL | Focus |
|---|---|---|
| ccunpacked.dev | ccunpacked.dev | Visual architecture guide, tool/command catalog |
| Wikidocs Analysis | wikidocs.net/338204 | Detailed technical analysis (execution flow, state, rendering) |
| PyTorch KR | discuss.pytorch.kr | Community analysis + HN discussion synthesis |
| Claw Code | github.com/ultraworkers/claw-code | Clean-room reimplementation (Rust/Python), PARITY.md gap analysis |
Analysis date: April 2, 2026. Anthropic issued DMCA takedowns on 8,100+ forks and discontinued npm distribution shortly after the leak, so some sources may have changed accessibility. Features behind feature flags are unreleased and may be modified or deprecated before launch.
Related Posts

What the Claude Code Leak Revealed: Anatomy of an AI Agent
The March 31, 2026 npm sourcemap incident revealed Claude Code internals. 4-phase execution, 7 modes, and the 11-step Agent Loop analyzed.

52 Tools, 23-Step Security: Inside an Agent's Tool System
52 built-in tools' common interface, 10-step execution pipeline, safe=parallel/unsafe=sequential concurrency model, 5-stage permission pipeline, and 888KB Tree-sitter 23-step bash security.