A 6-layer agent-system design canvas and 8+4 core patterns from the Claude Code source leak (512K lines of TypeScript): from the 3-line Circuit Breaker that stopped 250K wasted API calls per day to the 23-step bash AST security analysis.
On March 31, 2026, Anthropic accidentally shipped a sourcemap in their npm package, exposing Claude Code’s entire source — roughly 512,000 lines of TypeScript across 1,884 files. A packaging mistake, but inside was a definitive answer to the question: how is a production agent system actually designed?
This post distills the public analyses of that codebase into a 6-layer design framework and 12 core patterns (8 design + 4 infrastructure). They are not theoretical: they were validated by a system processing hundreds of millions of real requests.
Why “Agent Design” Is Its Own Discipline
| Dimension | Web App | Agent System |
|---|---|---|
| Execution | Request then Response (single) | Request then AI decision then Tool then re-evaluation (loop) |
| State | DB + Session | Context window + Memory + Tool state + Sub-agents |
| Safety | Input validation | Input + AI output + Tool execution validation |
| Cost | Compute | Compute + token cost (exponential on failure) |
How Claude Code handles these differences is the subject of this post.
The 6-Layer Design Canvas
Six layers for designing agent systems. Each is mapped to Claude Code’s actual implementation as a production baseline.
graph TB
subgraph "Agent System Design Canvas"
direction TB
T["Tool Layer\n52 tools, Zod validation, size limits"]
S["Safety Layer\n5-stage permissions, 23-step AST analysis"]
M["Memory Layer\nAuto-Compact, 4-type persistent memory"]
E["Execution Layer\nParallel/sequential split, 7 modes"]
R["Resilience Layer\nWithhold and Recover, Circuit Breaker"]
X["Extensibility Layer\nMCP 5 transports, 44 feature flags"]
end
T --- S
S --- M
M --- E
E --- R
R --- X
Layer 1: Tool
Every means by which the agent interacts with the outside world.
Claude Code’s scale: 52 built-in tools (41 active + 11 feature-gated)
The tool registration pipeline is the most revealing: getAllBaseTools() then Feature gate filter then Deny rules filter then Mode filter then alphabetical sort then assembleToolPool(). The alphabetical sort is critical — reordering tools breaks API prompt caching, causing cost spikes.
Every tool conforms to a common interface:
| Property | Purpose |
|---|---|
| inputSchema (Zod) | Schema-validates LLM-generated parameters — defends against hallucination |
| isConcurrencySafe() | Parallel execution safety — Read returns true, Edit returns false |
| isReadOnly() | Read-only flag — used by permission modes |
| maxResultSizeChars | Result size cap — overflow saves to disk, returns reference only |
This interface lets all 52 tools pass through the same 10-step execution pipeline: name lookup, interrupt check, input validation, PreToolUse hook, permission check, execution, result mapping, size check, PostToolUse hook, telemetry.
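A minimal TypeScript sketch of that common interface, assuming plain shapes: the real code uses Zod schemas, so validateInput here stands in for Zod's safeParse, and the run signature is a guess.

```typescript
// Hedged sketch of the common tool interface; property names follow the
// table above, everything else is an assumption.
interface AgentTool<I> {
  name: string;
  validateInput(raw: unknown): I | null; // Zod safeParse in the real code
  isConcurrencySafe(): boolean;          // true: eligible for a parallel batch
  isReadOnly(): boolean;                 // consulted by permission modes
  maxResultSizeChars: number;            // overflow spills to disk
  run(input: I): Promise<string>;
}

const readTool: AgentTool<{ path: string }> = {
  name: "Read",
  validateInput: (raw) => {
    const r = raw as { path?: unknown } | null;
    return r && typeof r.path === "string" ? { path: r.path } : null;
  },
  isConcurrencySafe: () => true,  // Read is safe to parallelize
  isReadOnly: () => true,
  maxResultSizeChars: 100_000,
  run: async ({ path }) => `contents of ${path}`,
};

// Pipeline step 3 (input validation): hallucinated parameters are rejected
// before the tool body ever executes.
const ok = readTool.validateInput({ path: "src/index.ts" }); // parsed input
const bad = readTool.validateInput({ path: 42 });            // null: rejected
```

Because every tool exposes the same surface, the 10-step pipeline never needs tool-specific branches: it asks the interface.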
Layer 2: Safety
The most underestimated layer in agent systems.
Claude Code’s BashTool security uses a Tree-sitter AST parser to decompose shell commands into syntax trees, then runs 23 analysis steps. This security module alone is 888KB across 18 files.
Checks cover command substitution ($()), variable expansion (${}), redirections, pipelines, Zsh-specific bypass vectors (=curl form), and Unicode injection (zero-width characters, null bytes, homoglyphs).
Core principle: fail-closed. If parsing itself fails, execution is blocked, not allowed.
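A toy illustration of the fail-closed principle, with a stand-in parseShell instead of the real Tree-sitter parser; the specific checks are invented for the example, only the block-on-parse-failure behavior is the point.

```typescript
// Fail-closed sketch: an unparseable command is blocked, never allowed.
type Verdict = "allow" | "block";

// Stand-in for the Tree-sitter parse: treats unbalanced quotes as a
// parse failure and just tokenizes on whitespace otherwise.
function parseShell(cmd: string): string[] {
  if ((cmd.match(/"/g)?.length ?? 0) % 2 !== 0) {
    throw new Error("unbalanced quote");
  }
  return cmd.split(/\s+/);
}

function checkBashCommand(cmd: string): Verdict {
  let tokens: string[];
  try {
    tokens = parseShell(cmd);
  } catch {
    return "block"; // fail-closed: if we cannot parse it, we do not run it
  }
  // One example check: command substitution and variable expansion.
  const dangerous = tokens.some((t) => t.includes("$(") || t.includes("${"));
  return dangerous ? "block" : "allow";
}
```

The inverse default (fail-open) is the classic agent-security bug: an attacker only needs to find one input the parser chokes on.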
The permission system is a 5-stage pipeline:
flowchart LR
A["validateInput\nZod schema"] --> B["checkPermissions\ntool-level auth"]
B --> C["PreToolUse hook\nexternal block"]
C --> D["Rule matching\nallow/deny/ask"]
D --> E["Permission mode\nDefault/Auto/Plan/Bypass"]
Rule priority: Local > Project > User > Flags > Policy. The closest setting wins.
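A sketch of closest-setting-wins resolution: the five source names come from the post, while the RuleSet shape and the sample rules are assumptions.

```typescript
// Scan rule sources from most local to least local; first match wins.
type Action = "allow" | "deny" | "ask";
type RuleSet = Record<string, Action>; // tool name -> action

const sources: RuleSet[] = [
  /* Local   */ { Bash: "ask" },
  /* Project */ { Bash: "deny", Edit: "allow" },
  /* User    */ {},
  /* Flags   */ {},
  /* Policy  */ { Edit: "deny" },
];

function resolve(tool: string, fallback: Action = "ask"): Action {
  for (const rules of sources) {
    if (tool in rules) return rules[tool]; // closest setting wins
  }
  return fallback;
}

// Bash: Local "ask" shadows the Project-level "deny".
// Edit: Project "allow" shadows the Policy-level "deny".
```

The priority order means a developer's local override is always honored over organization-wide policy for that session, which is the documented intent of the hierarchy.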
Layer 3: Memory
The hidden cost center of agent systems. Real data from Claude Code proves it.
Before the Auto-Compact Circuit Breaker:
- 1,279 sessions had 50+ consecutive compression failures
- Worst case: 3,272 consecutive failures in a single session
- Approximately 250,000 API calls wasted per day in failure loops
The fix was 3 lines of code: MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3.
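A plausible reconstruction of the breaker around the compaction loop: only the constant name is from the source, the loop itself is an assumption illustrating why the cap matters.

```typescript
// Circuit-breaker sketch: count consecutive compaction failures and stop.
const MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3;

// tryCompact is a stand-in for the real compaction attempt; returns
// true on success. Returns the number of attempts made.
function runAutoCompact(tryCompact: () => boolean): number {
  let consecutiveFailures = 0;
  let attempts = 0;
  while (consecutiveFailures < MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES) {
    attempts++;
    if (tryCompact()) {
      consecutiveFailures = 0; // success resets the counter
      break;
    }
    consecutiveFailures++; // without the cap, this loop never terminates
  }
  return attempts;
}
```

Before the cap, a session whose context could not be compressed retried forever, burning an API call per attempt: hence the 3,272-failure worst case.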
The compression strategy is not simple truncation:
- Remove images first (the highest token consumers)
- Group by API round
- Fork a sub-agent to generate summaries (independent of main context)
- Replace old messages with summaries
- Restore top 5 referenced files (50K token budget)
- Re-inject skills (25K budget, 5K per skill)
After summarization, most-referenced files and active skills are restored. Context is shed, but essentials are preserved.
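The budgeted restoration step might look like this greedy sketch: the 50K budget and top-5 limit are from the post, while the token accounting and the greedy policy are assumptions.

```typescript
// Greedy, budgeted restoration of the most-referenced files after compaction.
interface FileRef {
  path: string;
  refCount: number; // how often the session referenced this file
  tokens: number;   // estimated token cost to restore it
}

function restoreFiles(files: FileRef[], budget = 50_000, maxFiles = 5): string[] {
  const restored: string[] = [];
  const byRefs = [...files].sort((a, b) => b.refCount - a.refCount);
  let remaining = budget;
  for (const f of byRefs) {
    if (restored.length === maxFiles) break;
    if (f.tokens <= remaining) { // skip files that would blow the budget
      restored.push(f.path);
      remaining -= f.tokens;
    }
  }
  return restored;
}
```

The same budget discipline applies to skill re-injection (25K total, 5K per skill): every restoration competes for a fixed token allowance rather than growing the context back unboundedly.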
Layer 4: Execution
Claude Code supports 7 execution modes (REPL, Headless, Coordinator, Bridge, Kairos, Daemon, Viewer), but all of them share a single core query() loop; mode differences are handled by Dependency Injection.
The concurrency model is the cleverest part. Safe tools (Read, Grep, Glob) run up to 10 in parallel. When an unsafe tool (Edit, Bash, Write) appears, a new batch starts for solo sequential execution. The StreamingToolExecutor overlaps API streaming with tool execution: while the AI is still generating text, completed tool_use blocks execute immediately.
Layer 5: Resilience
Withhold and Recover pattern in action:
| Error | Step 1 | Step 2 | Step 3 | Final |
|---|---|---|---|---|
| 413 Prompt Too Long | Collapse drain | Reactive compact | — | Show to user |
| Max Output Tokens | 8K to 64K escalation | Retry (max 3) | — | Show truncation |
| 429 Rate Limit | Check retry-after | Fast mode off | Standard model fallback | — |
| 529 Overloaded | Retry | Retry | Alternate model | Non-foreground: give up |
In Persistent mode (Anthropic internal unattended sessions), retries continue with exponential backoff for up to 6 hours.
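A sketch of that backoff, assuming a simple exponential schedule; the base delay, the per-attempt cap, and the absence of jitter are all guesses, and only the 6-hour wall-clock limit comes from the post.

```typescript
// Exponential backoff bounded by a 6-hour wall-clock budget.
const SIX_HOURS_MS = 6 * 60 * 60 * 1000;

// Delay for the Nth attempt: doubles each time, capped per attempt.
function nextDelayMs(attempt: number, baseMs = 1_000, capMs = 10 * 60 * 1000): number {
  return Math.min(baseMs * 2 ** attempt, capMs);
}

// The loop keeps retrying only while total elapsed time is under the cap.
function shouldKeepRetrying(startedAtMs: number, nowMs: number): boolean {
  return nowMs - startedAtMs < SIX_HOURS_MS;
}
```

The key design choice is bounding by wall-clock time rather than attempt count: an unattended session should survive a multi-hour provider outage, not give up after N tries.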
Layer 6: Extensibility
| Tier | Unit | Example |
|---|---|---|
| Tool | Single tool | FileRead, Bash, WebFetch |
| Skill | Markdown-based workflow | /commit, /review, /security-review |
| Plugin | Skill + Hook + MCP bundle | GitHub integration = PR review + autofix + GitHub MCP |
MCP supports 5 transports (Stdio, SSE/HTTP, WebSocket, SDK, Claude.ai). There are 44 feature flags, 20 of them externally inactive: nearly half the roadmap is still experimental.
8 Core Design Patterns
| # | Pattern | Core Idea | CC Evidence |
|---|---|---|---|
| 1 | Generator Streaming | query() yield + overlap tool exec | SSE streaming, Int32Array double-buffering (50x perf) |
| 2 | Feature Gate Dead Code Removal | Build-time removal of inactive code | 44 flags, bun:bundle Tree-shaking |
| 3 | Memoized Context | Session-invariant context computed once | 14 cache-break vectors tracked, 65ms saved at startup |
| 4 | Withhold and Recover | Auto-heal recoverable errors before display | 413 triggers compact, 8K to 64K escalation |
| 5 | Lazy Import | Load modules only when called | 800KB build, conditional feature gate imports |
| 6 | Immutable State | DeepImmutable + Zustand for state safety | Global AppState immutable, auto side-effect chains |
| 7 | Interruption Resilience | Disk save before every API call | /resume, SIGTERM then 30s grace then SIGKILL |
| 8 | Dependency Injection | query(deps) for mode/test switching | 7 modes share one core |
Generator Streaming
Not just “showing tokens in real-time.” The StreamingToolExecutor overlaps response streaming with tool execution. Rendering uses a custom React reconciler + Yoga + double-buffering + CharPool/StylePool interning. Int32Array-based ASCII buffers achieve 50x cache performance improvement.
Memoized Context
getSystemContext() and getUserContext() are computed once per session. Values stay identical across turns, maximizing API-side prompt cache hits (1-hour cache). promptCacheBreakDetection.ts tracks 14 cache-break vectors.
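A minimal sketch of per-session memoization. The point is identity stability: the same object is returned on every turn, so the serialized prompt never changes and the API-side cache keeps hitting. The helper and its contents are illustrative, not the leaked implementation.

```typescript
// Compute once per session, return the identical value forever after.
function memoizePerSession<T>(compute: () => T): () => T {
  let cached: T | undefined;
  let filled = false;
  return () => {
    if (!filled) {
      cached = compute();
      filled = true;
    }
    return cached as T; // same reference every turn: byte-stable prompt
  };
}

let computeCount = 0;
const getSystemContext = memoizePerSession(() => {
  computeCount++;
  return { os: "linux", cwd: "/repo" }; // stand-ins for real environment probes
});
```

Anything that recomputes (a timestamp, a re-read config file, a re-sorted list) is a potential cache-break vector, which is presumably why 14 of them are tracked explicitly.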
Dependency Injection
query(messages, deps) — one signature. REPL injects a React renderer, Headless injects NDJSON output, Bridge injects a remote renderer. Core logic identical, only deps differ.
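A sketch of the pattern with an assumed Deps shape; the idea is that the loop body never branches on mode, only the injected functions differ.

```typescript
// One core loop; all mode-specific behavior arrives through deps.
interface Deps {
  render(text: string): void;
}

function query(messages: string[], deps: Deps): void {
  // Core logic is identical for every mode.
  for (const m of messages) deps.render(m);
}

// Headless mode might inject an NDJSON emitter; REPL would inject a
// terminal renderer; tests inject a collector like this one.
const lines: string[] = [];
const headlessDeps: Deps = {
  render: (t) => { lines.push(JSON.stringify({ text: t })); },
};
```

This is also why the system is testable: a test run injects fakes into the same query() that production uses, instead of maintaining a parallel code path.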
4 Infrastructure Patterns
Tool Concurrency Model
flowchart LR
subgraph "Batch 1 parallel"
R1[Read] & G1[Grep] & G2[Glob]
end
subgraph "Batch 2 sequential"
E1[Edit]
end
subgraph "Batch 3 parallel"
R2[Read] & R3[Read]
end
subgraph "Batch 4 sequential"
B1[Bash]
end
R1 & G1 & G2 --> E1 --> R2 & R3 --> B1
Consecutive tools with isConcurrencySafe()=true are batched for parallel execution (max 10). Unsafe tools start a new batch. If one tool in a batch fails, sibling execution stops.
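The batching rule can be sketched as a pure function over the tool-call stream; the ToolCall shape is an assumption, the rule itself follows the description above.

```typescript
// Group consecutive safe calls into parallel batches (max 10);
// any unsafe call gets a batch of its own.
interface ToolCall {
  name: string;
  safe: boolean; // isConcurrencySafe() in the real interface
}

function batchCalls(calls: ToolCall[], maxParallel = 10): ToolCall[][] {
  const batches: ToolCall[][] = [];
  let current: ToolCall[] = [];
  for (const call of calls) {
    const mustFlush =
      current.length > 0 &&
      (!call.safe ||                       // unsafe call starts a new batch
        !current[0].safe ||                // previous batch held an unsafe call
        current.length === maxParallel);   // parallel batch is full
    if (mustFlush) {
      batches.push(current);
      current = [];
    }
    current.push(call);
  }
  if (current.length > 0) batches.push(current);
  return batches;
}
```

Run against the diagram's sequence (Read, Grep, Glob, Edit, Read, Read, Bash), this yields batch sizes 3, 1, 2, 1, matching the four batches shown.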
Auto-Compact Strategy
Triggers when token count exceeds context_window minus 13,000. Sub-agent summarization + reference file restoration + Circuit Breaker (stop after 3 failures). Those 3 lines eliminated 250,000 wasted API calls per day.
Permission Pipeline
5-stage validation (input, tool, hook, rules, mode) + 4 modes (Default/Auto/Plan/Bypass) + hierarchical priority (Local then Project then User then Flags then Policy). BashTool’s 888KB AST security provides the deep defense.
API Retry Strategy
Different strategy per error type: 429 waits or falls back to a cheaper model, 529 switches to alternate model after 3 attempts, 401 refreshes tokens. Non-foreground tasks give up immediately on 529. Persistent mode retries up to 6 hours.
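That per-error dispatch can be sketched as a pure policy function: the status-to-action mapping follows the table in Layer 5, while the Policy type and the attempt thresholds are assumptions.

```typescript
// Per-error retry policy; returns what the caller should do next.
type Policy =
  | { kind: "wait"; source: "retry-after" }
  | { kind: "fallback-model" }
  | { kind: "refresh-auth" }
  | { kind: "retry" }
  | { kind: "give-up" };

function retryPolicy(status: number, attempt: number, foreground: boolean): Policy {
  switch (status) {
    case 429: // rate limit: honor retry-after first, then fall back
      return attempt === 0
        ? { kind: "wait", source: "retry-after" }
        : { kind: "fallback-model" };
    case 529: // overloaded: background tasks bail out immediately
      if (!foreground) return { kind: "give-up" };
      return attempt < 3 ? { kind: "retry" } : { kind: "fallback-model" };
    case 401: // auth: refresh the token and try again
      return { kind: "refresh-auth" };
    default:
      return attempt < 3 ? { kind: "retry" } : { kind: "give-up" };
  }
}
```

Centralizing the policy in one place like this is what makes the Persistent-mode variant (keep retrying for 6 hours) a small override rather than a fork of the retry logic.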
Key Numbers
| Number | Context |
|---|---|
| 512,000 lines / 1,884 files | CC TypeScript source size |
| 52 (11 gated) | Built-in tools |
| 888KB / 23 steps | Bash Security AST analysis |
| 250,000 per day | Wasted API calls before Circuit Breaker |
| 3 lines | Code that fixed the 250K/day waste |
| context_window minus 13,000 | Auto-Compact threshold |
| 50K / 25K | Token budgets for file restoration / skill re-injection |
| 10 max | Parallel safe tool executions |
| 14 vectors | Prompt cache break tracking |
| 50x | Int32Array buffer rendering performance gain |
| 44 flags (20 inactive) | Feature flags |
| 6 hours | Persistent mode max backoff |
Closing: Three Design Principles
All design decisions follow three principles: safety (blocking dangerous operations), performance (streaming, parallel execution, caching), and extensibility (tools, skills, plugins, MCP).
These three principles run through all six layers and twelve patterns. When designing an agent system, neglecting any one of them eventually surfaces as a production problem.
Sources & Limitations
This series synthesizes the following publicly available analyses and does not directly contain leaked source code.
| Source | URL | Focus |
|---|---|---|
| ccunpacked.dev | ccunpacked.dev | Visual architecture guide, tool/command catalog |
| Wikidocs Analysis | wikidocs.net/338204 | Detailed technical analysis (execution flow, state, rendering) |
| PyTorch KR | discuss.pytorch.kr | Community analysis + HN discussion synthesis |
| Claw Code | github.com/ultraworkers/claw-code | Clean-room reimplementation (Rust/Python), PARITY.md gap analysis |
Analysis date: April 2, 2026. Anthropic issued DMCA takedowns on 8,100+ forks and discontinued npm distribution shortly after the leak, so some sources may have changed accessibility. Features behind feature flags are unreleased and may be modified or deprecated before launch.
Related Posts

What the Claude Code Leak Revealed: Anatomy of an AI Agent
The March 31, 2026 npm sourcemap incident revealed Claude Code internals. 4-phase execution, 7 modes, and the 11-step Agent Loop analyzed.

52 Tools, 23-Step Security: Inside an Agent's Tool System
52 built-in tools' common interface, 10-step execution pipeline, safe=parallel/unsafe=sequential concurrency model, 5-stage permission pipeline, and 888KB Tree-sitter 23-step bash security.