Agent System Design Canvas — 12 Production Patterns Proven by the Claude Code Leak


MJ · 5 min read

A 6-layer agent system design canvas and 8+4 core patterns from the Claude Code source leak (512K lines of TypeScript): from the Circuit Breaker that stopped 250K wasted API calls per day with 3 lines of code, to 23-step bash AST security analysis.

On March 31, 2026, Anthropic accidentally shipped a sourcemap in their npm package, exposing Claude Code’s entire source — roughly 512,000 lines of TypeScript across 1,884 files. A packaging mistake, but inside was a definitive answer to the question: how is a production agent system actually designed?

This post distills that analysis into a 6-layer design framework and 12 core patterns. These are not theoretical — they are validated by a system processing hundreds of millions of real requests.


Why “Agent Design” Is Its Own Discipline

| Dimension | Web App | Agent System |
| --- | --- | --- |
| Execution | Request → Response (single) | Request → AI decision → Tool → re-evaluation (loop) |
| State | DB + Session | Context window + Memory + Tool state + Sub-agents |
| Safety | Input validation | Input + AI output + Tool execution validation |
| Cost | Compute | Compute + token cost (exponential on failure) |

How Claude Code handles these differences is the subject of this post.


The 6-Layer Design Canvas

Six layers for designing agent systems. Each is mapped to Claude Code’s actual implementation as a production baseline.

```mermaid
graph TB
    subgraph "Agent System Design Canvas"
        direction TB
        T["Tool Layer\n52 tools, Zod validation, size limits"]
        S["Safety Layer\n5-stage permissions, 23-step AST analysis"]
        M["Memory Layer\nAuto-Compact, 4-type persistent memory"]
        E["Execution Layer\nParallel/sequential split, 7 modes"]
        R["Resilience Layer\nWithhold and Recover, Circuit Breaker"]
        X["Extensibility Layer\nMCP 5 transports, 44 feature flags"]
    end
    T --- S
    S --- M
    M --- E
    E --- R
    R --- X
```

Layer 1: Tool

Every means by which the agent interacts with the outside world.

Claude Code’s scale: 52 built-in tools (41 active + 11 feature-gated)

The tool registration pipeline is the most revealing: getAllBaseTools() → feature-gate filter → deny-rules filter → mode filter → alphabetical sort → assembleToolPool(). The alphabetical sort is critical: reordering tools breaks API prompt caching, causing cost spikes.
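The pipeline can be sketched as a chain of filters ending in a stable sort. This is a hedged reconstruction, not the actual source: the tool shape and filter predicates are hypothetical; only the stage order and the cache-stability rationale come from the analysis above.

```typescript
// Sketch of the tool-assembly pipeline. ToolDef and the predicate
// details are assumptions; the stage order mirrors the article.
interface ToolDef {
  name: string;
  featureFlag?: string; // present => the tool is feature-gated
  readOnly: boolean;
}

function assembleToolPool(
  all: ToolDef[],
  activeFlags: Set<string>,
  denyRules: Set<string>,
  readOnlyMode: boolean,
): ToolDef[] {
  return all
    .filter(t => !t.featureFlag || activeFlags.has(t.featureFlag)) // feature gate
    .filter(t => !denyRules.has(t.name))                           // deny rules
    .filter(t => !readOnlyMode || t.readOnly)                      // mode filter
    // Alphabetical sort keeps the serialized tool list byte-identical
    // across sessions, so the API-side prompt cache keeps hitting.
    .sort((a, b) => a.name.localeCompare(b.name));
}
```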

Every tool conforms to a common interface:

| Property | Purpose |
| --- | --- |
| `inputSchema` (Zod) | Schema-validates LLM-generated parameters; defends against hallucination |
| `isConcurrencySafe()` | Parallel execution safety: Read returns true, Edit returns false |
| `isReadOnly()` | Read-only flag, used by permission modes |
| `maxResultSizeChars` | Result size cap; overflow saves to disk and returns a reference only |

This interface lets all 52 tools pass through the same 10-step execution pipeline: name lookup, interrupt check, input validation, PreToolUse hook, permission check, execution, result mapping, size check, PostToolUse hook, telemetry.
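In TypeScript terms, the common interface might look like the sketch below. This is a guessed shape, not the leaked code: the method names match the table above, but the generics, the schema stand-in, and the `echoTool` example are all illustrative.

```typescript
// Hypothetical shape of the common tool interface. The inputSchema
// object stands in for a Zod schema (parse throws on invalid input).
interface AgentTool<In, Out> {
  name: string;
  inputSchema: { parse(raw: unknown): In };
  isConcurrencySafe(): boolean;
  isReadOnly(): boolean;
  maxResultSizeChars: number;
  call(input: In): Promise<Out>;
}

// A minimal read-only tool implementing the interface.
const echoTool: AgentTool<{ text: string }, string> = {
  name: "Echo",
  inputSchema: {
    parse(raw: unknown) {
      const o = raw as { text?: unknown };
      if (typeof o?.text !== "string") throw new Error("invalid input");
      return { text: o.text };
    },
  },
  isConcurrencySafe: () => true,
  isReadOnly: () => true,
  maxResultSizeChars: 10_000,
  async call(input) { return input.text; },
};
```

Because every tool exposes the same surface, the execution pipeline never needs tool-specific branching: it validates, checks permissions, executes, and caps the result size through one code path.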

Layer 2: Safety

The most underestimated layer in agent systems.

Claude Code’s BashTool security uses a Tree-sitter AST parser to decompose shell commands into syntax trees, then runs 23 analysis steps. This security module alone is 888KB across 18 files.

Checks cover command substitution ($()), variable expansion (${}), redirections, pipelines, Zsh-specific bypass vectors (=curl form), and Unicode injection (zero-width characters, null bytes, homoglyphs).

Core principle: fail-closed. If parsing itself fails, execution is blocked, not allowed.
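Fail-closed behavior is easy to state and easy to get wrong. A minimal sketch, assuming a dummy parser in place of the real Tree-sitter AST (all names here are illustrative):

```typescript
// Fail-closed sketch: if command parsing throws, the command is blocked.
// parseShell is a toy stand-in for the Tree-sitter AST parser.
type Verdict = "allow" | "block";

function parseShell(cmd: string): string[] {
  if (cmd.includes("\u0000")) throw new Error("unparseable"); // e.g. null byte
  return cmd.trim().split(/\s+/);
}

function checkCommand(cmd: string, denied: Set<string>): Verdict {
  let tokens: string[];
  try {
    tokens = parseShell(cmd);
  } catch {
    return "block"; // fail-closed: parser failure means no execution
  }
  return tokens.some(t => denied.has(t)) ? "block" : "allow";
}
```

The key line is the `catch`: an open-by-default version would return "allow" there, and every parser edge case would become a bypass.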

The permission system is a 5-stage pipeline:

```mermaid
flowchart LR
    A["validateInput\nZod schema"] --> B["checkPermissions\ntool-level auth"]
    B --> C["PreToolUse hook\nexternal block"]
    C --> D["Rule matching\nallow/deny/ask"]
    D --> E["Permission mode\nDefault/Auto/Plan/Bypass"]
```

Rule priority: Local > Project > User > Flags > Policy. The closest setting wins.
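"Closest setting wins" amounts to scanning scopes from most to least specific and returning the first match. A hedged sketch; the scope names come from the article, the data shapes are assumptions:

```typescript
// First matching scope wins, scanned from most to least specific.
type Decision = "allow" | "deny" | "ask";
type Scope = "local" | "project" | "user" | "flags" | "policy";

const PRIORITY: Scope[] = ["local", "project", "user", "flags", "policy"];

function resolveRule(
  rules: Partial<Record<Scope, Record<string, Decision>>>,
  toolName: string,
  fallback: Decision = "ask",
): Decision {
  for (const scope of PRIORITY) {
    const d = rules[scope]?.[toolName];
    if (d) return d; // closest scope wins; stop at the first match
  }
  return fallback;
}
```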

Layer 3: Memory

The hidden cost center of agent systems. Real data from Claude Code proves it.

Before the Auto-Compact Circuit Breaker:

  • 1,279 sessions had 50+ consecutive compression failures
  • Worst case: 3,272 consecutive failures in a single session
  • Approximately 250,000 API calls wasted per day in failure loops

The fix was 3 lines of code: `MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3`.
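The breaker itself is trivially small; the value was in knowing where to put it. A hedged reconstruction (the constant name is from the article, the class around it is a guess):

```typescript
// Circuit-breaker sketch: stop attempting Auto-Compact after three
// consecutive failures, so a broken session cannot loop forever.
const MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3;

class AutoCompactBreaker {
  private failures = 0;

  /** Returns false once the breaker has tripped. */
  shouldAttempt(): boolean {
    return this.failures < MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES;
  }
  recordFailure(): void { this.failures += 1; }
  recordSuccess(): void { this.failures = 0; } // any success resets the streak
}
```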

The compression strategy is not simple truncation:

  1. Remove images (highest token consumers first)
  2. Group by API round
  3. Fork a sub-agent to generate summaries (independent of main context)
  4. Replace old messages with summaries
  5. Restore top 5 referenced files (50K token budget)
  6. Re-inject skills (25K budget, 5K per skill)

After summarization, most-referenced files and active skills are restored. Context is shed, but essentials are preserved.

Layer 4: Execution

Claude Code supports 7 execution modes (REPL, Headless, Coordinator, Bridge, Kairos, Daemon, Viewer), but they all share a single core query() loop. Mode differences are handled by dependency injection.

The concurrency model is the most clever part. Safe tools (Read, Grep, Glob) run up to 10 in parallel. When an unsafe tool (Edit, Bash, Write) appears, a new batch starts for solo sequential execution. The StreamingToolExecutor overlaps API streaming with tool execution — while the AI is still generating text, completed tool_use blocks execute immediately.
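The batching rule can be captured in a few lines. A sketch under stated assumptions: the `ToolCall` shape is invented, and only the safe/unsafe split and the 10-way parallel cap come from the article:

```typescript
// Batching sketch: consecutive concurrency-safe tools share a parallel
// batch (capped at 10); an unsafe tool always gets a batch of its own.
interface ToolCall { name: string; safe: boolean }

function planBatches(calls: ToolCall[], maxParallel = 10): ToolCall[][] {
  const batches: ToolCall[][] = [];
  let current: ToolCall[] = [];
  for (const call of calls) {
    if (!call.safe) {
      if (current.length) batches.push(current);
      batches.push([call]); // unsafe tool runs alone, sequentially
      current = [];
    } else {
      if (current.length === maxParallel) { batches.push(current); current = []; }
      current.push(call);
    }
  }
  if (current.length) batches.push(current);
  return batches;
}
```

For example, the call sequence Read, Grep, Glob, Edit, Read, Read, Bash plans out as four batches of sizes 3, 1, 2, 1.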

Layer 5: Resilience

Withhold and Recover pattern in action:

| Error | Step 1 | Step 2 | Step 3 | Final |
| --- | --- | --- | --- | --- |
| 413 Prompt Too Long | Collapse drain | Reactive compact | | Show to user |
| Max Output Tokens | 8K → 64K escalation | Retry (max 3) | | Show truncation |
| 429 Rate Limit | Check retry-after | Fast mode off | | Standard model fallback |
| 529 Overloaded | Retry | Retry | Alternate model | Non-foreground: give up |

In Persistent mode (Anthropic internal unattended sessions), retries continue with exponential backoff for up to 6 hours.
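The shape of the retry policy is worth making concrete. A hedged sketch: the 6-hour budget for persistent mode comes from the article, while the base delay, cap, and 3-attempt interactive limit are illustrative constants:

```typescript
// Retry-policy sketch: exponential backoff with a per-attempt cap, and
// a total time budget for unattended (persistent) sessions.
const SIX_HOURS_MS = 6 * 60 * 60 * 1000;

function backoffDelayMs(attempt: number, baseMs = 1000, capMs = 60_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt); // 1s, 2s, 4s, ... capped at 60s
}

function shouldRetry(elapsedMs: number, attempt: number, persistent: boolean): boolean {
  if (persistent) return elapsedMs < SIX_HOURS_MS; // unattended sessions keep going
  return attempt < 3;                              // interactive: give up quickly
}
```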

Layer 6: Extensibility

| Tier | Unit | Example |
| --- | --- | --- |
| Tool | Single tool | FileRead, Bash, WebFetch |
| Skill | Markdown-based workflow | /commit, /review, /security-review |
| Plugin | Skill + Hook + MCP bundle | GitHub integration = PR review + autofix + GitHub MCP |

MCP supports 5 transports (Stdio, SSE/HTTP, WebSocket, SDK, Claude.ai). Of the 44 feature flags, 20 are externally inactive, meaning nearly half the roadmap is still experimental.


8 Core Design Patterns

| # | Pattern | Core Idea | CC Evidence |
| --- | --- | --- | --- |
| 1 | Generator Streaming | query() yield + overlap tool exec | SSE streaming, Int32Array double-buffering (50x perf) |
| 2 | Feature Gate Dead Code Removal | Build-time removal of inactive code | 44 flags, bun:bundle Tree-shaking |
| 3 | Memoized Context | Session-invariant context computed once | 14 cache-break vectors tracked, 65ms saved at startup |
| 4 | Withhold and Recover | Auto-heal recoverable errors before display | 413 triggers compact, 8K → 64K escalation |
| 5 | Lazy Import | Load modules only when called | 800KB build, conditional feature gate imports |
| 6 | Immutable State | DeepImmutable + Zustand for state safety | Global AppState immutable, auto side-effect chains |
| 7 | Interruption Resilience | Disk save before every API call | /resume, SIGTERM → 30s grace → SIGKILL |
| 8 | Dependency Injection | query(deps) for mode/test switching | 7 modes share one core |

Generator Streaming

Not just “showing tokens in real-time.” The StreamingToolExecutor overlaps response streaming with tool execution. Rendering uses a custom React reconciler + Yoga + double-buffering + CharPool/StylePool interning. Int32Array-based ASCII buffers achieve 50x cache performance improvement.

Memoized Context

getSystemContext() and getUserContext() are computed once per session. Values stay identical across turns, maximizing API-side prompt cache hits (1-hour cache). promptCacheBreakDetection.ts tracks 14 cache-break vectors.
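The memoization itself is a one-shot cache; the value (and the identity) of the context must stay fixed for the whole session. A minimal sketch, assuming invented context fields (the function name mirrors getSystemContext() from the article):

```typescript
// Compute a value once per session and return the identical object on
// every later call, so the serialized prompt prefix never changes and
// API-side prompt-cache hits are maximized.
function memoizeOnce<T>(compute: () => T): () => T {
  let cached: { value: T } | undefined;
  return () => {
    if (!cached) cached = { value: compute() };
    return cached.value;
  };
}

let computeCount = 0;
const getSystemContext = memoizeOnce(() => {
  computeCount += 1;
  return { os: "linux", startedAt: Date.now() }; // stand-ins for real env facts
});
```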

Dependency Injection

query(messages, deps) — one signature. REPL injects a React renderer, Headless injects NDJSON output, Bridge injects a remote renderer. Core logic identical, only deps differ.
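The pattern is plain constructor-style injection around one function. A hedged sketch: the `QueryDeps` shape and the NDJSON helper are assumptions; only the query(messages, deps) signature idea comes from the article:

```typescript
// DI sketch: one query() core, behavior swapped via injected deps.
interface QueryDeps {
  render(chunk: string): void;                 // REPL: React renderer; Headless: NDJSON
  callModel(prompt: string): Promise<string>;  // real impl would hit the API
}

async function query(messages: string[], deps: QueryDeps): Promise<string> {
  const reply = await deps.callModel(messages.join("\n"));
  deps.render(reply);
  return reply;
}

// Headless-style deps that collect NDJSON lines instead of drawing a UI.
function makeHeadlessDeps(lines: string[]): QueryDeps {
  return {
    render: chunk => lines.push(JSON.stringify({ type: "text", chunk })),
    callModel: async prompt => `echo: ${prompt}`,
  };
}
```

Swapping `makeHeadlessDeps` for a REPL or Bridge variant changes the surface without touching the loop, which is also what makes the core trivially unit-testable.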


4 Infrastructure Patterns

Tool Concurrency Model

```mermaid
flowchart LR
    subgraph "Batch 1 parallel"
        R1[Read] & G1[Grep] & G2[Glob]
    end
    subgraph "Batch 2 sequential"
        E1[Edit]
    end
    subgraph "Batch 3 parallel"
        R2[Read] & R3[Read]
    end
    subgraph "Batch 4 sequential"
        B1[Bash]
    end
    R1 & G1 & G2 --> E1 --> R2 & R3 --> B1
```

Consecutive tools with isConcurrencySafe()=true are batched for parallel execution (max 10). Unsafe tools start a new batch. If one tool in a batch fails, sibling execution stops.

Auto-Compact Strategy

Triggers when token count exceeds context_window minus 13,000. Sub-agent summarization + reference file restoration + Circuit Breaker (stop after 3 failures). Those 3 lines eliminated 250,000 wasted API calls per day.
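The trigger condition is a single comparison. A sketch of it, with the 13,000-token margin taken from the article and everything else assumed:

```typescript
// Auto-Compact trigger sketch: compact once the running token count
// crosses context_window minus a fixed 13,000-token safety margin.
function shouldAutoCompact(tokenCount: number, contextWindow: number): boolean {
  const AUTOCOMPACT_MARGIN_TOKENS = 13_000;
  return tokenCount >= contextWindow - AUTOCOMPACT_MARGIN_TOKENS;
}
```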

Permission Pipeline

5-stage validation (input, tool, hook, rules, mode) + 4 modes (Default/Auto/Plan/Bypass) + hierarchical priority (Local → Project → User → Flags → Policy). BashTool’s 888KB AST security provides the deep defense.

API Retry Strategy

Different strategy per error type: 429 waits or falls back to a cheaper model, 529 switches to alternate model after 3 attempts, 401 refreshes tokens. Non-foreground tasks give up immediately on 529. Persistent mode retries up to 6 hours.


Key Numbers

| Number | Context |
| --- | --- |
| 512,000 lines / 1,884 files | CC TypeScript source size |
| 52 (11 gated) | Built-in tools |
| 888KB / 23 steps | Bash security AST analysis |
| 250,000 per day | Wasted API calls before Circuit Breaker |
| 3 lines | Code that fixed the 250K/day waste |
| context_window − 13,000 | Auto-Compact threshold |
| 50K / 25K | Token budgets for file restoration / skill re-injection |
| 10 max | Parallel safe tool executions |
| 14 vectors | Prompt cache-break tracking |
| 50x | Int32Array buffer rendering performance gain |
| 44 flags (20 inactive) | Feature flags |
| 6 hours | Persistent mode max backoff |

Closing: Three Design Principles

All design decisions follow three principles: safety (blocking dangerous operations), performance (streaming, parallel execution, caching), and extensibility (tools, skills, plugins, MCP).

These three principles run through all six layers and twelve patterns. When designing an agent system, missing any one of them becomes a production problem.



Sources & Limitations

This series synthesizes the following publicly available analyses and does not directly contain leaked source code.

| Source | URL | Focus |
| --- | --- | --- |
| ccunpacked.dev | ccunpacked.dev | Visual architecture guide, tool/command catalog |
| Wikidocs Analysis | wikidocs.net/338204 | Detailed technical analysis (execution flow, state, rendering) |
| PyTorch KR | discuss.pytorch.kr | Community analysis + HN discussion synthesis |
| Claw Code | github.com/ultraworkers/claw-code | Clean-room reimplementation (Rust/Python), PARITY.md gap analysis |

Analysis date: April 2, 2026. Anthropic issued DMCA takedowns on 8,100+ forks and discontinued npm distribution shortly after the leak, so some sources may have changed accessibility. Features behind feature flags are unreleased and may be modified or deprecated before launch.
