The Memory System I Built Looked Like Claude Code's Internal Design

MJ · 9 min read

TL;DR: Claude Code's internal memory system (4-type persistent memory, Auto-Dream, Auto-Compact) versus my independently built 3-layer system (documents → index → semantic search). validate_placement() is the differentiator Claude Code doesn't have.

I spent six months building a memory system for my Claude Code workflow. Three layers: structured documents, a cross-project index, and a semantic search MCP server backed by vector embeddings. Session lifecycle hooks. A placement validator that tells the AI where information belongs.

Then Claude Code’s source leaked on March 31, 2026 — 512,000 lines of TypeScript via an accidental source map inclusion in the npm package. And inside that source, I found Auto-Dream: a memory consolidation engine with four typed memory categories, post-session extraction, and automatic pruning.

The architectures converge in ways that cannot be coincidence. They also diverge in ways that reveal what each side optimized for. This post maps both systems side by side.


Part 1: Claude Code’s Internal Memory System

The memory/ Directory

Claude Code stores persistent memory at ~/.claude/projects/{slug}/memory/. The structure is deliberately simple:

~/.claude/projects/{slug}/
├── MEMORY.md           # Index (max 200 lines / 25KB)
└── memory/
    ├── user-prefs.md   # User type
    ├── feedback-*.md   # Feedback type
    ├── project-*.md    # Project type
    └── ref-*.md        # Reference type

MEMORY.md serves as the index. It has a hard cap of 200 lines / 25KB — because it gets injected into the context window at every session start. Size is cost.
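The cap is easy to enforce mechanically. A minimal validator, with the limits taken from the numbers above (the function itself is my sketch, not CC's code):

```python
def memory_index_ok(text: str, max_lines: int = 200, max_bytes: int = 25_000) -> bool:
    # MEMORY.md is injected at every session start, so size is cost:
    # reject an index that exceeds either the line cap or the byte cap.
    return (
        len(text.splitlines()) <= max_lines
        and len(text.encode("utf-8")) <= max_bytes
    )
```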

Four Memory Types

Claude Code classifies memory into four types:

| Type | Content | Lifecycle | Example |
|---|---|---|---|
| user | Role, expertise, preferences | Long-term (rarely changes) | "Prefers Python, concise code, Korean responses" |
| feedback | Corrections + WHY + how to apply | Medium-term (merged when patterns emerge) | See structured example below |
| project | Goals, deadlines, decisions | Medium-term (project lifespan) | "Phase 2 deadline: 2026-05-15" |
| reference | External system pointers | Long-term (while reference exists) | "API spec: docs/api-v2.md" |

The feedback type has a particularly precise structure. It is not just “do this instead.” It leads with the rule, then explains why, then shows how to apply:

Rule: Specify file names explicitly when using git add
Why: git add . or -A may include .env, credentials, or other sensitive files
How to apply: Use git add specific-file.ts format. List multiple files individually

The project type converts relative dates to absolute dates. “Due next week” becomes “Due 2026-04-13.” The reference point for “next week” shifts between sessions; an absolute date does not.
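The conversion is trivial to implement. A sketch, assuming "next week" means exactly seven days out (the mapping is hypothetical; the post only specifies that relative dates become absolute):

```python
from datetime import date, timedelta

# Hypothetical phrase table; extend as needed.
RELATIVE_OFFSETS = {"today": 0, "tomorrow": 1, "next week": 7}

def absolutize(phrase: str, today: date) -> str:
    # Convert a known relative phrase to an ISO date; pass anything else through.
    if phrase in RELATIVE_OFFSETS:
        return (today + timedelta(days=RELATIVE_OFFSETS[phrase])).isoformat()
    return phrase
```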

What NOT to save is defined just as explicitly:

| Excluded | Reason |
|---|---|
| Code patterns | Already in the codebase; search for it |
| Architecture details | Belongs in docs/ directory |
| Git history | Available via git log |
| Debugging recipes | One-off; if recurring, codify it |
| Anything in CLAUDE.md | Duplicate storage causes inconsistency |

This exclusion list matters more than the inclusion list. A memory system’s quality is determined by what it refuses to remember, not what it stores.

Auto-Dream: Post-Session Memory Consolidation

Auto-Dream is Claude Code’s most significant memory innovation. It remains unreleased but is fully implemented in the source.

When a session ends:

  1. A forked sub-agent spawns (independent of the main context)
  2. The sub-agent reviews the entire conversation from the beginning
  3. It extracts content matching the four memory types
  4. It writes organized entries to memory/
  5. It cleans up stale context — entries that are no longer valid
  6. It strengthens associations between related memory entries

The “sub-agent” design is the key insight. Memory consolidation does not consume the main session’s context window. It mirrors sleep-dependent memory consolidation in neuroscience — the brain processes and organizes memories during sleep, not during waking hours.

Auto-Dream connects to KAIROS, a proactive monitoring system that runs on a 5-minute cron. KAIROS watches for filesystem changes and can trigger memory consolidation independently of session boundaries.

Auto-Compact: Runtime Context Management

Where Auto-Dream handles post-session long-term memory, Auto-Compact manages in-session short-term context.

When the context window fills:

  1. Remove images first (low information density per token)
  2. Group by API round
  3. Generate summary via forked sub-agent
  4. Replace previous messages with summary
  5. Restore top 5 referenced files (50K token budget)
  6. Re-inject active skills (25K budget, 5K per skill)

Steps 5 and 6 are where the sophistication lies. Context is reduced, but the most-referenced files and active skills are restored: discard the bulk, preserve the essentials.
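Step 5's budgeted restore can be sketched as a greedy loop. The 50K budget and top-5 count are from the source analysis; the function itself is my reconstruction:

```python
def restore_top_files(ranked_files, token_budget=50_000, max_files=5):
    # ranked_files: (path, token_count) pairs, most-referenced first.
    # Greedily restore up to max_files files without exceeding the token budget.
    restored, used = [], 0
    for path, tokens in ranked_files[:max_files]:
        if used + tokens > token_budget:
            break
        restored.append(path)
        used += tokens
    return restored
```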

The circuit breaker: MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3. If compaction fails three times in a row, it stops trying. Before these three lines were added, approximately 250,000 API calls per day were wasted in failure loops.
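The breaker really is only a few lines. A minimal sketch; the constant name is from the leaked source, the class around it is mine:

```python
MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3  # constant name from the leaked source

class AutoCompactBreaker:
    """Stop retrying compaction after N consecutive failures."""

    def __init__(self, limit: int = MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES):
        self.limit = limit
        self.failures = 0

    def run(self, compact):
        if self.failures >= self.limit:
            return None  # breaker open: skip the call entirely
        try:
            result = compact()
        except Exception:
            self.failures += 1
            return None
        self.failures = 0  # success resets the streak
        return result
```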

Context Injection and Post-Sampling Hooks

At every session start, two functions fire:

| Function | Injected Content | Caching |
|---|---|---|
| getSystemContext() | System prompt, tool definitions, policies | Memoized per session |
| getUserContext() | CLAUDE.md rules, MEMORY.md, memory files | Memoized per session |

Memoization prevents redundant reads within the same session. It checks file mtime — if unchanged, the previous result is reused. This integrates with API-side prompt caching (1-hour cache), reducing token costs significantly.
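The mtime check can be sketched with a module-level cache. A simplified version, not CC's actual implementation:

```python
import os

_cache: dict[str, tuple[float, str]] = {}  # path -> (mtime, content)

def read_memoized(path: str) -> str:
    # Re-read a file only when its mtime has changed; otherwise reuse the
    # cached content, mirroring the per-session memoization described above.
    mtime = os.path.getmtime(path)
    hit = _cache.get(path)
    if hit is not None and hit[0] == mtime:
        return hit[1]
    with open(path, encoding="utf-8") as f:
        content = f.read()
    _cache[path] = (mtime, content)
    return content
```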

Post-Sampling Hooks execute after every response generation:

  • Auto-Compact trigger (context limit check)
  • Memory extraction (write session findings to memory/)
  • Dream Mode trigger (at session end, launch Auto-Dream)

```mermaid
flowchart TB
    subgraph "Claude Code Memory Architecture"
        direction TB
        subgraph "Session Start"
            SC["getSystemContext()\nSystem prompt + tools\n(memoized)"]
            UC["getUserContext()\nCLAUDE.md + MEMORY.md\n(memoized)"]
        end
        subgraph "During Session"
            AC["Auto-Compact\nContext compression\nTop 5 files restored (50K)\nSkills re-injected (25K)"]
            CB["Circuit Breaker\nMAX_FAILURES = 3"]
        end
        subgraph "Post-Sampling Hooks"
            PSH["auto-compact check\nmemory extraction\ndream mode trigger"]
        end
        subgraph "Post Session"
            AD["Auto-Dream\nSub-agent consolidation\nKAIROS integration"]
        end
        subgraph "Persistent Storage"
            MD["memory/ 4 types\nuser / feedback / project / reference"]
            CM["CLAUDE.md\nProject rules"]
            MM["MEMORY.md\nIndex (max 200 lines / 25KB)"]
        end
    end
    SC --> AC
    UC --> AC
    AC --> CB
    AC --> PSH
    PSH --> AD
    AD -->|"extract/cleanup/strengthen"| MD
    MD --> MM
    CM --> UC
    MM --> UC
```

Part 2: My 3-Layer Memory System

The Problem

I have been using Claude Code as my primary development tool for six months. I run seven or more projects simultaneously — a SaaS product, a technical blog, open-source tools, consulting engagements. The core problems:

  1. Session amnesia: Yesterday’s decisions need re-explaining today
  2. Project isolation: Experience from Project A never transfers to Project B
  3. State tracking gaps: “What was I working on?” requires manual reconstruction every session
  4. Information placement confusion: Does this belong in CLAUDE.md, STATUS.md, or memory/?

In the Korean AI community, where Claude Code adoption has been particularly intense among solo builders and consultants, these problems are compounded by the sheer pace of project switching. The tooling ecosystem is evolving weekly, and losing track of context across sessions means losing competitive advantage.

Layer 1: Document Tier (Per-Project)

project-root/
├── CLAUDE.md      # Rules only (coding conventions, safety rules, env)
├── STATUS.md      # Current state (in-progress, blockers, next steps)
└── docs/          # Architecture, decisions (Tree tier only)
    └── architecture.md

CLAUDE.md contains rules only. “Use TypeScript”, “Never force push”, “Show SQL before schema changes.” No current state, no roadmap.

STATUS.md contains current state. What is in progress, what is blocked, what comes next. Updated at every session end.

Projects graduate through three documentation tiers:

| Tier | Condition | Required Documents | Graduation Trigger |
|---|---|---|---|
| Seed | Initial idea in _ideas/ | README.md only | First meaningful commit |
| Sapling | First deploy, real usage | CLAUDE.md + STATUS.md | Multi-component, revenue connection |
| Tree | Long-term operation | + docs/architecture.md | (terminal tier) |

This graduation path naturally scales documentation with project maturity. An early idea project has only a README (Seed). A technical blog with deployments is a Sapling. A revenue-generating SaaS product is a Tree.
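The graduation logic reduces to a small function. A sketch of my tiers, with the triggers collapsed into two booleans (my simplification, not a spec):

```python
REQUIRED_DOCS = {
    "Seed": ["README.md"],
    "Sapling": ["README.md", "CLAUDE.md", "STATUS.md"],
    "Tree": ["README.md", "CLAUDE.md", "STATUS.md", "docs/architecture.md"],
}

def doc_tier(deployed: bool, multi_component_or_revenue: bool) -> str:
    # Seed -> Sapling -> Tree: documentation requirements grow with maturity.
    if multi_component_or_revenue:
        return "Tree"
    if deployed:
        return "Sapling"
    return "Seed"
```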

Layer 2: Memory Index (Cross-Project)

~/.claude/projects/{workspace}/memory/
├── MEMORY.md              # Router: links to all files + project table
├── project-a-status.md    # Per-project status files
├── project-b-status.md
├── project-c-status.md
├── doc-standards.md        # Documentation standards
├── claude-md-design.md     # CLAUDE.md design principles
├── decision-frameworks.md  # Decision-making frameworks
├── skills-guide.md         # /wrap, /dashboard skill specs
├── feedback-*.md           # Working style feedback
└── ... (16+ files)

MEMORY.md is the router. It contains links to every memory file plus a project status table. The critical difference from Claude Code’s memory: mine is cross-project while CC’s is per-project isolated.

My memory types extend beyond Claude Code’s four:

| Type | CC Equivalent | My Addition |
|---|---|---|
| user | user | Same (role, preferences, working style) |
| feedback | feedback | Same (corrections + reasoning) |
| project status | project | Split into per-project files |
| standards | (none) | Documentation standards, design principles |
| patterns | (none) | Technical patterns (Next.js static, content scaling) |
| references | reference | Same |

Layer 3: Memory Hub (Semantic Search)

Where Layers 1 and 2 are structured text-based memory, Layer 3 adds vector-based semantic search.

Backend architecture:

  • Mem0 OSS: Memory management framework
  • Qdrant: Local vector DB (embedding storage and retrieval)
  • SQLite: History and session journal storage

Six MCP tools:

| Tool | Purpose | When Used |
|---|---|---|
| search_memory | Vector search over memory/ files | Session start, context exploration |
| log_session | Write session journal (summary, decisions, unfinished, related) | Session end (/wrap) |
| extract_facts | Auto-extract facts from session | Session end (/wrap) |
| check_stale | Detect N-day inactive items | Session start (/dashboard) |
| validate_placement | Verify where info should be stored | When saving information |
| index_markdown | Index memory/ files into vector DB | After memory file updates |

Two-tier data model:

  • Tier 1 (Confirmed): Markdown-based, human-curated, memory/*.md files
  • Tier 2 (Supplementary): Mem0 auto-managed, vector DB stored, auxiliary reference

To date: over a hundred past sessions backfill-indexed, dozens of facts extracted from existing memory files into the vector DB.

Session Lifecycle and Custom Skills

```mermaid
flowchart TB
    subgraph "My 3-Layer Memory System"
        direction TB
        subgraph "/dashboard (Session Start)"
            D1["STATUS.md + last 10 git commits"]
            D2["search_memory(project_name)"]
            D3["check_stale(14 days)"]
            D4["Generate briefing"]
        end
        subgraph "During Session"
            W1["Reference memory/ files"]
            W2["Update memory files as needed"]
        end
        subgraph "/wrap (Session End)"
            E1["Update STATUS.md"]
            E2["Update memory/ status files"]
            E3["log_session() - journal entry"]
            E4["extract_facts() - fact extraction"]
            E5["Update Doc Sync dates"]
        end
    end
    D1 --> D2 --> D3 --> D4
    D4 --> W1 --> W2
    W2 --> E1 --> E2 --> E3 --> E4 --> E5
```

/dashboard starts a session: reads STATUS.md and recent git log, searches Memory Hub for related context, detects 14-day inactive items, and generates a briefing.

/wrap ends a session. In Full mode, four parallel agents execute simultaneously — STATUS.md update, memory file update, session journal logging, fact extraction. This parallel execution is structurally similar to Claude Code’s Coordinator Mode (leader-worker pattern).
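Full mode's parallel fan-out can be sketched with a thread pool. The four task names mirror /wrap; the callables here are placeholders, not the real agents:

```python
from concurrent.futures import ThreadPoolExecutor

def wrap_full(tasks: dict):
    # Submit every session-end task concurrently and collect results by name.
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = {name: pool.submit(fn) for name, fn in tasks.items()}
        return {name: future.result() for name, future in futures.items()}
```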

/monthly runs a monthly review: check_stale(30) across all projects for health assessment.


Part 3: Structural Comparison

This is the centerpiece: row by row, where the two systems correspond and where they diverge.

| My System | CC Internal | Role |
|---|---|---|
| CLAUDE.md (rules) | CLAUDE.md (same concept) | Per-project permanent rules |
| STATUS.md (state) | Session History (conversation array) | Current work state tracking |
| memory/*.md (index) | memory/ (4 types: user/feedback/project/reference) | Persistent memory across sessions |
| MEMORY.md (router) | MEMORY.md (index, max 200 lines) | Memory file navigation |
| Memory Hub semantic search | Auto-Dream (memory consolidation engine) | Automated memory processing |
| /wrap (session end) | Post-Sampling Hooks (auto-compact, dream) | Session-end cleanup |
| /dashboard (session start) | Context Injection (getSystemContext + getUserContext) | Session-start context loading |
| check_stale(14 days) | KAIROS (proactive monitoring, 5-min cron) | Stale information detection |
| validate_placement() | (Not in CC; my differentiator) | Information placement validation |
| extract_facts() | Auto-Dream extract (similar goal) | Fact extraction from sessions |
| log_session() | Session JSONL files (auto-saved) | Session journal recording |
| 3-tier docs (Seed/Sapling/Tree) | (no equivalent; CC is a single product) | Documentation scaling with maturity |
| /wrap 4-agent parallel | Coordinator Mode (leader-worker pattern) | Parallel task execution |

```mermaid
flowchart LR
    subgraph "My System"
        direction TB
        MC["CLAUDE.md\nRules"]
        MS["STATUS.md\nState"]
        MM["memory/*.md\nPersistent memory (16+)"]
        MH["Memory Hub MCP\nMem0 + Qdrant\nSemantic search"]
        MW["/wrap + /dashboard\nSession lifecycle\n4-agent parallel"]
        MV["validate_placement()\nPlacement validation"]
    end
    subgraph "Claude Code Internal"
        direction TB
        CC["CLAUDE.md\nRules"]
        CH["Session History\nConversation array"]
        CM["memory/ 4 types\nuser/feedback/project/ref"]
        CA["Auto-Dream\nSub-agent consolidation\nKAIROS integration"]
        CP["Context Injection\n+ Post-Sampling Hooks"]
        CX["(not present)"]
    end
    MC <-.->|"identical"| CC
    MS <-.->|"analogous"| CH
    MM <-.->|"structural match"| CM
    MH <-.->|"functional match"| CA
    MW <-.->|"role match"| CP
    MV <-.->|"differentiator"| CX
```

Part 4: What I Got Right, What CC Does Better

What I Got Right

The 3-layer approach: documents + index + semantic

Text files alone are limited (keyword-only search). A vector DB alone has no structure (no categorization). Claude Code reached the same conclusion: CLAUDE.md (rules) + memory/ (index) + Auto-Dream (automation).

Session lifecycle hooks

“Load context at session start, persist state at session end” is a foundational pattern for agent memory. My /dashboard and /wrap map directly to CC’s Context Injection and Post-Sampling Hooks.

Stale detection

Outdated information polluting current context is a core memory system problem. My check_stale(14) was solving the same problem as KAIROS’s periodic monitoring.

Explicit memory types with structured frontmatter

Each memory file declares name, description, and type in frontmatter. The same principle as CC’s four-type classification — structured metadata enables search and management.
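Reading that frontmatter needs no YAML dependency. A minimal parser, sufficient for flat `key: value` headers like name/description/type (a sketch, not my production code):

```python
def parse_frontmatter(text: str) -> dict:
    # Read flat `key: value` pairs between the opening `---` markers.
    if not text.startswith("---"):
        return {}
    header = text.split("---", 2)[1]
    meta = {}
    for line in header.strip().splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta
```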

Defining what NOT to save

Not storing code patterns, architecture details, or git history in memory. Nearly identical to CC’s “What NOT to save” list. Good memory systems are defined by their exclusion criteria, not their inclusion criteria.

What CC Does Better

Circuit Breaker

MAX_CONSECUTIVE_AUTOCOMPACT_FAILURES = 3 in Auto-Compact. My system has no failure isolation. If Memory Hub search fails, it just fails. Three lines of code preventing 250,000 wasted API calls per day. This pattern needs immediate adoption.

Sub-agent summarization

Both Auto-Compact and Auto-Dream use forked sub-agents — independent contexts that do not consume the main session’s context window. My /wrap uses four parallel agents but does not fork for independent summarization. After long sessions, context is already tight when cleanup work begins.

Managing 14 cache-break vectors

Claude Code aggressively uses prompt caching. System prompt, tool definitions, memory — if any of these change, the cache breaks and costs spike. The source identifies 14 cache-break vectors with stabilization strategies for each. My system has no cache management at this level.

Prompt caching integration

getSystemContext() and getUserContext() are memoized per session and integrated with the API’s 1-hour prompt cache. My MCP setup has no equivalent caching layer.

Immutable state

Zustand + DeepImmutable types prevent state mutation at compile time. My system is file-based and inherently mutable — if two sessions simultaneously modify the same memory file, conflicts can occur.

My Advantages (Things CC Does Not Have)

validate_placement() — Information placement validation

validate_placement("Decided to use Next.js instead of Astro 5", "CLAUDE.md")
-> "Inappropriate for CLAUDE.md. Decision records belong in STATUS.md or docs/decisions/"

Claude Code classifies memory into four types, but which specific file within a type is left to model judgment. My system validates this explicitly. As the memory system grows, this function’s value compounds.
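The validation itself is rule-based. A hypothetical sketch with two illustrative rules; the real rule set behind validate_placement() is larger:

```python
def validate_placement(content: str, target: str):
    # Returns (ok, message). Rules here are illustrative, not exhaustive.
    lowered = content.lower()
    if "decided" in lowered and target == "CLAUDE.md":
        # Decision records are state, not rules.
        return (False, "Inappropriate for CLAUDE.md. Decision records belong "
                       "in STATUS.md or docs/decisions/")
    if lowered.startswith(("always", "never", "use ")) and target != "CLAUDE.md":
        # Imperative rules belong with the other permanent rules.
        return (False, f"Rules belong in CLAUDE.md, not {target}")
    return (True, "ok")
```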

Explicit session journal

Claude Code’s Auto-Dream extracts memory automatically. My system explicitly records decisions made, incomplete items, and related projects. Incomplete items in particular are difficult to detect reliably through automatic extraction.

Cross-project stale detection

check_stale(14) runs across all projects. Claude Code’s memory is per-project isolated, so while working on Project B, you cannot know that Project A has been neglected for two weeks. My system surfaces this.
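check_stale reduces to an mtime scan over the shared memory/ directory. A sketch, assuming per-project files follow the *-status.md naming shown earlier:

```python
import os
import time

def check_stale(memory_dir: str, days: int = 14) -> list[str]:
    # Flag per-project status files untouched for `days`; cross-project
    # because all projects share one memory/ directory.
    cutoff = time.time() - days * 86_400
    stale = [
        name
        for name in os.listdir(memory_dir)
        if name.endswith("-status.md")
        and os.path.getmtime(os.path.join(memory_dir, name)) < cutoff
    ]
    return sorted(stale)
```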

3-tier documentation graduation (Seed to Sapling to Tree)

Documentation scales with project maturity. Claude Code has no need for this (it is a single product), but when operating seven or more simultaneous projects, graduated documentation is essential to avoid over-engineering young ideas or under-documenting mature systems.

Semantic search (vector-based)

Claude Code’s memory is file-scan based — it reads MEMORY.md, follows links, reads files. My system uses Qdrant vector DB to search “past experiences similar to this problem” semantically. Auto-Dream may close this gap in the future, but the current source shows file-based processing, not vector search.


Why This Convergence Is Not Coincidence

Two systems built independently — one by Anthropic’s engineering team inside a production CLI serving millions, one by a solo AI consultant iterating through months of daily use — arrived at strikingly similar architectures.

This points to convergent constraints: the problem space of persistent agent memory is narrow enough that independent implementations converge.

```mermaid
graph TD
    P1["Problem: LLMs have no persistent memory"] --> S1["Solution: File-based persistent storage"]
    P2["Problem: Loading everything every time is cost-prohibitive"] --> S2["Solution: Index + detail two-stage approach"]
    P3["Problem: Mixing rules and state causes staleness"] --> S3["Solution: Hierarchical separation"]
    P4["Problem: Mid-session cleanup breaks flow"] --> S4["Solution: Process at session boundaries"]
    P5["Problem: Storing everything degrades search quality"] --> S5["Solution: Explicit exclusion criteria"]

    S1 --> C["Convergence: Same architecture"]
    S2 --> C
    S3 --> C
    S4 --> C
    S5 --> C
```

Both systems start from these five constraints and arrive at the same architecture.

But the divergence is equally instructive:

| Dimension | My System Optimized For | CC Optimized For |
|---|---|---|
| Reliability | Human oversight (explicit commands) | Automation (no human dependency) |
| Placement accuracy | Validation (validate_placement) | Model judgment (no validation) |
| Cross-project visibility | Global router (MEMORY.md) | Per-project isolation |
| Context management | Trust the model | Active circuit breaking |
| Memory cleanup | On-demand (check_stale) | Automatic (retention policies) |

I optimized for accuracy and control. Claude Code optimized for reliability and automation. Neither is universally better.

The ideal system would combine both: automatic lifecycle hooks with explicit placement validation. Automatic memory consolidation with human-auditable session journals. Per-project isolation with cross-project semantic routing.

Neither system is complete. Claude Code lacks placement validation and cross-project awareness. My system lacks circuit breakers and cache optimization. The future is a synthesis — and the convergence itself is the strongest validation that both approaches are on the right track.


Sources & Limitations

This series synthesizes the following publicly available analyses and does not directly contain leaked source code.

| Source | URL | Focus |
|---|---|---|
| ccunpacked.dev | ccunpacked.dev | Visual architecture guide, tool/command catalog |
| Wikidocs Analysis | wikidocs.net/338204 | Detailed technical analysis (execution flow, state, rendering) |
| PyTorch KR | discuss.pytorch.kr | Community analysis + HN discussion synthesis |
| Claw Code | github.com/ultraworkers/claw-code | Clean-room reimplementation (Rust/Python), PARITY.md gap analysis |

Analysis date: April 2, 2026. Anthropic issued DMCA takedowns on 8,100+ forks and discontinued npm distribution shortly after the leak, so some sources may have changed accessibility. Features behind feature flags are unreleased and may be modified or deprecated before launch.
