The Ralph Loop Implementation Guide — From a Bash One-Liner to Cross-Model Review


MJ · 7 min read

Starting from while true + cat task.md, building up through stop hooks, file-based state persistence, and cross-model worker-reviewer separation. Three practical examples — coding migration, prompt refinement, and test coverage expansion — plus analysis of the open-source ecosystem and Korea's Ralphathon.

The previous post covered the evolutionary arc from RLHF to context collapse — the structural limits that every long-running AI agent loop eventually hits. This post goes deeper into how the Ralph Loop solves those problems, at the implementation level.

Expect code examples, file structures, architecture diagrams, and practical applications that extend well beyond coding into prompt refinement and quality engineering.


The Minimal Implementation: while true + task.md

The core of the Ralph Loop is disarmingly simple.

```bash
#!/bin/bash
while true; do
  cat task.md | claude -p
done
```

Three lines. That is the entire thing. But behind this simplicity sits a deliberate design philosophy.

What Each Line Does

| Code | Role |
|---|---|
| `while true` | Infinite loop. Restarts the agent immediately upon termination |
| `cat task.md` | Pipes the original task specification into stdin on every iteration |
| `claude -p` | Non-interactive mode. Executes the prompt and exits automatically |

Every time a new iteration starts, the previous session’s conversation history is completely destroyed. Think of it like memory allocation (malloc) for LLMs — the model reconstructs only the information it needs by reading files from disk. Faulty reasoning from prior attempts, broken code patterns, irrelevant error logs — all of it evaporates when the session ends. Only real, tangible progress in the codebase survives.

Stop Hook: Preventing the Agent from Escaping

The basic while loop has one flaw. When the LLM declares “task complete” and exits, the loop restarts the same task from scratch. You end up re-running finished work.

The Stop Hook intercepts the agent’s termination attempt and re-injects the original task specification.

```bash
#!/bin/bash
# Claude Code stop hook example (.claude/hooks/stop.sh)
# Runs when the agent attempts to exit
if ! grep -q "ALL_TASKS_COMPLETE" .ralph/progress.md; then
  echo "Incomplete tasks remain. Re-read task.md." >&2
  exit 2  # Blocking exit code → termination is refused, agent keeps running
fi
```

With this in place, the agent cannot exit until every task is genuinely done. It is a structural safeguard against success bias — the tendency of LLMs to declare victory prematurely.
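For the hook to fire, it also has to be registered in Claude Code's settings. A minimal registration might look like the following — the `Stop` event and command-hook shape follow Claude Code's hooks configuration, but treat the exact file as a sketch for a script living at `.claude/hooks/stop.sh`:

```json
{
  "hooks": {
    "Stop": [
      {
        "hooks": [
          { "type": "command", "command": ".claude/hooks/stop.sh" }
        ]
      }
    ]
  }
}
```

This goes in `.claude/settings.json` (project-level) so the safeguard travels with the repository.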


The File System as Long-Term Memory

If the conversation history is wiped on every iteration, the agent loses track of where it is. The Ralph Loop solves this by treating the local file system as the model’s persistent memory. Files replace the context window as the source of continuity.

State File Structure

```
project/
├── task.md              ← Original PRD. Immutable source of truth
├── .ralph/
│   ├── iteration.txt    ← Current loop count
│   ├── work-summary.txt ← What the Worker did this iteration
│   ├── feedback.txt     ← Reviewer's error notes and revision directives
│   └── progress.md      ← Checklist-format progress tracker
├── src/                 ← Actual code (managed by Git)
└── tests/               ← Test code (the success/failure oracle)
```

| File | Created by | Read by | Purpose |
|---|---|---|---|
| task.md | Human | Worker (every loop) | Immutable task spec. Starting point for every iteration |
| iteration.txt | System | Worker | Tracks which loop number the system is on |
| work-summary.txt | Worker | Reviewer | Summarizes what changed this iteration |
| feedback.txt | Reviewer | Next Worker | Previous failure analysis + revision direction |
| progress.md | Worker | Worker + Stop Hook | Checklist. Tracks incomplete items |

The governing principle: The LLM’s context window is volatile working memory. The file system is persistent long-term memory.
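Wiring those state files into the loop takes only a few more lines than the minimal harness. A sketch — the `ralph_loop` helper and `stub_agent` are illustrative names, and in real use the agent command would be `claude -p`:

```shell
#!/bin/bash
# Sketch: the minimal loop extended with the .ralph/ state files.
# ralph_loop and stub_agent are illustrative; in real use the agent
# command passed in would be `claude -p`.
set -u

ralph_loop() {
  local agent_cmd="$1" max="$2" n
  mkdir -p .ralph
  echo 0 > .ralph/iteration.txt
  while true; do
    n=$(cat .ralph/iteration.txt)
    if [ "$n" -ge "$max" ]; then
      echo "Iteration cap reached — stopping for human review." >&2
      return 1
    fi
    echo $((n + 1)) > .ralph/iteration.txt
    # Fresh session each pass: the immutable spec plus last feedback on stdin.
    cat task.md .ralph/feedback.txt 2>/dev/null | $agent_cmd
    # The checklist file, not the model's say-so, decides completion.
    grep -q "ALL_TASKS_COMPLETE" .ralph/progress.md 2>/dev/null && return 0
  done
}

# Exercise the harness with a stub agent that finishes on its first pass.
cd "$(mktemp -d)"
echo "spec" > task.md
stub_agent() { cat > /dev/null; echo "ALL_TASKS_COMPLETE" > .ralph/progress.md; }
ralph_loop stub_agent 10 && echo "loop exited cleanly"
```

Note that `iteration.txt` is bumped before the agent runs, so a crash mid-iteration still counts toward the cap — the conservative choice for an unattended loop.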


Cross-Model Review Architecture

Generating code in an infinite loop without oversight can destroy a system. Modern Ralph Loop implementations introduce a cross-model review architecture that physically separates the Worker and the Reviewer.

```mermaid
sequenceDiagram
    participant Loop as Bash Loop
    participant W as Worker (Claude Sonnet)
    participant R as Reviewer (GPT-4o)
    participant FS as File System

    Loop->>FS: cat task.md
    FS->>W: task.md + feedback.txt
    W->>FS: Code changes + work-summary.txt
    W->>Loop: Exit
    Loop->>FS: cat work-summary.txt
    FS->>R: work-summary.txt + changed code
    R->>R: Run tests + code review

    alt Pass
        R->>FS: feedback.txt = "SHIP"
        R->>Loop: Exit
        Loop->>Loop: Break loop
    else Fail
        R->>FS: feedback.txt = "REVISE: specific critique"
        R->>Loop: Exit
        Loop->>Loop: Next iteration
    end
```

Worker Recipe (ralph-work.yaml)

The Worker model (e.g., Claude Sonnet) executes the following sequence:

  1. Read task.md for the full objective
  2. Read feedback.txt for the Reviewer’s prior critique
  3. Read progress.md to identify incomplete checklist items
  4. Execute code changes
  5. Write a change summary to work-summary.txt
  6. Terminate the session
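That sequence can be written down almost verbatim as the Worker's standing instructions. A hedged sketch of what the recipe's prompt body might contain — the wording is illustrative, not a fixed recipe schema:

```markdown
You are the Worker in a Ralph Loop. Each session starts with no memory.

1. Read task.md — it is the immutable spec. Never edit it.
2. Read .ralph/feedback.txt — the Reviewer's critique of the last iteration.
3. Read .ralph/progress.md — work only on unchecked items.
4. Make the smallest change that addresses the feedback.
5. Summarize what you changed in .ralph/work-summary.txt.
6. Exit. Do not review or grade your own work.
```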

Reviewer Recipe (ralph-review.yaml)

An entirely different model (e.g., GPT-4o) boots with a clean context.

  1. Inspect the Worker’s changes via git diff
  2. Run the test suite
  3. Review code quality

If all requirements are met → write SHIP to feedback.txt → loop exits. If defects are found → write REVISE + specific feedback to feedback.txt → next Worker iteration fires.
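The SHIP/REVISE handshake above reduces to a two-phase loop. A sketch — `cross_model_loop` and the stub functions are illustrative, and a real setup would pass the two models' CLI invocations as the command parameters:

```shell
#!/bin/bash
# Sketch of the Worker → Reviewer handshake from the sequence diagram.
# cross_model_loop and the stubs are illustrative; real use would pass
# each model's CLI invocation as worker_cmd / reviewer_cmd.
set -u

cross_model_loop() {
  local worker_cmd="$1" reviewer_cmd="$2" max="$3" i=0
  mkdir -p .ralph
  while [ "$i" -lt "$max" ]; do
    i=$((i + 1))
    # Phase 1: Worker sees the spec plus the Reviewer's last critique.
    cat task.md .ralph/feedback.txt 2>/dev/null | $worker_cmd
    # Phase 2: a different model judges the summary with a clean context.
    cat .ralph/work-summary.txt 2>/dev/null | $reviewer_cmd
    # Only the Reviewer's verdict ends the loop.
    grep -q "^SHIP" .ralph/feedback.txt 2>/dev/null && return 0
  done
  return 1   # iteration cap hit without a SHIP verdict
}

# Exercise with stubs: the Worker reports a change, the Reviewer approves.
cd "$(mktemp -d)"
echo "spec" > task.md
worker()   { cat > /dev/null; echo "migrated src/cli.py" > .ralph/work-summary.txt; }
reviewer() { cat > /dev/null; echo "SHIP" > .ralph/feedback.txt; }
cross_model_loop worker reviewer 5 && echo "Reviewer verdict: SHIP"
```

Because the Reviewer only ever reads `work-summary.txt` and the diff, it never inherits the Worker's reasoning — the clean context is enforced by the plumbing, not by instruction.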

Why Use a Different Model?

When the same model generates and reviews, self-confirmation bias takes over. Claude reviewing Claude’s code is disproportionately likely to say “looks good.” Using a different model for the review step structurally breaks this feedback loop. The Reviewer has no memory of the generation process, no sunk-cost attachment, and a different set of internal biases — which is precisely what makes the cross-check effective.


Practical Example 1: CLI Tool Migration (Coding)

The most canonical Ralph Loop use case.

Scenario

A TypeScript-based CLI tool needs a full migration to Python. 42 files, 180 functions.

task.md

```markdown
# Task: TypeScript → Python Migration

## Goal
Migrate the CLI tool from TypeScript to Python 3.12.
All existing tests must pass in the Python version.

## Checklist
- [ ] src/cli.ts → src/cli.py
- [ ] src/parser.ts → src/parser.py
- [ ] src/formatter.ts → src/formatter.py
- [ ] ... (42 files total)
- [ ] All unit tests pass
- [ ] All integration tests pass
- [ ] Type hints complete (mypy strict)
```

Execution Log

```
Iteration 1: cli.py, parser.py migrated. Tests 12/42 passing.
Iteration 2: formatter.py migrated. Feedback: "datetime handling mismatch." Fixed. Tests 20/42.
Iteration 3: Remaining modules migrated. Tests 35/42.
Iteration 4: Debug 7 failing tests. Tests 40/42.
Iteration 5: Fix 2 edge cases. Tests 42/42. SHIP.
```

Five iterations, each with a fresh context. Had this been attempted in a single session, context collapse would have started around iteration 3. By iteration 5, the model would be hallucinating fixes for bugs it introduced three turns ago.


Practical Example 2: Prompt Quality Refinement (Non-Coding)

The Ralph Loop is not coding-specific. It applies identically to iterative refinement of prompts, configurations, and workflows.

Scenario

A Slack bot’s daily news curation output is unsatisfactory. Insights are shallow, sources are biased toward a narrow set, and irrelevant articles keep leaking through.

File Structure

```
project/ralph/
├── PROMPT.md            ← Worker instructions
├── quality-spec.md      ← Quality rubric (= task.md equivalent)
├── sample-input.md      ← Test news data
├── iteration.txt        ← Loop count
├── feedback.md          ← Reviewer feedback
├── evaluation.md        ← (Worker-generated) scoring results
└── simulated-output.md  ← (Worker-generated) simulated curation
```

Per-Iteration Worker Sequence

  1. Read feedback.md for the Reviewer’s prior notes
  2. Read the current curation prompt (src/crons/daily-signal.ts)
  3. Simulate output against sample-input.md news data
  4. Self-score against quality-spec.md criteria (0-10 scale)
  5. If score is below 8, revise only the 1-2 weakest areas of the prompt
  6. Log changes to feedback.md

quality-spec.md Example (Excerpt)

This is the key differentiator. The quality specification file acts as both the success criterion and the scoring rubric, replacing traditional test suites for non-code refinement tasks.

## Scoring Criteria (0-10)

| Dimension | Weight | Criteria |
|---|---|---|
| Relevance | 30% | Ratio of AI strategy/methodology items out of 15 total |
| Insight depth | 25% | Quality of "why this matters" analysis |
| Source diversity | 15% | Coverage of Korean dev communities + international AI blogs |
| Noise removal | 15% | Count of irrelevant articles that leaked through |
| Headline specificity | 15% | Presence of numbers/names + analytical framing |

SHIP threshold: Total score 8.0 or above

The quality-spec.md pattern is worth internalizing. It does for prompt engineering what unit tests do for code — it gives the loop a machine-evaluable success criterion. Without it, the Reviewer has no objective basis for SHIP/REVISE decisions, and the loop degenerates into subjective cycling.
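The rubric above can be collapsed into a machine-checkable verdict. A sketch — the weights mirror the table, while the `score_verdict` helper and its 0-10 inputs are illustrative assumptions:

```shell
#!/bin/bash
# Sketch: turning the quality-spec.md rubric into a SHIP/REVISE verdict.
# Weights (30/25/15/15/15) come from the table; score_verdict and its
# per-dimension 0-10 inputs are illustrative.

score_verdict() {
  # Args: relevance insight source noise headline (each scored 0-10)
  local total
  # Integer weights summed, then divided by 100 in awk, since bash
  # has no floating-point arithmetic of its own.
  total=$(awk -v r="$1" -v i="$2" -v s="$3" -v n="$4" -v h="$5" \
    'BEGIN { printf "%.2f", (r*30 + i*25 + (s + n + h)*15) / 100 }')
  # SHIP threshold from quality-spec.md: total score 8.0 or above.
  if awk -v t="$total" 'BEGIN { exit !(t >= 8.0) }'; then
    echo "SHIP $total"
  else
    echo "REVISE $total"
  fi
}

score_verdict 9 8 7 8 8    # strong run: the weighted total clears 8.0
score_verdict 5 6 7 8 8    # weak relevance/insight drags the total under
```

The Reviewer then writes this verdict string straight into feedback.md, which is exactly the SHIP/REVISE token the loop harness greps for.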

Execution Log

```
Iteration 1: Added "explain why this matters" instruction to insight prompt. Score 5.2 → 6.1
Iteration 2: Redesigned categories (added "method" category). Score 6.1 → 7.0
Iteration 3: Separated "Must-Read 3" with 2-sentence insight requirement. Score 7.0 → 7.8
Iteration 4: Strengthened EXCLUDE rules for noise filtering. Score 7.8 → 8.3. SHIP.
```

Zero lines of application code changed. Four iterations of prompt-only modification brought the quality score above threshold. This is the “Prompt Ralph Loop” — the same architecture, the same file-based state management, the same Worker-Reviewer separation, applied to a fundamentally different artifact type.

The implications are worth stating explicitly: any artifact that can be evaluated against a rubric can be Ralph Looped. Documentation quality, configuration tuning, system prompt optimization, translation accuracy — anywhere you can define a quality-spec.md, you can run the loop.


Practical Example 3: Automated Test Coverage Expansion

Scenario

An existing codebase sits at 40% test coverage. The target is 80% or above.

task.md

```markdown
# Task: Test Coverage 40% → 80%

## Goal
Add unit tests to reach 80% line coverage.
Do not modify existing source code — tests only.

## Rules
- One test file per source file
- Use existing test patterns in tests/ directory
- Run: npm test -- --coverage after each change

## Progress tracking
Update this checklist after each file:
- [ ] src/auth.ts (0% → ?)
- [ ] src/api.ts (30% → ?)
- [ ] src/utils.ts (60% → ?)
- ... (20 files)
```

This task is ideal for the Ralph Loop. Each iteration is independent (writing tests for individual files), the success criterion is unambiguous (coverage percentage), and failures carry over gracefully — a test file that fails can be debugged in the next iteration without any context from the previous attempt.

The coverage number itself functions as a natural progress tracker. The Worker can read npm test -- --coverage output at the start of each loop and immediately identify which files still need attention. No separate progress parsing required.
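Reading that number programmatically is straightforward. A sketch, assuming Jest's json-summary reporter writes `coverage/coverage-summary.json` — the sample file below is fabricated to make the sketch self-contained:

```shell
#!/bin/bash
# Sketch: using the coverage number as the loop's progress tracker.
# Assumes Jest's json-summary reporter; the sample JSON is fabricated
# so the sketch runs without an actual test suite.

workdir=$(mktemp -d)
mkdir -p "$workdir/coverage"
cat > "$workdir/coverage/coverage-summary.json" <<'EOF'
{"total":{"lines":{"total":200,"covered":164,"pct":82}}}
EOF

coverage_pct() {
  # Pull the first "pct" value — total line coverage in json-summary output.
  grep -o '"pct":[0-9.]*' "$1" | head -1 | cut -d':' -f2
}

pct=$(coverage_pct "$workdir/coverage/coverage-summary.json")
if awk -v p="$pct" 'BEGIN { exit !(p >= 80) }'; then
  verdict="ALL_TASKS_COMPLETE"   # what the Worker appends so the stop hook releases
else
  verdict="CONTINUE"
fi
echo "$verdict at ${pct}% line coverage"
```

The same extraction can run at the top of each iteration, giving the Worker its "where am I" answer without any separate progress parsing.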


The Open-Source Ecosystem

The Ralph Loop concept has rapidly spawned an ecosystem of implementations and variations.

| Project | Description | Key Feature |
|---|---|---|
| snarktank/ralph | Community main implementation (10K+ stars) | Bash-based, supports multiple AI CLIs |
| vercel-labs/ralph-loop-agent | Vercel Labs AI SDK integration | Distributed as an npm package |
| PageAI-Pro/ralph-loop | Docker-sandboxed production implementation | Safe execution in isolated environments |
| ClaytonFarr/ralph-playbook | Methodology guide | Standardized Worker-Reviewer recipes |
| mikeyobrien/ralph-orchestrator | Rust-based orchestrator | Supports 7 AI backends |
| Anthropic ralph-wiggum | Official Claude Code plugin | Endorsed by Boris Cherny (lead engineer) |

Variant patterns are also proliferating:

  • RALPHA: Recursive Author Loop for Cursor
  • Ralph Mode: Built-in mode within Deep Agents
  • LangChain adapters and multi-CLI integration layers

Explosive Adoption in the Korean Ecosystem

The Korean developer community’s reception of the Ralph Loop deserves its own section — not as a regional footnote, but because Korea produced several firsts that shaped how the pattern is practiced globally.

Developer Slang

geu jagop geunyang ralph dollyeonwa (그 작업 그냥 랄프 돌려놔) = “Just ralph-loop it overnight”

The phrase has become shorthand in Korean dev circles for delegating any repeatable task to an autonomous AI loop. It carries the same casual authority as “just ship it” — the assumption being that if the task spec is clear, the Ralph Loop will converge.

Major Korean Coverage

The pattern was covered extensively across Korean technical media: WikiDocs (Jaehong Park’s Silicon Valley Blog), AI Times, PyTorch Korea, Inflearn (video tutorials), Dale Seo’s engineering blog, GeekNews (news.hada.io), and TILNOTE. Coverage ranged from beginner walkthroughs to production architecture deep-dives.

Alibaba Cloud also published “From ReAct to Ralph Loop: A Continuous Iteration Paradigm for AI Agents” — a technical deep-dive signaling enterprise-level interest in the pattern.

The Ralphathon (March 2026)

Korea hosted the world’s first hackathon built entirely around the Ralph Loop.

  • Organizers: Team Attention + Kakao Ventures, sponsored by OpenAI
  • Format: Humans design specs only. AI codes overnight.
  • Winning team’s output: AI wrote 100,000 LOC — 70% of which was test code. Human keyboard input: zero.

The 70% test ratio is the most telling detail. In the Ralph Loop’s Worker-Reviewer architecture, tests function as the success criterion. The agent naturally writes extensive tests because that is what the Reviewer uses to decide SHIP vs. REVISE. More tests mean a faster path to SHIP. The architecture incentivizes test coverage by design, not by mandate.


Implementation Checklist

A preflight checklist for actually setting up a Ralph Loop.

Required

  • Clear task.md: The agent must be able to determine “what to do” instantly at the start of each loop
  • Machine-evaluable success criteria: Test pass/fail, coverage numbers, build success — anything an automated check can judge
  • Git initialized: Every iteration’s changes must be tracked as commits
  • Stop Hook configured: Prevents premature agent termination
  • Cross-Model Review: Separate Worker and Reviewer onto different models
  • Iteration cap: Prevent true infinite loops (5-10 iterations is a reasonable ceiling)
  • CLAUDE.md / .cursorrules: Project conventions specified in files the agent reads on startup

Anti-Patterns

  • Running without tests → infinite flailing (no way to judge success)
  • Vague task.md → each loop diverges in a different direction
  • Modifying source code without running tests → the Worker declares victory based on vibes

```mermaid
graph TD
    A{"Tests exist?"} -->|Yes| B{"task.md is clear?"}
    A -->|No| X["Stop. Write tests first."]
    B -->|Yes| C{"Success criteria automatable?"}
    B -->|No| Y["Stop. Make task.md concrete."]
    C -->|Yes| D["Ready to Ralph Loop."]
    C -->|No| Z["Partial automation. Manual review required."]
```

Key Takeaways

The technical essence of the Ralph Loop, in one sentence:

An infinite iteration architecture that treats the context window as volatile working memory, the file system as persistent long-term memory, and Git as an audit trail.

This simple structure addresses context collapse at a fundamental level. It applies not only to coding but equally to prompt refinement, configuration tuning, documentation quality improvement, and any iterative refinement workflow where progress can be evaluated against a specification.

The next post looks beyond the Ralph Loop at self-evolving agent systems — where agents update their own weights — and the shifting role of the AI-era developer.


Previous: The Evolution of AI Agent Loops — From RLHF to the Ralph Loop

Next: Beyond the Ralph Loop — Self-Evolving Agents and the Changing Role of the AI Developer
