Documentation of WICHI's 9-Bucket query framework. Defines 3 Zones and 9 Buckets based on brand presence to measure AI's organic recommendations effectively.
The Problem: Scores Are Meaningless Without Query Classification
The starting point of GEO (Generative Engine Optimization) analysis is the query. Brand exposure results change completely depending on what question you ask the AI search engine.
In WICHI’s early days, all queries were processed as a single undifferentiated group. “Samsung Card annual fee” and “recommend a low annual fee card” were analyzed with the same criteria. The former is a search where the user already knows the brand. The latter is a category-level exploration where the user has no specific brand in mind. Samsung Card appearing in each of these queries means entirely different things.
Without this distinction, the GEO Score was uninterpretable. When the score was high, there was no way to tell whether it was genuinely good, inflated by branded queries, or actually performing well on competitive queries. Three specific problems emerged.
First, performance distortion. Queries containing the brand name (e.g., “Samsung Card annual fee”) will naturally mention that brand — the user already specified it. When these queries inflate the overall score, the GEO Score can look strong even when AI never voluntarily recommends the brand.
Second, no actionable connection. When the score was low, there was no way to know what to fix. Is brand awareness the problem? Is AI failing to recommend the brand in category queries? Is the brand absent from competitive contexts? Without classification, these scenarios were indistinguishable.
Third, misclassification of competitor queries. Samsung Card being mentioned in a “Hyundai Card drawbacks” query is fundamentally different from it appearing in “recommend a low annual fee card.” The former captures competitor churn demand; the latter represents AI’s organic recommendation. Mixing these in the same bucket makes score interpretation ambiguous.
A GEO Score without query classification is like a report card that shows only the overall average: without per-subject grades, you cannot tell whether math or English is the subject that needs reinforcement.
A systematic query classification framework was needed to solve this.
Choosing the Classification Axis: Brand Presence, Not Intent
The initial approach was to adopt a familiar taxonomy from SEO: search intent classification.
| Intent | Description | Example |
|---|---|---|
| Informational | Information seeking | “What are credit card annual fees?” |
| Transactional | Purchase/action intent | “Apply for Samsung Card online” |
| Navigational | Navigate to specific site | “Samsung Card homepage” |
| Commercial Investigation | Pre-purchase comparison | “Compare low annual fee cards” |
This classification works well for traditional SEO. But for AI search, its limitations are clear.
The core question in AI search is: “When the user does not specify a brand, which brand does AI recommend?” Intent-based classification cannot capture this. Whatever the intent, Informational or Transactional, whether the query contains a brand name is the more fundamental axis.
For example, an intent-based taxonomy would file both of the following queries under the same pre-purchase label (Commercial Investigation):
- “How much is Samsung Card’s annual fee?”
- “Which cards have low annual fees?”
From a GEO perspective, these queries have completely different natures. AI mentioning Samsung Card in the first is expected. AI mentioning Samsung Card in the second — that is genuine GEO performance.
```mermaid
graph LR
    A[Query Classification Axis] --> B{SEO Intent-Based?}
    B -->|Informational| C[Cannot distinguish<br/>brand presence]
    B -->|Transactional| C
    B -->|Navigational| C
    A --> D{Brand Presence?}
    D -->|Brand included| E[Owned Zone<br/>Baseline measurement]
    D -->|No brand| F[Battleground Zone<br/>Core KPI]
    D -->|Competitor included| G[Competitive Zone<br/>Competitive positioning]
    style D fill:#2563eb,color:#fff
    style B fill:#6b7280,color:#fff
```
The conclusion: the primary classification axis for GEO analysis is brand presence in the query. Intent classification is retained as secondary metadata, while the primary structure is built around brand relationship.
The 3-Zone Structure: Owned, Battleground, Competitive
Queries are divided into three zones based on brand presence.
| Zone | Definition | What It Measures | Query Examples (Credit Cards) |
|---|---|---|---|
| Owned | Query contains the target brand name | Brand awareness-based exposure, AI’s accuracy on brand information | “Samsung Card annual fee,” “Samsung Card international payments” |
| Battleground | No brand name, category/needs-level search | Whether AI voluntarily recommends the brand (core KPI) | “Recommend a low annual fee card,” “First card for a new graduate” |
| Competitive | Competitor brand explicitly mentioned | Presence in competitor contexts, capturing churn demand | “Hyundai Card drawbacks,” “Shinhan Card vs Samsung Card” |
This 3-zone distinction is the framework’s backbone. Each zone measures something different, and scores carry different meanings.
What Each Zone’s Score Means
High Owned Zone score: AI accurately knows the brand’s basic information. This is a necessary condition, not a sufficient one. This score alone cannot judge GEO performance.
High Battleground Zone score: AI voluntarily recommends the brand even when the user did not specify it. This is the essence of GEO and the area with real business value.
High Competitive Zone score: The brand appears as an alternative even in contexts where competitors are mentioned. Exposure in competitor complaint queries represents direct customer conversion opportunities.
The Owned Zone is a “baseline.” Answering when someone calls your name is table stakes. A high score in the Battleground Zone means you have secured meaningful visibility in the age of AI search.
Why It Was Not 3 Zones From the Start
Initially, only two zones were used: Owned and Battleground. Competitor queries were placed in Battleground, which made scores ambiguous.
Samsung Card being mentioned in a “Hyundai Card drawbacks” query is different from Battleground’s organic recommendation. It captures competitor churn demand — not a case of AI spontaneously recommending the brand at the category level. Without separating these, the Battleground score’s meaning gets diluted.
Additionally, queries like “Samsung Card vs Hyundai Card” did not fit cleanly into either zone. They contain a brand name (so not Battleground), but a competitor is present too (so not purely Owned). A separate zone was needed.
```mermaid
graph TD
    A[Query Input] --> B{Contains target<br/>brand name?}
    B -->|Yes| C{Also contains<br/>competitor brand?}
    B -->|No| D{Contains<br/>competitor brand?}
    C -->|Yes| E[Competitive Zone<br/>Head-to-Head]
    C -->|No| F[Owned Zone]
    D -->|Yes| G[Competitive Zone<br/>Competitor Pain]
    D -->|No| H[Battleground Zone]
    style H fill:#2563eb,color:#fff
    style F fill:#059669,color:#fff
    style E fill:#dc2626,color:#fff
    style G fill:#dc2626,color:#fff
```
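In code, this decision tree reduces to two membership checks. Below is a minimal Python sketch; the function name and the naive substring matching are illustrative assumptions, not WICHI's implementation (real matching would also need to handle brand aliases and spelling variants):

```python
from enum import Enum


class Zone(Enum):
    OWNED = "owned"                      # A, B
    BATTLEGROUND = "battleground"        # C, D, E, H, I
    HEAD_TO_HEAD = "head_to_head"        # F
    COMPETITOR_PAIN = "competitor_pain"  # G


def classify_zone(query: str, brand: str, competitors: list[str]) -> Zone:
    """Mirror the decision tree above: brand presence first, then competitor presence."""
    q = query.lower()
    has_brand = brand.lower() in q
    has_competitor = any(c.lower() in q for c in competitors)
    if has_brand:
        # Brand + competitor together is a direct comparison (F).
        return Zone.HEAD_TO_HEAD if has_competitor else Zone.OWNED
    # No target brand: a competitor mention is Competitor Pain (G),
    # otherwise it is a category-level Battleground query.
    return Zone.COMPETITOR_PAIN if has_competitor else Zone.BATTLEGROUND


brand, rivals = "Samsung Card", ["Hyundai Card", "Shinhan Card"]
assert classify_zone("Samsung Card annual fee", brand, rivals) is Zone.OWNED
assert classify_zone("recommend a low annual fee card", brand, rivals) is Zone.BATTLEGROUND
assert classify_zone("Hyundai Card drawbacks", brand, rivals) is Zone.COMPETITOR_PAIN
assert classify_zone("Shinhan Card vs Samsung Card", brand, rivals) is Zone.HEAD_TO_HEAD
```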
Detailed Design of 9 Buckets
The 3 zones are further subdivided into 9 total buckets. Each bucket has a unique measurement purpose and design rationale.
The 3x3 Matrix Structure
```mermaid
graph TB
    subgraph Owned ["Owned Zone (Brand Included)"]
        A["A: Branded Direct<br/>Feature/spec questions"]
        B_bucket["B: Branded Contextual<br/>Situational context questions"]
    end
    subgraph Battleground ["Battleground Zone (No Brand)"]
        C["C: Category Generic<br/>Category exploration"]
        D["D: Persona Entry<br/>Persona entry point"]
        E["E: Persona Premium<br/>High-value persona"]
        H["H: Adjacent Topic<br/>Related topics"]
        I["I: Trend / Ranking<br/>Trends and rankings"]
    end
    subgraph Competitive ["Competitive Zone (Competitor Included)"]
        F["F: Head-to-Head<br/>Direct comparison"]
        G["G: Competitor Pain<br/>Competitor complaints"]
    end
    style Battleground fill:#1e3a5f,color:#fff
    style Owned fill:#1a4731,color:#fff
    style Competitive fill:#5c1a1a,color:#fff
```
Owned Zone: Baseline Measurement
The Owned Zone measures how AI describes a brand when the brand name is already in the query. If scores here are low, AI does not even have accurate basic information about the brand — a problem that must be resolved before examining other zones.
| Bucket | Name | Measurement Purpose | Query Pattern | Example (SaaS) |
|---|---|---|---|---|
| A | Branded Direct | AI’s factual accuracy on brand features/specs | Brand name + specific feature question | “What are Notion AI’s pricing plans?” |
| B | Branded Contextual | AI’s contextual understanding in a specific scenario | Brand name + usage scenario | “My team is 10 people — would Notion work for project management?” |
Branded Direct vs. Branded Contextual: Direct is fact-checking in nature — “How much does it cost?”, “Does it have this feature?” Contextual involves contextual reasoning — “Would this product fit my situation?” AI may know the facts accurately but fail at contextual judgment, or vice versa. Separating these clarifies the improvement direction.
Battleground Zone: The Core KPI
The Battleground Zone is the framework’s heart. It measures whether AI voluntarily recommends a brand when the user has not specified one. It is subdivided into 5 buckets because even within Battleground, query characteristics and conversion potential vary widely.
| Bucket | Name | Measurement Purpose | Query Pattern | Example (SaaS) |
|---|---|---|---|---|
| C | Category Generic | AI recommendation in general category exploration | Category keyword + compound conditions | “I need a project management tool that’s free and supports both kanban and Gantt charts” |
| D | Persona Entry | Recommendation for a specific persona’s entry query | Persona + category question | “I’m on a 5-person startup team — what collaboration tool should we start with?” |
| E | Persona Premium | Recommendation for high-involvement, high-value personas | Expert/high-value user + professional needs | “I need enterprise-grade security — any project management tools with SOC2 certification?” |
| H | Adjacent Topic | Exposure in related but not direct-category topics | Adjacent topic + indirect connection | “How should remote teams handle async communication? Task tracking keeps falling through” |
| I | Trend / Ranking | Positioning in ranking and trend queries | Time reference + ranking/trend | “What are the top project management tools for startups in 2026?” |
Design rationale for separating each bucket:
| Distinction | Why a Separate Bucket |
|---|---|
| C vs D | Category Generic is “what exists” exploration; Persona Entry is “what fits my situation.” AI recommendation logic operates differently for each. |
| D vs E | Entry personas are in early exploration; Premium personas have specific, concrete requirements. Business value (LTV) differs, requiring separate measurement. |
| H exists because | Adjacent topic mentions represent new acquisition channels. Notion appearing in “remote work tips” is a different opportunity than direct category queries. |
| I exists because | “Best X of 2026” queries cause AI to generate ordered lists. Position within the list matters, and measurement methodology differs from other buckets. |
Competitive Zone: Competitive Positioning
The Competitive Zone measures brand presence in contexts where competitors are directly mentioned.
| Bucket | Name | Measurement Purpose | Query Pattern | Example (SaaS) |
|---|---|---|---|---|
| F | Head-to-Head | Which side AI favors in direct comparison | Brand vs. competitor | “Between Notion and Monday.com, which is better for startups?” |
| G | Competitor Pain | Whether the brand appears as an alternative to competitor complaints | Competitor + dissatisfaction/churn scenario | “Monday.com is too expensive — any similar tools that are more affordable?” |
Head-to-Head vs. Competitor Pain: Head-to-Head is a neutral comparison — the user knows both brands and wants an objective assessment. Competitor Pain has directionality — the user is dissatisfied with a competitor and actively seeking alternatives. Appearing in the latter represents a direct switching opportunity.
Core Design Principles
Principle 1: Brand Name Exclusion
This is the most important design principle in the entire framework.
Battleground Zone queries must never contain the target brand name.
The rationale is simple. The essence of GEO is measuring “whether AI voluntarily recommends the brand when the user has not specified it.”
If AI mentions Samsung Card when asked “Recommend a card with low annual fees” — that is genuine GEO performance. Including “Samsung Card” in the query and then checking whether AI mentions it is not measurement; it is self-confirmation.
This principle applies with modifications to the Competitive Zone.
| Zone | Brand Name Rule |
|---|---|
| Owned (A, B) | Target brand name must be included. Competitor names not allowed. |
| Battleground (C, D, E, H, I) | Target brand name never included. Competitor names not included either. |
| Competitive F (Head-to-Head) | Target brand name + competitor name both included. |
| Competitive G (Competitor Pain) | Competitor name only. Target brand name never included. |
Excluding the target brand from Competitor Pain (G) follows the same logic as Battleground. Asking “Monday.com is too expensive — how about Notion?” forces AI to mention Notion. Asking “Monday.com is too expensive — any similar but more affordable tools?” and having AI voluntarily recommend Notion — that is meaningful measurement.
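The rule table translates directly into a validation gate for generated queries. Here is a sketch under the same naive substring-matching assumption as the earlier zone classifier; the bucket letters follow the framework, everything else is illustrative:

```python
OWNED = {"A", "B"}
BATTLEGROUND = {"C", "D", "E", "H", "I"}


def brand_rule_violations(bucket: str, query: str, brand: str,
                          competitors: list[str]) -> list[str]:
    """Return every way a query breaks its bucket's brand-name rule."""
    q = query.lower()
    has_brand = brand.lower() in q
    has_comp = any(c.lower() in q for c in competitors)
    violations = []
    if bucket in OWNED:
        if not has_brand:
            violations.append("Owned query must include the target brand")
        if has_comp:
            violations.append("Owned query must not name competitors")
    elif bucket in BATTLEGROUND:
        if has_brand:
            violations.append("Battleground query must never include the target brand")
        if has_comp:
            violations.append("Battleground query must not name competitors")
    elif bucket == "F":
        if not (has_brand and has_comp):
            violations.append("Head-to-Head needs both the target brand and a competitor")
    elif bucket == "G":
        if has_brand:
            violations.append("Competitor Pain must never include the target brand")
        if not has_comp:
            violations.append("Competitor Pain needs a competitor name")
    return violations


# A Battleground query containing the brand name is caught immediately:
assert brand_rule_violations("C", "Is Notion good for kanban?", "Notion", ["Asana"])
```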
Principle 2: Queries Are Conversations, Not Keywords
AI search queries are fundamentally different from traditional search engine queries.
| Aspect | Traditional Search Engine | AI Search |
|---|---|---|
| Format | Keyword strings | Natural language conversations |
| Example | “credit card annual fee comparison” | “I’m a new graduate and I want a card with low annual fees and low international transaction fees” |
| Condition count | Typically 1-2 | 2-3+ compound conditions |
| Tone | None | Casual/formal variations |
Using keyword-style queries like “credit card recommendation” fails to represent actual AI search user behavior. Real users describe their situations, present multiple conditions simultaneously, and ask in conversational language. The query generation stage must reflect this difference.
Principle 3: Complexity Distribution Control
The distribution of query complexity (the number of conditions per query) is deliberately controlled.
| Complexity | Description | Target Share | Allowed Buckets |
|---|---|---|---|
| Simple (1 condition) | Single-condition questions | Minority | Primarily Trend/Ranking (I) |
| Medium (2 conditions) | Dual-condition questions | Moderate | All |
| Complex (3+ conditions) | Multi-condition questions | Largest share | All |
Testing with only simple queries measures only AI’s default recommendation list. Whether AI recommends a brand under compound conditions is a more rigorous test and more closely mirrors actual user behavior.
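Assuming each query's condition count is tagged upstream (for example, by the metadata classification step described below), checking the distribution takes a few lines of Python. A sketch:

```python
from collections import Counter


def complexity_shares(condition_counts: list[int]) -> dict[str, float]:
    """Share of simple / medium / complex queries in a generated set."""
    def band(n: int) -> str:
        return "simple" if n <= 1 else "medium" if n == 2 else "complex"
    counts = Counter(band(n) for n in condition_counts)
    return {b: counts.get(b, 0) / len(condition_counts)
            for b in ("simple", "medium", "complex")}


shares = complexity_shares([1, 2, 3, 4, 2, 3, 1, 3])
# Failure signal from the coverage-validation table later in this post:
# simple queries above 50%, or complex queries not the largest share.
assert shares["simple"] <= 0.5
assert shares["complex"] >= max(shares["simple"], shares["medium"])
```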
Query Expansion Pipeline
The 9-Bucket framework is a classification system only. The actual queries used for analysis are generated through a separate expansion pipeline.
Pipeline Structure
```mermaid
graph TD
    A[User Input<br/>Brand name + Product name] --> B[Step 0: Auto-generate Brand Info<br/>Category, competitors, USP, etc.]
    B --> C[Step 2: Signal Extraction<br/>Autocomplete + Topics + Personas]
    C --> D[Step 3: Query Expansion<br/>Generate with 9-Bucket distribution]
    D --> E[Step 3b: Metadata Classification<br/>Intent, funnel, persona, etc.]
    E --> F[Final Query Set<br/>~40 queries + metadata]
    style D fill:#2563eb,color:#fff
```
The critical point is that queries are generated based on real user signals — not fabricated arbitrarily. Actual search autocomplete data reveals user interest topics, from which personas are derived, and queries are then generated according to each bucket’s rules.
The Role of Signal Extraction
Collecting real user signals before query generation ensures that framework queries are “questions real users would actually ask,” not “questions an analyst imagined.”
Collected signals:
| Signal Type | Source | Purpose |
|---|---|---|
| Autocomplete queries | Google, Naver autocomplete APIs | Identify topics users actually search for |
| Interest topics | LLM-extracted from autocomplete | Guide per-bucket query subject direction |
| Target personas | LLM-derived from topics | Concretize Persona Entry (D) and Persona Premium (E) queries |
Without these signals, only generic queries like “low annual fee card” get generated, missing specific real-world needs like “a card with good wedding-venue payment benefits for someone planning a wedding.”
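As an illustration of the autocomplete signal only: Google exposes a widely used, unofficial suggest endpoint that returns suggestions as JSON. This sketch assumes that endpoint's current behavior and is not WICHI's actual collector:

```python
import requests


def google_autocomplete(seed: str, lang: str = "en") -> list[str]:
    """Fetch autocomplete suggestions for a seed keyword.

    Uses Google's unofficial suggest endpoint (no SLA; may change or be
    rate-limited). A production collector would add retries and caching.
    """
    resp = requests.get(
        "https://suggestqueries.google.com/complete/search",
        params={"client": "firefox", "q": seed, "hl": lang},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()[1]  # response shape: [seed, [suggestion, ...]]


# Seed with category keywords; the suggestions expose real user interest topics.
for suggestion in google_autocomplete("project management tool"):
    print(suggestion)
```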
From Query Generation to Metadata Classification
Query generation (Step 3) determines only each query’s text and its assigned bucket. A separate metadata classification step (Step 3b) then tags supplementary information: intent, funnel, persona_hint, etc.
This two-stage separation exists for a reason. If generation and classification happen simultaneously, the LLM biases generation toward classification — thinking “this query should have recommendation intent, so I’ll phrase it as a recommendation.” Generating freely first and classifying afterward preserves query diversity.
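A minimal sketch of this two-stage separation, with `complete` standing in for whatever LLM client is used and both prompts purely illustrative (WICHI's real prompts are not shown in this post):

```python
import json


def complete(prompt: str) -> str:
    """Placeholder: wire to any LLM client (OpenAI, Anthropic, local model, ...)."""
    raise NotImplementedError


GENERATE = (
    "You write realistic, conversational AI-search queries.\n"
    "Bucket {bucket} rules: {rules}\n"
    "Write {n} queries, one per line. Do not mention intent or funnel labels."
)
CLASSIFY = (
    "Return JSON with keys intent, funnel, persona_hint, brand_relevance, "
    "priority for this query:\n{query}"
)


def generate_then_classify(bucket: str, rules: str, n: int) -> list[dict]:
    # Stage 1 (Step 3): generate freely; the prompt never names the metadata
    # taxonomy, so the labels cannot bias the phrasing.
    queries = complete(GENERATE.format(bucket=bucket, rules=rules, n=n)).splitlines()
    # Stage 2 (Step 3b): tag each query after the fact.
    return [
        {"text": q, "meta": json.loads(complete(CLASSIFY.format(query=q)))}
        for q in queries if q.strip()
    ]
```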
Classified metadata:
| Field | Value Range | Description |
|---|---|---|
| intent | definition, comparison, recommendation, condition, switching, situation, problem_solving, evaluation | User’s fundamental purpose |
| funnel | awareness, consideration, decision, post_purchase | Purchase journey stage |
| persona_hint | Free text | Estimated user segment |
| brand_relevance | direct, contextual, category, competitive, adjacent | Relationship to brand |
| priority | high, medium, low | GEO measurement importance |
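Pinned down as a typed schema, the value ranges above might look like this (one possible encoding, not WICHI's actual data model):

```python
from dataclasses import dataclass
from typing import Literal

Intent = Literal["definition", "comparison", "recommendation", "condition",
                 "switching", "situation", "problem_solving", "evaluation"]
Funnel = Literal["awareness", "consideration", "decision", "post_purchase"]
BrandRelevance = Literal["direct", "contextual", "category", "competitive", "adjacent"]
Priority = Literal["high", "medium", "low"]


@dataclass
class QueryMetadata:
    intent: Intent
    funnel: Funnel
    persona_hint: str            # free text, e.g. "freelance designer"
    brand_relevance: BrandRelevance
    priority: Priority           # GEO measurement importance
```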
Practical Example: Mapping a Fictional SaaS Product
To make the theory concrete, here is a mapping to all 9 buckets using a fictional project management SaaS called “TaskFlow.”
Basic information:
- Brand: TaskFlow
- Category: Project management tools
- Main competitors: Monday.com, Asana, ClickUp
- USP: AI-powered automatic task assignment, unlimited free plan
Per-Bucket Query Examples
| Bucket | Zone | Query Example |
|---|---|---|
| A Branded Direct | Owned | “How many users does TaskFlow’s free plan support? What’s different from the paid plan?” |
| B Branded Contextual | Owned | “My team has 3 designers and 5 developers — would TaskFlow work for sprint management?” |
| C Category Generic | Battleground | “I’m adopting a project management tool for the first time — anything free with both kanban and Gantt charts?” |
| D Persona Entry | Battleground | “I’m a freelance designer — what tool is good for managing projects separately by client?” |
| E Persona Premium | Battleground | “I’m a CTO at a 50-person startup migrating from JIRA — recommend a PM tool with API integration and SSO” |
| H Adjacent Topic | Battleground | “How do remote teams handle async communication well? Task tracking keeps falling through” |
| I Trend / Ranking | Battleground | “What are the most popular project management tools among startups in 2026?” |
| F Head-to-Head | Competitive | “Between TaskFlow and Monday.com, which is better for small teams? Compare price and features” |
| G Competitor Pain | Competitive | “Monday.com’s per-seat pricing is too expensive — any similar tools with more reasonable pricing?” |
What This Mapping Reveals
Several design principles are clearly visible:
- C, D, E, H, and I contain no mention of “TaskFlow.” The goal is to see whether AI recommends TaskFlow voluntarily.
- G also omits “TaskFlow.” The core question is whether AI mentions TaskFlow when a Monday.com user seeks alternatives.
- The difference between D and E is clear. D targets a “freelance designer” (entry level); E targets a “50-person CTO” (high-value user).
- H is not a direct category query. “Remote team async communication” does not directly ask about PM tools, but PM tools could be part of the answer — an adjacent opportunity.
Query Coverage Validation
After the query set is generated, it must be validated to ensure it adequately represents real user search behavior. No matter how well-designed the framework, low-quality queries produce unreliable analysis results.
Coverage Validation Dimensions
```mermaid
graph LR
    subgraph Coverage ["Query Coverage: 4 Validation Dimensions"]
        A[Topic Coverage<br/>Reflects real user interests]
        B[Persona Coverage<br/>Includes diverse user types]
        C[Complexity Distribution<br/>Balance of simple to complex]
        D[Expression Diversity<br/>Tone, length, patterns]
    end
    style Coverage fill:#1e293b,color:#fff
```
| Dimension | Validation Question | Failure Signal |
|---|---|---|
| Topic coverage | Are major topics from autocomplete reflected in queries? | Popular topics absent from any query |
| Persona coverage | Are derived personas represented in D and E buckets? | All queries assume the same user type |
| Complexity distribution | Is the simple/medium/complex ratio reasonable? | Simple queries exceed 50% |
| Expression diversity | Are sentence patterns, tone, and length varied? | Every query ends with “recommend me…” |
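Once queries carry their Step 3b tags, the topic and persona checks reduce to set comparisons. A sketch, assuming hypothetical `bucket`, `persona_hint`, and `topics` fields on each query record:

```python
def coverage_gaps(queries: list[dict], topics: set[str],
                  personas: set[str]) -> list[str]:
    """Flag the failure signals from the validation table above."""
    gaps = []
    # Topic coverage: every major autocomplete topic should appear somewhere.
    covered = {t for q in queries for t in q["topics"]}
    if topics - covered:
        gaps.append(f"autocomplete topics never queried: {sorted(topics - covered)}")
    # Persona coverage: derived personas should show up in buckets D and E.
    hints = {q["persona_hint"] for q in queries if q["bucket"] in {"D", "E"}}
    if personas - hints:
        gaps.append(f"derived personas missing from D/E: {sorted(personas - hints)}")
    # Diversity: all queries assuming one user type is a failure signal.
    if len({q["persona_hint"] for q in queries}) <= 1:
        gaps.append("all queries assume the same user type")
    return gaps
```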
Query Quality Anti-Patterns
Problems discovered during actual operation:
| Anti-Pattern | Example | Why It Is a Problem |
|---|---|---|
| Keyword-style query | “project management tool recommendation” | Real AI search users do not query this tersely. A legacy habit from search engines. |
| Robotic tone | “Please recommend alternatives for service dissatisfaction” | Not how real people talk. Nobody queries an AI chatbot this way. |
| Single condition | “cheap annual fee card” | Real users present multiple conditions simultaneously — annual fee + international fees + rewards rate. |
| Analyst jargon | “Recommend a GEO-optimized card” | This is an analyst’s query, not a user’s. |
| Bucket rule violation | Battleground query containing the brand name | Violates the framework’s core principle. Measurement becomes meaningless. |
The most frequently discovered quality issue is “keyword-style queries.” The more familiar someone is with SEO, the more prone they are to this pattern. AI search queries are conversations — this must be continuously reinforced.
How the Bucket Count Converged to 9
Reaching 9 buckets involved multiple iterations. This record is preserved for reference.
| Version | Bucket Count | Problem |
|---|---|---|
| v1 | 5 | Battleground was a single bucket. Could not distinguish persona queries from trend queries |
| v2 | 7 | No Competitive Zone. Competitor queries mixed into Battleground, making scores ambiguous |
| v3 | 12 | Over-segmented. Each bucket had only 2-3 queries, reducing statistical reliability. Report interpretation became too complex |
| v4 (current) | 9 | Balance between interpretability and query design cost. Minimum 4 queries per bucket ensured |
Two criteria drove convergence to 9.
First, interpretability. As bucket count increases, reports grow complex and answering the client’s question “So what should I do?” becomes harder. Nine fits a 3x3 matrix visualization.
Second, statistical minimum query count. Distributing 40 total queries across 9 buckets yields 4-6 queries per bucket. With each query replicated 3 times, that is at least 12 responses per bucket — sufficient for bucket-level pattern detection. At 12 buckets, some would drop to 2-3 queries, becoming vulnerable to noise.
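The budget arithmetic can be made explicit. Here is a sketch of one round-robin allocation that guarantees the per-bucket floor (the allocation strategy itself is an assumption; the post only states the floor):

```python
def allocate(total: int = 40, buckets: int = 9, floor: int = 4) -> list[int]:
    """Spread the query budget across buckets, guaranteeing the per-bucket minimum."""
    if total < buckets * floor:
        raise ValueError("budget too small for the per-bucket minimum")
    counts = [floor] * buckets
    for i in range(total - buckets * floor):  # round-robin the remainder
        counts[i % buckets] += 1
    return counts


counts = allocate()                       # [5, 5, 5, 5, 4, 4, 4, 4, 4]
assert min(c * 3 for c in counts) >= 12   # 3 replications -> at least 12 responses per bucket
```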
Connection to the Metric System
The 9-Bucket framework does not stand alone. It integrates with WICHI’s metric system to produce per-bucket GEO Scores.
```mermaid
graph TD
    A[40 Queries<br/>9-Bucket Classification] --> B[AI Engine Response Collection<br/>3 reps x 3 engines per query]
    B --> C[LLM Judge Evaluation<br/>6 dimensions, 1-5 scale]
    C --> D[Metric Calculation<br/>Inclusion + Prominence + Quality]
    D --> E[Per-Bucket GEO Score<br/>Where the brand is strong or weak]
    D --> F[Per-Zone GEO Score<br/>Owned vs Battleground vs Competitive]
    D --> G[Overall GEO Score<br/>Composite score]
    style E fill:#2563eb,color:#fff
```
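Rolling scores up this hierarchy is then a pair of means over the bucket-to-zone mapping. A sketch using unweighted means; the production composite may well weight buckets or evaluation dimensions differently:

```python
from statistics import mean

ZONES = {
    "owned": ["A", "B"],
    "battleground": ["C", "D", "E", "H", "I"],
    "competitive": ["F", "G"],
}


def rollup(per_query: dict[str, list[float]]) -> dict:
    """per_query maps a bucket letter to the GEO scores of that bucket's queries."""
    bucket = {b: mean(scores) for b, scores in per_query.items()}
    zone = {z: mean(bucket[b] for b in members if b in bucket)
            for z, members in ZONES.items()
            if any(b in bucket for b in members)}
    return {"bucket": bucket, "zone": zone, "overall": mean(bucket.values())}
```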
Because each bucket produces its own GEO Score, the following interpretations become possible:
| Pattern | Interpretation | Action |
|---|---|---|
| High Owned + Low Battleground | AI knows the brand but does not voluntarily recommend it | Content optimization, expand external citations |
| Low Owned | AI has inaccurate basic brand information | Correct brand information sources first |
| High Battleground C + Low D | Visible in general exploration but weak for specific personas | Reinforce persona-targeted content |
| High Competitive G | Reliably surfaces as an alternative when competitors draw complaints | Strengthen conversion-focused content |
| High I + Low C | Shows up in trend/ranking lists but absent from specific recommendations | Expand product differentiation content |
This per-bucket diagnosis is the 9-Bucket framework’s practical value. An overall GEO Score alone cannot provide insight beyond “the score is 60.” Per-bucket scores enable specific actions like “Battleground C is low, so we need to improve content to get AI to recommend us at the category level.”
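These interpretation rules are mechanizable as well. A sketch with illustrative thresholds (the real cut-offs are a product decision not documented here):

```python
def diagnose(zone: dict[str, float], bucket: dict[str, float],
             high: float = 70.0, low: float = 40.0) -> list[str]:
    """Map the score patterns from the table above to suggested actions."""
    actions = []
    if zone["owned"] >= high and zone["battleground"] <= low:
        actions.append("known but not recommended: optimize content, expand external citations")
    if zone["owned"] <= low:
        actions.append("inaccurate basics: correct brand information sources first")
    if bucket.get("C", 0.0) >= high and bucket.get("D", 100.0) <= low:
        actions.append("weak on specific personas: reinforce persona-targeted content")
    if bucket.get("I", 0.0) >= high and bucket.get("C", 100.0) <= low:
        actions.append("in rankings but not recommendations: expand differentiation content")
    return actions
```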
Design Process Summary
| Decision Point | Choice Made | Rationale |
|---|---|---|
| Classification axis | Intent-based → Brand presence | Brand presence is a more fundamental distinction in AI search |
| Zone count | 2 → 3 | Competitor queries needed separate treatment |
| Bucket count | 5 → 7 → 12 → 9 | Balance of interpretability + statistical reliability |
| Core principle | Brand Name Exclusion | GEO’s essence = AI’s voluntary recommendation |
| Query format | Keywords → Conversations | Reflects actual AI search user behavior |
| Query generation | Arbitrary → Signal-based | Reflects real user interests |
| Metadata | Simultaneous with generation → Post-hoc classification | Prevents classification from biasing generation |