Documentation of WICHI's 9-Bucket query framework. Defines 3 Zones and 9 Buckets based on brand presence to measure AI's organic recommendations effectively.
The Problem: Scores Are Meaningless Without Query Classification
The starting point of GEO (Generative Engine Optimization) analysis is the query. Brand exposure results change completely depending on what question you ask the AI search engine.
In WICHI’s early days, all queries were processed as a single undifferentiated group. “Samsung Card annual fee” and “recommend a low annual fee card” were analyzed with the same criteria. The former is a search where the user already knows the brand. The latter is a category-level exploration where the user has no specific brand in mind. Samsung Card appearing in each of these queries means entirely different things.
Without this distinction, the GEO Score was uninterpretable. When the score was high, there was no way to tell whether it was genuinely good, inflated by branded queries, or actually performing well on competitive queries. Three specific problems emerged.
First, performance distortion. Queries containing the brand name (e.g., “Samsung Card annual fee”) will naturally mention that brand — the user already specified it. When these queries inflate the overall score, the GEO Score can look strong even when AI never voluntarily recommends the brand.
Second, no actionable connection. When the score was low, there was no way to know what to fix. Is brand awareness the problem? Is AI failing to recommend the brand in category queries? Is the brand absent from competitive contexts? Without classification, these scenarios were indistinguishable.
Third, misclassification of competitor queries. Samsung Card being mentioned in a “Hyundai Card drawbacks” query is fundamentally different from it appearing in “recommend a low annual fee card.” The former captures competitor churn demand; the latter represents AI’s organic recommendation. Mixing these in the same bucket makes score interpretation ambiguous.
A GEO Score without query classification is like a report card that shows only the overall average: without per-subject grades, you cannot tell whether math or English is the subject that needs reinforcement.
A systematic query classification framework was needed to solve this.
Choosing the Classification Axis: Brand Presence, Not Intent
The initial approach was to adopt a familiar taxonomy from SEO: search intent classification.
| Intent | Description | Example |
|---|---|---|
| Informational | Information seeking | “What are credit card annual fees?” |
| Transactional | Purchase/action intent | “Apply for Samsung Card online” |
| Navigational | Navigate to specific site | “Samsung Card homepage” |
| Commercial Investigation | Pre-purchase comparison | “Compare low annual fee cards” |
This classification works well for traditional SEO. But for AI search, its limitations are clear.
The core question in AI search is: “When the user does not specify a brand, which brand does AI recommend?” Intent-based classification cannot capture this. Whatever the intent, Informational or Transactional, whether the query contains a brand name is the more fundamental axis.
For example, an intent-based taxonomy would file both of the following queries under the same pre-purchase label (Commercial Investigation):
- “How much is Samsung Card’s annual fee?”
- “Which cards have low annual fees?”
From a GEO perspective, these queries have completely different natures. AI mentioning Samsung Card in the first is expected. AI mentioning Samsung Card in the second — that is genuine GEO performance.
```mermaid
graph LR
    A[Query Classification Axis] --> B{SEO Intent-Based?}
    B -->|Informational| C[Cannot distinguish<br/>brand presence]
    B -->|Transactional| C
    B -->|Navigational| C
    A --> D{Brand Presence?}
    D -->|Brand included| E[Owned Zone<br/>Baseline measurement]
    D -->|No brand| F[Battleground Zone<br/>Core KPI]
    D -->|Competitor included| G[Competitive Zone<br/>Competitive positioning]
    style D fill:#2563eb,color:#fff
    style B fill:#6b7280,color:#fff
```
The conclusion: the primary classification axis for GEO analysis is brand presence in the query. Intent classification is retained as secondary metadata, while the primary structure is built around brand relationship.
The 3-Zone Structure: Owned, Battleground, Competitive
Queries are divided into three zones based on brand presence.
| Zone | Definition | What It Measures | Query Examples (Credit Cards) |
|---|---|---|---|
| Owned | Query contains the target brand name | Brand awareness-based exposure, AI’s accuracy on brand information | “Samsung Card annual fee,” “Samsung Card international payments” |
| Battleground | No brand name, category/needs-level search | Whether AI voluntarily recommends the brand (core KPI) | “Recommend a low annual fee card,” “First card for a new graduate” |
| Competitive | Competitor brand explicitly mentioned | Presence in competitor contexts, capturing churn demand | “Hyundai Card drawbacks,” “Shinhan Card vs Samsung Card” |
This 3-zone distinction is the framework’s backbone. Each zone measures something different, and scores carry different meanings.
What Each Zone’s Score Means
High Owned Zone score: AI accurately knows the brand’s basic information. This is a necessary condition, not a sufficient one. This score alone cannot judge GEO performance.
High Battleground Zone score: AI voluntarily recommends the brand even when the user did not specify it. This is the essence of GEO and the area with real business value.
High Competitive Zone score: The brand appears as an alternative even in contexts where competitors are mentioned. Exposure in competitor complaint queries represents direct customer conversion opportunities.
The Owned Zone is a “baseline.” Answering when someone calls your name is table stakes. A high score in the Battleground Zone means you have secured meaningful visibility in the age of AI search.
Why It Was Not 3 Zones From the Start
Initially, only two zones were used: Owned and Battleground. Competitor queries were placed in Battleground, which made scores ambiguous.
Samsung Card being mentioned in a “Hyundai Card drawbacks” query is different from Battleground’s organic recommendation. It captures competitor churn demand — not a case of AI spontaneously recommending the brand at the category level. Without separating these, the Battleground score’s meaning gets diluted.
Additionally, queries like “Samsung Card vs Hyundai Card” did not fit cleanly into either zone. They contain a brand name (so not Battleground), but a competitor is present too (so not purely Owned). A separate zone was needed.
```mermaid
graph TD
    A[Query Input] --> B{Contains target<br/>brand name?}
    B -->|Yes| C{Also contains<br/>competitor brand?}
    B -->|No| D{Contains<br/>competitor brand?}
    C -->|Yes| E[Competitive Zone<br/>Head-to-Head]
    C -->|No| F[Owned Zone]
    D -->|Yes| G[Competitive Zone<br/>Competitor Pain]
    D -->|No| H[Battleground Zone]
    style H fill:#2563eb,color:#fff
    style F fill:#059669,color:#fff
    style E fill:#dc2626,color:#fff
    style G fill:#dc2626,color:#fff
```
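In code, this decision tree reduces to two membership checks. Below is a minimal Python sketch; the function name and the naive substring matching are illustrative assumptions, not WICHI's implementation (real matching would also need to handle brand aliases and spelling variants):

```python
from enum import Enum


class Zone(Enum):
    OWNED = "owned"                      # A, B
    BATTLEGROUND = "battleground"        # C, D, E, H, I
    HEAD_TO_HEAD = "head_to_head"        # F
    COMPETITOR_PAIN = "competitor_pain"  # G


def classify_zone(query: str, brand: str, competitors: list[str]) -> Zone:
    """Mirror the decision tree above: brand presence first, then competitor presence."""
    q = query.lower()
    has_brand = brand.lower() in q
    has_competitor = any(c.lower() in q for c in competitors)
    if has_brand:
        # Brand + competitor together is a direct comparison (F).
        return Zone.HEAD_TO_HEAD if has_competitor else Zone.OWNED
    # No target brand: a competitor mention is Competitor Pain (G),
    # otherwise it is a category-level Battleground query.
    return Zone.COMPETITOR_PAIN if has_competitor else Zone.BATTLEGROUND


brand, rivals = "Samsung Card", ["Hyundai Card", "Shinhan Card"]
assert classify_zone("Samsung Card annual fee", brand, rivals) is Zone.OWNED
assert classify_zone("recommend a low annual fee card", brand, rivals) is Zone.BATTLEGROUND
assert classify_zone("Hyundai Card drawbacks", brand, rivals) is Zone.COMPETITOR_PAIN
assert classify_zone("Shinhan Card vs Samsung Card", brand, rivals) is Zone.HEAD_TO_HEAD
```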
Detailed Design of 9 Buckets
The 3 zones are further subdivided into 9 total buckets. Each bucket has a unique measurement purpose and design rationale.
The 3x3 Matrix Structure
```mermaid
graph TB
    subgraph Owned ["Owned Zone (Brand Included)"]
        A["A: Branded Direct<br/>Feature/spec questions"]
        B_bucket["B: Branded Contextual<br/>Situational context questions"]
    end
    subgraph Battleground ["Battleground Zone (No Brand)"]
        C["C: Category Generic<br/>Category exploration"]
        D["D: Persona Entry<br/>Persona entry point"]
        E["E: Persona Premium<br/>High-value persona"]
        H["H: Adjacent Topic<br/>Related topics"]
        I["I: Trend / Ranking<br/>Trends and rankings"]
    end
    subgraph Competitive ["Competitive Zone (Competitor Included)"]
        F["F: Head-to-Head<br/>Direct comparison"]
        G["G: Competitor Pain<br/>Competitor complaints"]
    end
    style Battleground fill:#1e3a5f,color:#fff
    style Owned fill:#1a4731,color:#fff
    style Competitive fill:#5c1a1a,color:#fff
```
Owned Zone: Baseline Measurement
The Owned Zone measures how AI describes a brand when the brand name is already in the query. If scores here are low, AI does not even have accurate basic information about the brand — a problem that must be resolved before examining other zones.
| Bucket | Name | Measurement Purpose | Query Pattern | Example (SaaS) |
|---|---|---|---|---|
| A | Branded Direct | AI’s factual accuracy on brand features/specs | Brand name + specific feature question | “What are Notion AI’s pricing plans?” |
| B | Branded Contextual | AI’s contextual understanding in a specific scenario | Brand name + usage scenario | “My team is 10 people — would Notion work for project management?” |
Branded Direct vs. Branded Contextual: Direct is fact-checking in nature — “How much does it cost?”, “Does it have this feature?” Contextual involves contextual reasoning — “Would this product fit my situation?” AI may know the facts accurately but fail at contextual judgment, or vice versa. Separating these clarifies the improvement direction.
Battleground Zone: The Core KPI
The Battleground Zone is the framework’s heart. It measures whether AI voluntarily recommends a brand when the user has not specified one. It is subdivided into 5 buckets because even within Battleground, query characteristics and conversion potential vary widely.
| Bucket | Name | Measurement Purpose | Query Pattern | Example (SaaS) |
|---|---|---|---|---|
| C | Category Generic | AI recommendation in general category exploration | Category keyword + compound conditions | “I need a project management tool that’s free and supports both kanban and Gantt charts” |
| D | Persona Entry | Recommendation for a specific persona’s entry query | Persona + category question | “I’m on a 5-person startup team — what collaboration tool should we start with?” |
| E | Persona Premium | Recommendation for high-involvement, high-value personas | Expert/high-value user + professional needs | “I need enterprise-grade security — any project management tools with SOC2 certification?” |
| H | Adjacent Topic | Exposure in related but not direct-category topics | Adjacent topic + indirect connection | “How should remote teams handle async communication? Task tracking keeps falling through” |
| I | Trend / Ranking | Positioning in ranking and trend queries | Time reference + ranking/trend | “What are the top project management tools for startups in 2026?” |
Design rationale for separating each bucket:
| Distinction | Why a Separate Bucket |
|---|---|
| C vs D | Category Generic is “what exists” exploration; Persona Entry is “what fits my situation.” AI recommendation logic operates differently for each. |
| D vs E | Entry personas are in early exploration; Premium personas have specific, concrete requirements. Business value (LTV) differs, requiring separate measurement. |
| H exists because | Adjacent topic mentions represent new acquisition channels. Notion appearing in “remote work tips” is a different opportunity than direct category queries. |
| I exists because | “Best X of 2026” queries cause AI to generate ordered lists. Position within the list matters, and measurement methodology differs from other buckets. |
Competitive Zone: Competitive Positioning
The Competitive Zone measures brand presence in contexts where competitors are directly mentioned.
| Bucket | Name | Measurement Purpose | Query Pattern | Example (SaaS) |
|---|---|---|---|---|
| F | Head-to-Head | Which side AI favors in direct comparison | Brand vs. competitor | “Between Notion and Monday.com, which is better for startups?” |
| G | Competitor Pain | Whether the brand appears as an alternative to competitor complaints | Competitor + dissatisfaction/churn scenario | “Monday.com is too expensive — any similar tools that are more affordable?” |
Head-to-Head vs. Competitor Pain: Head-to-Head is a neutral comparison — the user knows both brands and wants an objective assessment. Competitor Pain has directionality — the user is dissatisfied with a competitor and actively seeking alternatives. Appearing in the latter represents a direct switching opportunity.
Core Design Principles
Principle 1: Brand Name Exclusion
This is the most important design principle in the entire framework.
Battleground Zone queries must never contain the target brand name.
The rationale is simple. The essence of GEO is measuring “whether AI voluntarily recommends the brand when the user has not specified it.”
If AI mentions Samsung Card when asked “Recommend a card with low annual fees” — that is genuine GEO performance. Including “Samsung Card” in the query and then checking whether AI mentions it is not measurement; it is self-confirmation.
This principle applies with modifications to the Competitive Zone.
| Zone | Brand Name Rule |
|---|---|
| Owned (A, B) | Target brand name must be included. Competitor names not allowed. |
| Battleground (C, D, E, H, I) | Target brand name never included. Competitor names not included either. |
| Competitive F (Head-to-Head) | Target brand name + competitor name both included. |
| Competitive G (Competitor Pain) | Competitor name only. Target brand name never included. |
Excluding the target brand from Competitor Pain (G) follows the same logic as Battleground. Asking “Monday.com is too expensive — how about Notion?” forces AI to mention Notion. Asking “Monday.com is too expensive — any similar but more affordable tools?” and having AI voluntarily recommend Notion — that is meaningful measurement.
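The rule table translates directly into a validation gate for generated queries. Here is a sketch under the same naive substring-matching assumption as the earlier zone classifier; the bucket letters follow the framework, everything else is illustrative:

```python
OWNED = {"A", "B"}
BATTLEGROUND = {"C", "D", "E", "H", "I"}


def brand_rule_violations(bucket: str, query: str, brand: str,
                          competitors: list[str]) -> list[str]:
    """Return every way a query breaks its bucket's brand-name rule."""
    q = query.lower()
    has_brand = brand.lower() in q
    has_comp = any(c.lower() in q for c in competitors)
    violations = []
    if bucket in OWNED:
        if not has_brand:
            violations.append("Owned query must include the target brand")
        if has_comp:
            violations.append("Owned query must not name competitors")
    elif bucket in BATTLEGROUND:
        if has_brand:
            violations.append("Battleground query must never include the target brand")
        if has_comp:
            violations.append("Battleground query must not name competitors")
    elif bucket == "F":
        if not (has_brand and has_comp):
            violations.append("Head-to-Head needs both the target brand and a competitor")
    elif bucket == "G":
        if has_brand:
            violations.append("Competitor Pain must never include the target brand")
        if not has_comp:
            violations.append("Competitor Pain needs a competitor name")
    return violations


# A Battleground query containing the brand name is caught immediately:
assert brand_rule_violations("C", "Is Notion good for kanban?", "Notion", ["Asana"])
```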
Principle 2: Queries Are Conversations, Not Keywords
AI search queries are fundamentally different from traditional search engine queries.
| Aspect | Traditional Search Engine | AI Search |
|---|---|---|
| Format | Keyword strings | Natural language conversations |
| Example | “credit card annual fee comparison” | “I’m a new graduate and I want a card with low annual fees and low international transaction fees” |
| Condition count | Typically 1-2 | 2-3+ compound conditions |
| Tone | None | Casual/formal variations |
Using keyword-style queries like “credit card recommendation” fails to represent actual AI search user behavior. Real users describe their situations, present multiple conditions simultaneously, and ask in conversational language. The query generation stage must reflect this difference.
Principle 3: Complexity Distribution Control
The distribution of query complexity (the number of conditions per query) is deliberately controlled.
| Complexity | Description | Target Share | Allowed Buckets |
|---|---|---|---|
| Simple (1 condition) | Single-condition questions | Minority | Primarily Trend/Ranking (I) |
| Medium (2 conditions) | Dual-condition questions | Moderate | All |
| Complex (3+ conditions) | Multi-condition questions | Largest share | All |
Testing with only simple queries measures only AI’s default recommendation list. Whether AI recommends a brand under compound conditions is a more rigorous test and more closely mirrors actual user behavior.
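Assuming each query's condition count is tagged upstream (for example, by the metadata classification step described below), checking the distribution takes a few lines of Python. A sketch:

```python
from collections import Counter


def complexity_shares(condition_counts: list[int]) -> dict[str, float]:
    """Share of simple / medium / complex queries in a generated set."""
    def band(n: int) -> str:
        return "simple" if n <= 1 else "medium" if n == 2 else "complex"
    counts = Counter(band(n) for n in condition_counts)
    return {b: counts.get(b, 0) / len(condition_counts)
            for b in ("simple", "medium", "complex")}


shares = complexity_shares([1, 2, 3, 4, 2, 3, 1, 3])
# Failure signal from the coverage-validation table later in this post:
# simple queries above 50%, or complex queries not the largest share.
assert shares["simple"] <= 0.5
assert shares["complex"] >= max(shares["simple"], shares["medium"])
```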
Query Expansion Pipeline
The 9-Bucket framework is a classification system only. The actual queries used for analysis are generated through a separate expansion pipeline.
Pipeline Structure
```mermaid
graph TD
    A[User Input<br/>Brand name + Product name] --> B[Step 0: Auto-generate Brand Info<br/>Category, competitors, USP, etc.]
    B --> C[Step 2: Signal Extraction<br/>Autocomplete + Topics + Personas]
    C --> D[Step 3: Query Expansion<br/>Generate with 9-Bucket distribution]
    D --> E[Step 3b: Metadata Classification<br/>Intent, funnel, persona, etc.]
    E --> F[Final Query Set<br/>~40 queries + metadata]
    style D fill:#2563eb,color:#fff
```
The critical point is that queries are generated based on real user signals — not fabricated arbitrarily. Actual search autocomplete data reveals user interest topics, from which personas are derived, and queries are then generated according to each bucket’s rules.
The Role of Signal Extraction
Collecting real user signals before query generation ensures that framework queries are “questions real users would actually ask,” not “questions an analyst imagined.”
Collected signals:
| Signal Type | Source | Purpose |
|---|---|---|
| Autocomplete queries | Google, Naver autocomplete APIs | Identify topics users actually search for |
| Interest topics | LLM-extracted from autocomplete | Guide per-bucket query subject direction |
| Target personas | LLM-derived from topics | Concretize Persona Entry (D) and Persona Premium (E) queries |
Without these signals, only generic queries like “low annual fee card” get generated, missing specific real-world needs like “a card with good wedding-venue payment benefits for someone planning a wedding.”
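As an illustration of the autocomplete signal only: Google exposes a widely used, unofficial suggest endpoint that returns suggestions as JSON. This sketch assumes that endpoint's current behavior and is not WICHI's actual collector:

```python
import requests


def google_autocomplete(seed: str, lang: str = "en") -> list[str]:
    """Fetch autocomplete suggestions for a seed keyword.

    Uses Google's unofficial suggest endpoint (no SLA; may change or be
    rate-limited). A production collector would add retries and caching.
    """
    resp = requests.get(
        "https://suggestqueries.google.com/complete/search",
        params={"client": "firefox", "q": seed, "hl": lang},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()[1]  # response shape: [seed, [suggestion, ...]]


# Seed with category keywords; the suggestions expose real user interest topics.
for suggestion in google_autocomplete("project management tool"):
    print(suggestion)
```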
From Query Generation to Metadata Classification
Query generation (Step 3) determines only each query’s text and its assigned bucket. A separate metadata classification step (Step 3b) then tags supplementary information: intent, funnel, persona_hint, etc.
This two-stage separation exists for a reason. If generation and classification happen simultaneously, the LLM biases generation toward classification — thinking “this query should have recommendation intent, so I’ll phrase it as a recommendation.” Generating freely first and classifying afterward preserves query diversity.
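A minimal sketch of this two-stage separation, with `complete` standing in for whatever LLM client is used and both prompts purely illustrative (WICHI's real prompts are not shown in this post):

```python
import json


def complete(prompt: str) -> str:
    """Placeholder: wire to any LLM client (OpenAI, Anthropic, local model, ...)."""
    raise NotImplementedError


GENERATE = (
    "You write realistic, conversational AI-search queries.\n"
    "Bucket {bucket} rules: {rules}\n"
    "Write {n} queries, one per line. Do not mention intent or funnel labels."
)
CLASSIFY = (
    "Return JSON with keys intent, funnel, persona_hint, brand_relevance, "
    "priority for this query:\n{query}"
)


def generate_then_classify(bucket: str, rules: str, n: int) -> list[dict]:
    # Stage 1 (Step 3): generate freely; the prompt never names the metadata
    # taxonomy, so the labels cannot bias the phrasing.
    queries = complete(GENERATE.format(bucket=bucket, rules=rules, n=n)).splitlines()
    # Stage 2 (Step 3b): tag each query after the fact.
    return [
        {"text": q, "meta": json.loads(complete(CLASSIFY.format(query=q)))}
        for q in queries if q.strip()
    ]
```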
Classified metadata:
| Field | Value Range | Description |
|---|---|---|
| intent | definition, comparison, recommendation, condition, switching, situation, problem_solving, evaluation | User’s fundamental purpose |
| funnel | awareness, consideration, decision, post_purchase | Purchase journey stage |
| persona_hint | Free text | Estimated user segment |
| brand_relevance | direct, contextual, category, competitive, adjacent | Relationship to brand |
| priority | high, medium, low | GEO measurement importance |
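Pinned down as a typed schema, the value ranges above might look like this (one possible encoding, not WICHI's actual data model):

```python
from dataclasses import dataclass
from typing import Literal

Intent = Literal["definition", "comparison", "recommendation", "condition",
                 "switching", "situation", "problem_solving", "evaluation"]
Funnel = Literal["awareness", "consideration", "decision", "post_purchase"]
BrandRelevance = Literal["direct", "contextual", "category", "competitive", "adjacent"]
Priority = Literal["high", "medium", "low"]


@dataclass
class QueryMetadata:
    intent: Intent
    funnel: Funnel
    persona_hint: str            # free text, e.g. "freelance designer"
    brand_relevance: BrandRelevance
    priority: Priority           # GEO measurement importance
```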
Practical Example: Mapping a Fictional SaaS Product
To make the theory concrete, here is a mapping to all 9 buckets using a fictional project management SaaS called “TaskFlow.”
Basic information:
- Brand: TaskFlow
- Category: Project management tools
- Main competitors: Monday.com, Asana, ClickUp
- USP: AI-powered automatic task assignment, unlimited free plan
Per-Bucket Query Examples
| Bucket | Zone | Query Example |
|---|---|---|
| A Branded Direct | Owned | “How many users does TaskFlow’s free plan support? What’s different from the paid plan?” |
| B Branded Contextual | Owned | “My team has 3 designers and 5 developers — would TaskFlow work for sprint management?” |
| C Category Generic | Battleground | “I’m adopting a project management tool for the first time — anything free with both kanban and Gantt charts?” |
| D Persona Entry | Battleground | “I’m a freelance designer — what tool is good for managing projects separately by client?” |
| E Persona Premium | Battleground | “I’m a CTO at a 50-person startup migrating from JIRA — recommend a PM tool with API integration and SSO” |
| H Adjacent Topic | Battleground | “How do remote teams handle async communication well? Task tracking keeps falling through” |
| I Trend / Ranking | Battleground | “What are the most popular project management tools among startups in 2026?” |
| F Head-to-Head | Competitive | “Between TaskFlow and Monday.com, which is better for small teams? Compare price and features” |
| G Competitor Pain | Competitive | “Monday.com’s per-seat pricing is too expensive — any similar tools with more reasonable pricing?” |
What This Mapping Reveals
Several design principles are clearly visible:
- C, D, E, H, and I contain no mention of “TaskFlow.” The goal is to see whether AI recommends TaskFlow voluntarily.
- G also omits “TaskFlow.” The core question is whether AI mentions TaskFlow when a Monday.com user seeks alternatives.
- The difference between D and E is clear. D targets a “freelance designer” (entry level); E targets a “50-person CTO” (high-value user).
- H is not a direct category query. “Remote team async communication” does not directly ask about PM tools, but PM tools could be part of the answer — an adjacent opportunity.
Query Coverage Validation
After the query set is generated, it must be validated to ensure it adequately represents real user search behavior. No matter how well-designed the framework, low-quality queries produce unreliable analysis results.
Coverage Validation Dimensions
```mermaid
graph LR
    subgraph Coverage ["Query Coverage: 4 Validation Dimensions"]
        A[Topic Coverage<br/>Reflects real user interests]
        B[Persona Coverage<br/>Includes diverse user types]
        C[Complexity Distribution<br/>Balance of simple to complex]
        D[Expression Diversity<br/>Tone, length, patterns]
    end
    style Coverage fill:#1e293b,color:#fff
```
| Dimension | Validation Question | Failure Signal |
|---|---|---|
| Topic coverage | Are major topics from autocomplete reflected in queries? | Popular topics absent from any query |
| Persona coverage | Are derived personas represented in D and E buckets? | All queries assume the same user type |
| Complexity distribution | Is the simple/medium/complex ratio reasonable? | Simple queries exceed 50% |
| Expression diversity | Are sentence patterns, tone, and length varied? | Every query ends with “recommend me…” |
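Once queries carry their Step 3b tags, the topic and persona checks reduce to set comparisons. A sketch, assuming hypothetical `bucket`, `persona_hint`, and `topics` fields on each query record:

```python
def coverage_gaps(queries: list[dict], topics: set[str],
                  personas: set[str]) -> list[str]:
    """Flag the failure signals from the validation table above."""
    gaps = []
    # Topic coverage: every major autocomplete topic should appear somewhere.
    covered = {t for q in queries for t in q["topics"]}
    if topics - covered:
        gaps.append(f"autocomplete topics never queried: {sorted(topics - covered)}")
    # Persona coverage: derived personas should show up in buckets D and E.
    hints = {q["persona_hint"] for q in queries if q["bucket"] in {"D", "E"}}
    if personas - hints:
        gaps.append(f"derived personas missing from D/E: {sorted(personas - hints)}")
    # Diversity: all queries assuming one user type is a failure signal.
    if len({q["persona_hint"] for q in queries}) <= 1:
        gaps.append("all queries assume the same user type")
    return gaps
```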
Query Quality Anti-Patterns
Problems discovered during actual operation:
| Anti-Pattern | Example | Why It Is a Problem |
|---|---|---|
| Keyword-style query | “project management tool recommendation” | Real AI search users do not query this tersely. A legacy habit from search engines. |
| Robotic tone | “Please recommend alternatives for service dissatisfaction” | Not how real people talk. Nobody queries an AI chatbot this way. |
| Single condition | “cheap annual fee card” | Real users present multiple conditions simultaneously — annual fee + international fees + rewards rate. |
| Analyst jargon | “Recommend a GEO-optimized card” | This is an analyst’s query, not a user’s. |
| Bucket rule violation | Battleground query containing the brand name | Violates the framework’s core principle. Measurement becomes meaningless. |
The most frequently discovered quality issue is “keyword-style queries.” The more familiar someone is with SEO, the more prone they are to this pattern. AI search queries are conversations — this must be continuously reinforced.
How the Bucket Count Converged to 9
Reaching 9 buckets involved multiple iterations. This record is preserved for reference.
| Version | Bucket Count | Problem |
|---|---|---|
| v1 | 5 | Battleground was a single bucket. Could not distinguish persona queries from trend queries |
| v2 | 7 | No Competitive Zone. Competitor queries mixed into Battleground, making scores ambiguous |
| v3 | 12 | Over-segmented. Each bucket had only 2-3 queries, reducing statistical reliability. Report interpretation became too complex |
| v4 (current) | 9 | Balance between interpretability and query design cost. Minimum 4 queries per bucket ensured |
Two criteria drove convergence to 9.
First, interpretability. As bucket count increases, reports grow complex and answering the client’s question “So what should I do?” becomes harder. Nine fits a 3x3 matrix visualization.
Second, statistical minimum query count. Distributing 40 total queries across 9 buckets yields 4-6 queries per bucket. With each query replicated 3 times, that is at least 12 responses per bucket — sufficient for bucket-level pattern detection. At 12 buckets, some would drop to 2-3 queries, becoming vulnerable to noise.
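The budget arithmetic can be made explicit. Here is a sketch of one round-robin allocation that guarantees the per-bucket floor (the allocation strategy itself is an assumption; the post only states the floor):

```python
def allocate(total: int = 40, buckets: int = 9, floor: int = 4) -> list[int]:
    """Spread the query budget across buckets, guaranteeing the per-bucket minimum."""
    if total < buckets * floor:
        raise ValueError("budget too small for the per-bucket minimum")
    counts = [floor] * buckets
    for i in range(total - buckets * floor):  # round-robin the remainder
        counts[i % buckets] += 1
    return counts


counts = allocate()                       # [5, 5, 5, 5, 4, 4, 4, 4, 4]
assert min(c * 3 for c in counts) >= 12   # 3 replications -> at least 12 responses per bucket
```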
Connection to the Metric System
The 9-Bucket framework does not stand alone. It integrates with WICHI’s metric system to produce per-bucket GEO Scores.
```mermaid
graph TD
    A[40 Queries<br/>9-Bucket Classification] --> B[AI Engine Response Collection<br/>3 reps x 3 engines per query]
    B --> C[LLM Judge Evaluation<br/>6 dimensions, 1-5 scale]
    C --> D[Metric Calculation<br/>Inclusion + Prominence + Quality]
    D --> E[Per-Bucket GEO Score<br/>Where the brand is strong or weak]
    D --> F[Per-Zone GEO Score<br/>Owned vs Battleground vs Competitive]
    D --> G[Overall GEO Score<br/>Composite score]
    style E fill:#2563eb,color:#fff
```
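Rolling scores up this hierarchy is then a pair of means over the bucket-to-zone mapping. A sketch using unweighted means; the production composite may well weight buckets or evaluation dimensions differently:

```python
from statistics import mean

ZONES = {
    "owned": ["A", "B"],
    "battleground": ["C", "D", "E", "H", "I"],
    "competitive": ["F", "G"],
}


def rollup(per_query: dict[str, list[float]]) -> dict:
    """per_query maps a bucket letter to the GEO scores of that bucket's queries."""
    bucket = {b: mean(scores) for b, scores in per_query.items()}
    zone = {z: mean(bucket[b] for b in members if b in bucket)
            for z, members in ZONES.items()
            if any(b in bucket for b in members)}
    return {"bucket": bucket, "zone": zone, "overall": mean(bucket.values())}
```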
Because each bucket produces its own GEO Score, the following interpretations become possible:
| Pattern | Interpretation | Action |
|---|---|---|
| High Owned + Low Battleground | AI knows the brand but does not voluntarily recommend it | Content optimization, expand external citations |
| Low Owned | AI has inaccurate basic brand information | Correct brand information sources first |
| High Battleground C + Low D | Visible in general exploration but weak for specific personas | Reinforce persona-targeted content |
| High Competitive G | Reliably surfaces as an alternative when competitors draw complaints | Strengthen conversion-focused content |
| High I + Low C | Shows up in trend/ranking lists but absent from specific recommendations | Expand product differentiation content |
This per-bucket diagnosis is the 9-Bucket framework’s practical value. An overall GEO Score alone cannot provide insight beyond “the score is 60.” Per-bucket scores enable specific actions like “Battleground C is low, so we need to improve content to get AI to recommend us at the category level.”
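These interpretation rules are mechanizable as well. A sketch with illustrative thresholds (the real cut-offs are a product decision not documented here):

```python
def diagnose(zone: dict[str, float], bucket: dict[str, float],
             high: float = 70.0, low: float = 40.0) -> list[str]:
    """Map the score patterns from the table above to suggested actions."""
    actions = []
    if zone["owned"] >= high and zone["battleground"] <= low:
        actions.append("known but not recommended: optimize content, expand external citations")
    if zone["owned"] <= low:
        actions.append("inaccurate basics: correct brand information sources first")
    if bucket.get("C", 0.0) >= high and bucket.get("D", 100.0) <= low:
        actions.append("weak on specific personas: reinforce persona-targeted content")
    if bucket.get("I", 0.0) >= high and bucket.get("C", 100.0) <= low:
        actions.append("in rankings but not recommendations: expand differentiation content")
    return actions
```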
Design Process Summary
| Decision Point | Choice Made | Rationale |
|---|---|---|
| Classification axis | Intent-based → Brand presence | Brand presence is a more fundamental distinction in AI search |
| Zone count | 2 → 3 | Competitor queries needed separate treatment |
| Bucket count | 5 → 7 → 12 → 9 | Balance of interpretability + statistical reliability |
| Core principle | Brand Name Exclusion | GEO’s essence = AI’s voluntary recommendation |
| Query format | Keywords → Conversations | Reflects actual AI search user behavior |
| Query generation | Arbitrary → Signal-based | Reflects real user interests |
| Metadata | Simultaneous with generation → Post-hoc classification | Prevents classification from biasing generation |