Designing the 9-Bucket Query Framework

MJ · 13 min read

Documentation of WICHI's 9-Bucket query framework. Defines 3 Zones and 9 Buckets based on brand presence to measure AI's organic recommendations effectively.

The Problem: Scores Are Meaningless Without Query Classification

The starting point of GEO (Generative Engine Optimization) analysis is the query. Brand exposure results change completely depending on what question you ask the AI search engine.

In WICHI’s early days, all queries were processed as a single undifferentiated group. “Samsung Card annual fee” and “recommend a low annual fee card” were analyzed with the same criteria. The former is a search where the user already knows the brand. The latter is a category-level exploration where the user has no specific brand in mind. Samsung Card appearing in each of these queries means entirely different things.

Without this distinction, the GEO Score was uninterpretable. When the score was high, there was no way to tell whether it was genuinely good, inflated by branded queries, or actually performing well on competitive queries. Three specific problems emerged.

First, performance distortion. Queries containing the brand name (e.g., “Samsung Card annual fee”) will naturally mention that brand — the user already specified it. When these queries inflate the overall score, the GEO Score can look strong even when AI never voluntarily recommends the brand.

Second, no actionable connection. When the score was low, there was no way to know what to fix. Is brand awareness the problem? Is AI failing to recommend the brand in category queries? Is the brand absent from competitive contexts? Without classification, these scenarios were indistinguishable.

Third, misclassification of competitor queries. Samsung Card being mentioned in a “Hyundai Card drawbacks” query is fundamentally different from it appearing in “recommend a low annual fee card.” The former captures competitor churn demand; the latter represents AI’s organic recommendation. Mixing these in the same bucket makes score interpretation ambiguous.

A GEO Score without query classification is like a school-wide average grade. Without distinguishing the student who excels in math from the one who excels in English, you cannot determine which subject needs reinforcement.

A systematic query classification framework was needed to solve this.


Choosing the Classification Axis: Brand Presence, Not Intent

The initial approach was to adopt a familiar taxonomy from SEO: search intent classification.

| Intent | Description | Example |
| --- | --- | --- |
| Informational | Information seeking | "What are credit card annual fees?" |
| Transactional | Purchase/action intent | "Apply for Samsung Card online" |
| Navigational | Navigate to a specific site | "Samsung Card homepage" |
| Commercial Investigation | Pre-purchase comparison | "Compare low annual fee cards" |

This classification works well for traditional SEO. But for AI search, its limitations are clear.

The core question in AI search is: "When the user does not specify a brand, which brand does AI recommend?" Intent-based classification cannot capture this. Regardless of whether the intent is Informational or Transactional, whether the query contains a brand name is a far more fundamental axis.

For example, the following two queries are both classified as Commercial Investigation:

  • “How much is Samsung Card’s annual fee?”
  • “Which cards have low annual fees?”

From a GEO perspective, these queries have completely different natures. AI mentioning Samsung Card in the first is expected. AI mentioning Samsung Card in the second — that is genuine GEO performance.

```mermaid
graph LR
    A[Query Classification Axis] --> B{SEO Intent-Based?}
    B -->|Informational| C[Cannot distinguish<br/>brand presence]
    B -->|Transactional| C
    B -->|Navigational| C
    A --> D{Brand Presence?}
    D -->|Brand included| E[Owned Zone<br/>Baseline measurement]
    D -->|No brand| F[Battleground Zone<br/>Core KPI]
    D -->|Competitor included| G[Competitive Zone<br/>Competitive positioning]
    style D fill:#2563eb,color:#fff
    style B fill:#6b7280,color:#fff
```

The conclusion: the primary classification axis for GEO analysis is brand presence in the query. Intent classification is retained as secondary metadata, while the primary structure is built around brand relationship.


The 3-Zone Structure: Owned, Battleground, Competitive

Queries are divided into three zones based on brand presence.

| Zone | Definition | What It Measures | Query Examples (Credit Cards) |
| --- | --- | --- | --- |
| Owned | Query contains the target brand name | Brand awareness-based exposure, AI's accuracy on brand information | "Samsung Card annual fee," "Samsung Card international payments" |
| Battleground | No brand name; category/needs-level search | Whether AI voluntarily recommends the brand (core KPI) | "Recommend a low annual fee card," "First card for a new graduate" |
| Competitive | Competitor brand explicitly mentioned | Presence in competitor contexts, capturing churn demand | "Hyundai Card drawbacks," "Shinhan Card vs Samsung Card" |

This 3-zone distinction is the framework’s backbone. Each zone measures something different, and scores carry different meanings.

What Each Zone’s Score Means

High Owned Zone score: AI accurately knows the brand’s basic information. This is a necessary condition, not a sufficient one. This score alone cannot judge GEO performance.

High Battleground Zone score: AI voluntarily recommends the brand even when the user did not specify it. This is the essence of GEO and the area with real business value.

High Competitive Zone score: The brand appears as an alternative even in contexts where competitors are mentioned. Exposure in competitor complaint queries represents direct customer conversion opportunities.

The Owned Zone is a “baseline.” Answering when someone calls your name is table stakes. A high score in the Battleground Zone means you have secured meaningful visibility in the age of AI search.

Why It Was Not 3 Zones From the Start

Initially, only two zones were used: Owned and Battleground. Competitor queries were placed in Battleground, which made scores ambiguous.

Samsung Card being mentioned in a “Hyundai Card drawbacks” query is different from Battleground’s organic recommendation. It captures competitor churn demand — not a case of AI spontaneously recommending the brand at the category level. Without separating these, the Battleground score’s meaning gets diluted.

Additionally, queries like “Samsung Card vs Hyundai Card” did not fit cleanly into either zone. They contain a brand name (so not Battleground), but a competitor is present too (so not purely Owned). A separate zone was needed.

```mermaid
graph TD
    A[Query Input] --> B{Contains target<br/>brand name?}
    B -->|Yes| C{Also contains<br/>competitor brand?}
    B -->|No| D{Contains<br/>competitor brand?}
    C -->|Yes| E[Competitive Zone<br/>Head-to-Head]
    C -->|No| F[Owned Zone]
    D -->|Yes| G[Competitive Zone<br/>Competitor Pain]
    D -->|No| H[Battleground Zone]
    style H fill:#2563eb,color:#fff
    style F fill:#059669,color:#fff
    style E fill:#dc2626,color:#fff
    style G fill:#dc2626,color:#fff
```
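The decision tree above fits in a few lines of code. This is a minimal sketch: `classify_zone`, `TARGET`, and `COMPETITORS` are illustrative names rather than WICHI's actual API, and plain substring matching is an assumption (a production classifier would need brand-alias handling and tokenization).

```python
# Minimal sketch of the zone-assignment decision tree.
# TARGET/COMPETITORS and substring matching are illustrative assumptions.

TARGET = "samsung card"
COMPETITORS = ("hyundai card", "shinhan card")

def classify_zone(query: str) -> str:
    """Route a query to Owned / Battleground / Competitive by brand presence."""
    q = query.lower()
    has_target = TARGET in q
    has_competitor = any(c in q for c in COMPETITORS)
    if has_target and has_competitor:
        return "Competitive (Head-to-Head)"     # both brands present
    if has_target:
        return "Owned"                          # target brand only
    if has_competitor:
        return "Competitive (Competitor Pain)"  # competitor only
    return "Battleground"                       # no brand at all
```

For example, "Shinhan Card vs Samsung Card" routes to Head-to-Head, while the brand-free "recommend a low annual fee card" lands in Battleground.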

Detailed Design of 9 Buckets

The 3 zones are further subdivided into 9 total buckets. Each bucket has a unique measurement purpose and design rationale.

The 3x3 Matrix Structure

```mermaid
graph TB
    subgraph Owned ["Owned Zone (Brand Included)"]
        A["A: Branded Direct<br/>Feature/spec questions"]
        B_bucket["B: Branded Contextual<br/>Situational context questions"]
    end
    subgraph Battleground ["Battleground Zone (No Brand)"]
        C["C: Category Generic<br/>Category exploration"]
        D["D: Persona Entry<br/>Persona entry point"]
        E["E: Persona Premium<br/>High-value persona"]
        H["H: Adjacent Topic<br/>Related topics"]
        I["I: Trend / Ranking<br/>Trends and rankings"]
    end
    subgraph Competitive ["Competitive Zone (Competitor Included)"]
        F["F: Head-to-Head<br/>Direct comparison"]
        G["G: Competitor Pain<br/>Competitor complaints"]
    end
    style Battleground fill:#1e3a5f,color:#fff
    style Owned fill:#1a4731,color:#fff
    style Competitive fill:#5c1a1a,color:#fff
```

Owned Zone: Baseline Measurement

The Owned Zone measures how AI describes a brand when the brand name is already in the query. If scores here are low, AI does not even have accurate basic information about the brand — a problem that must be resolved before examining other zones.

| Bucket | Name | Measurement Purpose | Query Pattern | Example (SaaS) |
| --- | --- | --- | --- | --- |
| A | Branded Direct | AI's factual accuracy on brand features/specs | Brand name + specific feature question | "What are Notion AI's pricing plans?" |
| B | Branded Contextual | AI's contextual understanding in a specific scenario | Brand name + usage scenario | "My team is 10 people — would Notion work for project management?" |

Branded Direct vs. Branded Contextual: Direct is fact-checking in nature — “How much does it cost?”, “Does it have this feature?” Contextual involves contextual reasoning — “Would this product fit my situation?” AI may know the facts accurately but fail at contextual judgment, or vice versa. Separating these clarifies the improvement direction.

Battleground Zone: The Core KPI

The Battleground Zone is the framework’s heart. It measures whether AI voluntarily recommends a brand when the user has not specified one. It is subdivided into 5 buckets because even within Battleground, query characteristics and conversion potential vary widely.

| Bucket | Name | Measurement Purpose | Query Pattern | Example (SaaS) |
| --- | --- | --- | --- | --- |
| C | Category Generic | AI recommendation in general category exploration | Category keyword + compound conditions | "I need a project management tool that's free and supports both kanban and Gantt charts" |
| D | Persona Entry | Recommendation for a specific persona's entry query | Persona + category question | "I'm on a 5-person startup team — what collaboration tool should we start with?" |
| E | Persona Premium | Recommendation for high-involvement, high-value personas | Expert/high-value user + professional needs | "I need enterprise-grade security — any project management tools with SOC2 certification?" |
| H | Adjacent Topic | Exposure in related but not direct-category topics | Adjacent topic + indirect connection | "How should remote teams handle async communication? Task tracking keeps falling through" |
| I | Trend / Ranking | Positioning in ranking and trend queries | Time reference + ranking/trend | "What are the top project management tools for startups in 2026?" |

Design rationale for separating each bucket:

| Distinction | Why a Separate Bucket |
| --- | --- |
| C vs D | Category Generic is "what exists" exploration; Persona Entry is "what fits my situation." AI recommendation logic operates differently for each. |
| D vs E | Entry personas are in early exploration; Premium personas have specific, concrete requirements. Business value (LTV) differs, requiring separate measurement. |
| Why H exists | Adjacent topic mentions represent new acquisition channels. Notion appearing in "remote work tips" is a different opportunity than direct category queries. |
| Why I exists | "Best X of 2026" queries cause AI to generate ordered lists. Position within the list matters, and measurement methodology differs from other buckets. |

Competitive Zone: Competitive Positioning

The Competitive Zone measures brand presence in contexts where competitors are directly mentioned.

| Bucket | Name | Measurement Purpose | Query Pattern | Example (SaaS) |
| --- | --- | --- | --- | --- |
| F | Head-to-Head | Which side AI favors in direct comparison | Brand vs. competitor | "Between Notion and Monday.com, which is better for startups?" |
| G | Competitor Pain | Whether the brand appears as an alternative to competitor complaints | Competitor + dissatisfaction/churn scenario | "Monday.com is too expensive — any similar tools that are more affordable?" |

Head-to-Head vs. Competitor Pain: Head-to-Head is a neutral comparison — the user knows both brands and wants an objective assessment. Competitor Pain has directionality — the user is dissatisfied with a competitor and actively seeking alternatives. Appearing in the latter represents a direct switching opportunity.


Core Design Principles

Principle 1: Brand Name Exclusion

This is the most important design principle in the entire framework.

Battleground Zone queries must never contain the target brand name.

The rationale is simple. The essence of GEO is measuring “whether AI voluntarily recommends the brand when the user has not specified it.”

If AI mentions Samsung Card when asked “Recommend a card with low annual fees” — that is genuine GEO performance. Including “Samsung Card” in the query and then checking whether AI mentions it is not measurement; it is self-confirmation.

This principle applies with modifications to the Competitive Zone.

| Zone | Brand Name Rule |
| --- | --- |
| Owned (A, B) | Target brand name must be included. Competitor names not allowed. |
| Battleground (C, D, E, H, I) | Target brand name never included. Competitor names not included either. |
| Competitive F (Head-to-Head) | Target brand name + competitor name both included. |
| Competitive G (Competitor Pain) | Competitor name only. Target brand name never included. |

Excluding the target brand from Competitor Pain (G) follows the same logic as Battleground. Asking “Monday.com is too expensive — how about Notion?” forces AI to mention Notion. Asking “Monday.com is too expensive — any similar but more affordable tools?” and having AI voluntarily recommend Notion — that is meaningful measurement.
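The per-bucket brand-name rules can be enforced mechanically at query-generation time. A sketch under the same substring-matching assumption as before; `violates_brand_rule` is a hypothetical helper, not part of WICHI's codebase.

```python
# Illustrative enforcement of the per-bucket brand-name rules.
# Bucket letters follow the framework; the helper itself is an assumption.

OWNED = {"A", "B"}
BATTLEGROUND = {"C", "D", "E", "H", "I"}

def violates_brand_rule(query: str, bucket: str,
                        target: str, competitors: list[str]) -> bool:
    """Return True if the query breaks its bucket's brand-name rule."""
    q = query.lower()
    has_target = target.lower() in q
    has_comp = any(c.lower() in q for c in competitors)
    if bucket in OWNED:
        return (not has_target) or has_comp   # target required, no competitors
    if bucket in BATTLEGROUND:
        return has_target or has_comp         # no brand names at all
    if bucket == "F":
        return not (has_target and has_comp)  # both sides required
    if bucket == "G":
        return has_target or not has_comp     # competitor only
    raise ValueError(f"unknown bucket: {bucket}")
```

Note how the article's own example fails the check: a G-bucket query that names Notion ("Monday.com is too expensive — how about Notion?") is flagged, while the brand-free version passes.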

Principle 2: Queries Are Conversations, Not Keywords

AI search queries are fundamentally different from traditional search engine queries.

| Aspect | Traditional Search Engine | AI Search |
| --- | --- | --- |
| Format | Keyword strings | Natural language conversations |
| Example | "credit card annual fee comparison" | "I'm a new graduate and I want a card with low annual fees and low international transaction fees" |
| Condition count | Typically 1-2 | 2-3+ compound conditions |
| Tone | None | Casual/formal variations |

Using keyword-style queries like “credit card recommendation” fails to represent actual AI search user behavior. Real users describe their situations, present multiple conditions simultaneously, and ask in conversational language. The query generation stage must reflect this difference.

Principle 3: Complexity Distribution Control

Query complexity (number of conditions) is intentionally managed in its distribution.

| Complexity | Description | Target Share | Allowed Buckets |
| --- | --- | --- | --- |
| Simple (1 condition) | Single-condition questions | Minority | Primarily Trend/Ranking (I) |
| Medium (2 conditions) | Dual-condition questions | Moderate | All |
| Complex (3+ conditions) | Multi-condition questions | Largest share | All |

Testing with only simple queries measures only AI’s default recommendation list. Whether AI recommends a brand under compound conditions is a more rigorous test and more closely mirrors actual user behavior.


Query Expansion Pipeline

The 9-Bucket framework is a classification system only. The actual queries used for analysis are generated through a separate expansion pipeline.

Pipeline Structure

```mermaid
graph TD
    A[User Input<br/>Brand name + Product name] --> B[Step 0: Auto-generate Brand Info<br/>Category, competitors, USP, etc.]
    B --> C[Step 2: Signal Extraction<br/>Autocomplete + Topics + Personas]
    C --> D[Step 3: Query Expansion<br/>Generate with 9-Bucket distribution]
    D --> E[Step 3b: Metadata Classification<br/>Intent, funnel, persona, etc.]
    E --> F[Final Query Set<br/>~40 queries + metadata]
    style D fill:#2563eb,color:#fff
```

The critical point is that queries are generated based on real user signals — not fabricated arbitrarily. Actual search autocomplete data reveals user interest topics, from which personas are derived, and queries are then generated according to each bucket’s rules.
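The pipeline's shape can be sketched as a chain of pure functions. Everything here is a stub for illustration: the function names and return shapes are assumptions, and the real steps call LLMs and autocomplete APIs rather than returning hard-coded values.

```python
# Stub sketch of the expansion pipeline's four stages (names are assumptions;
# real implementations call LLMs and autocomplete APIs).

def generate_brand_info(brand: str) -> dict:
    # Step 0: category, competitors, USP, etc.
    return {"brand": brand, "category": "project management tool",
            "competitors": ["Monday.com", "Asana", "ClickUp"]}

def extract_signals(info: dict) -> dict:
    # Step 2: autocomplete topics + derived personas
    return {"topics": ["free plan", "kanban"], "personas": ["freelance designer"]}

def expand_queries(info: dict, signals: dict) -> list[dict]:
    # Step 3: generate query text + bucket per the 9-bucket distribution
    return [{"text": f"any free {info['category']} with kanban and Gantt charts?",
             "bucket": "C"}]

def classify_metadata(queries: list[dict]) -> list[dict]:
    # Step 3b: post-hoc tagging, kept separate to avoid biasing generation
    return [dict(q, intent="recommendation", funnel="consideration") for q in queries]

info = generate_brand_info("TaskFlow")
query_set = classify_metadata(expand_queries(info, extract_signals(info)))
```

The key structural point survives even in stub form: Step 3 emits only text and bucket, and Step 3b attaches metadata afterward.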

The Role of Signal Extraction

Collecting real user signals before query generation ensures that framework queries are “questions real users would actually ask,” not “questions an analyst imagined.”

Collected signals:

| Signal Type | Source | Purpose |
| --- | --- | --- |
| Autocomplete queries | Google, Naver autocomplete APIs | Identify topics users actually search for |
| Interest topics | LLM-extracted from autocomplete | Guide per-bucket query subject direction |
| Target personas | LLM-derived from topics | Concretize Persona Entry (D) and Persona Premium (E) queries |

Without these signals, only generic queries like “low annual fee card” get generated, missing specific real-world needs like “card with good wedding venue payment benefits while preparing for a wedding.”

From Query Generation to Metadata Classification

Query generation (Step 3) determines only each query’s text and its assigned bucket. A separate metadata classification step (Step 3b) then tags supplementary information: intent, funnel, persona_hint, etc.

This two-stage separation exists for a reason. If generation and classification happen simultaneously, the LLM biases generation toward classification — thinking “this query should have recommendation intent, so I’ll phrase it as a recommendation.” Generating freely first and classifying afterward preserves query diversity.

Classified metadata:

| Field | Value Range | Description |
| --- | --- | --- |
| intent | definition, comparison, recommendation, condition, switching, situation, problem_solving, evaluation | User's fundamental purpose |
| funnel | awareness, consideration, decision, post_purchase | Purchase journey stage |
| persona_hint | Free text | Estimated user segment |
| brand_relevance | direct, contextual, category, competitive, adjacent | Relationship to brand |
| priority | high, medium, low | GEO measurement importance |
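One way to model this record in code is a small dataclass that validates the closed-vocabulary fields at construction time. The field names and value ranges come from the table above; the dataclass itself is an illustration, not WICHI's schema.

```python
from dataclasses import dataclass

# Closed vocabularies from the metadata table; the class is illustrative.
INTENTS = {"definition", "comparison", "recommendation", "condition",
           "switching", "situation", "problem_solving", "evaluation"}
FUNNELS = {"awareness", "consideration", "decision", "post_purchase"}

@dataclass
class QueryMetadata:
    intent: str
    funnel: str
    persona_hint: str       # free text
    brand_relevance: str    # direct | contextual | category | competitive | adjacent
    priority: str           # high | medium | low

    def __post_init__(self):
        # Reject values outside the closed vocabularies.
        if self.intent not in INTENTS:
            raise ValueError(f"unknown intent: {self.intent}")
        if self.funnel not in FUNNELS:
            raise ValueError(f"unknown funnel: {self.funnel}")
```

For example, a Competitor Pain query might carry `QueryMetadata("switching", "decision", "Monday.com churner", "competitive", "high")`.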

Practical Example: Mapping a Fictional SaaS Product

To make the theory concrete, here is a mapping to all 9 buckets using a fictional project management SaaS called “TaskFlow.”

Basic information:

  • Brand: TaskFlow
  • Category: Project management tools
  • Main competitors: Monday.com, Asana, ClickUp
  • USP: AI-powered automatic task assignment, unlimited free plan

Per-Bucket Query Examples

| Bucket | Zone | Query Example |
| --- | --- | --- |
| A Branded Direct | Owned | "How many users does TaskFlow's free plan support? What's different from the paid plan?" |
| B Branded Contextual | Owned | "My team has 3 designers and 5 developers — would TaskFlow work for sprint management?" |
| C Category Generic | Battleground | "I'm adopting a project management tool for the first time — anything free with both kanban and Gantt charts?" |
| D Persona Entry | Battleground | "I'm a freelance designer — what tool is good for managing projects separately by client?" |
| E Persona Premium | Battleground | "I'm a CTO at a 50-person startup migrating from JIRA — recommend a PM tool with API integration and SSO" |
| H Adjacent Topic | Battleground | "How do remote teams handle async communication well? Task tracking keeps falling through" |
| I Trend / Ranking | Battleground | "What are the most popular project management tools among startups in 2026?" |
| F Head-to-Head | Competitive | "Between TaskFlow and Monday.com, which is better for small teams? Compare price and features" |
| G Competitor Pain | Competitive | "Monday.com's per-seat pricing is too expensive — any similar tools with more reasonable pricing?" |

What This Mapping Reveals

Several design principles are clearly visible:

  1. C, D, E, H, and I contain no mention of “TaskFlow.” The goal is to see whether AI recommends TaskFlow voluntarily.
  2. G also omits “TaskFlow.” The core question is whether AI mentions TaskFlow when a Monday.com user seeks alternatives.
  3. The difference between D and E is clear. D targets a “freelance designer” (entry level); E targets a “50-person CTO” (high-value user).
  4. H is not a direct category query. “Remote team async communication” does not directly ask about PM tools, but PM tools could be part of the answer — an adjacent opportunity.

Query Coverage Validation

After the query set is generated, it must be validated to ensure it adequately represents real user search behavior. No matter how well-designed the framework, low-quality queries produce unreliable analysis results.

Coverage Validation Dimensions

```mermaid
graph LR
    subgraph Coverage ["Query Coverage: 4 Validation Dimensions"]
        A[Topic Coverage<br/>Reflects real user interests]
        B[Persona Coverage<br/>Includes diverse user types]
        C[Complexity Distribution<br/>Balance of simple to complex]
        D[Expression Diversity<br/>Tone, length, patterns]
    end
    style Coverage fill:#1e293b,color:#fff
```

| Dimension | Validation Question | Failure Signal |
| --- | --- | --- |
| Topic coverage | Are major topics from autocomplete reflected in queries? | Popular topics absent from any query |
| Persona coverage | Are derived personas represented in D and E buckets? | All queries assume the same user type |
| Complexity distribution | Is the simple/medium/complex ratio reasonable? | Simple queries exceed 50% |
| Expression diversity | Are sentence patterns, tone, and length varied? | Every query ends with "recommend me…" |
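Two of these checks are straightforward to automate. A sketch under stated assumptions: the article fixes only "simple queries exceed 50%" as a failure signal, so the diversity threshold and the trailing-word heuristic below are illustrative choices, not WICHI's actual validators.

```python
from collections import Counter

def complexity_ok(queries: list[dict]) -> bool:
    """Fail if simple (1-condition) queries exceed 50% of the set."""
    counts = Counter(q["complexity"] for q in queries)
    return counts.get("simple", 0) <= len(queries) / 2

def expression_diverse(queries: list[dict], max_share: float = 0.5) -> bool:
    """Crude diversity check: fail if too many queries end in the same word
    (e.g. every query ending with 'recommend me...')."""
    endings = Counter(q["text"].strip().lower().split()[-1] for q in queries)
    return max(endings.values()) <= max_share * len(queries)
```

Persona and topic coverage checks would follow the same pattern, comparing the generated set against the personas and autocomplete topics collected during signal extraction.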

Query Quality Anti-Patterns

Problems discovered during actual operation:

| Anti-Pattern | Example | Why It Is a Problem |
| --- | --- | --- |
| Keyword-style query | "project management tool recommendation" | Real AI search users do not query this tersely. A legacy habit from search engines. |
| Robotic tone | "Please recommend alternatives for service dissatisfaction" | Not how real people talk. Nobody queries an AI chatbot this way. |
| Single condition | "cheap annual fee card" | Real users present multiple conditions simultaneously — annual fee + international fees + rewards rate. |
| Analyst jargon | "Recommend a GEO-optimized card" | This is an analyst's query, not a user's. |
| Bucket rule violation | Battleground query containing the brand name | Violates the framework's core principle. Measurement becomes meaningless. |

The most frequently discovered quality issue is “keyword-style queries.” The more familiar someone is with SEO, the more prone they are to this pattern. AI search queries are conversations — this must be continuously reinforced.


How the Bucket Count Converged to 9

Reaching 9 buckets involved multiple iterations. This record is preserved for reference.

| Version | Bucket Count | Problem |
| --- | --- | --- |
| v1 | 5 | Battleground was a single bucket. Could not distinguish persona queries from trend queries. |
| v2 | 7 | No Competitive Zone. Competitor queries mixed into Battleground, making scores ambiguous. |
| v3 | 12 | Over-segmented. Each bucket had only 2-3 queries, reducing statistical reliability. Report interpretation became too complex. |
| v4 (current) | 9 | Balance between interpretability and query design cost. Minimum 4 queries per bucket ensured. |

Two criteria drove convergence to 9.

First, interpretability. As bucket count increases, reports grow complex and answering the client’s question “So what should I do?” becomes harder. Nine fits a 3x3 matrix visualization.

Second, statistical minimum query count. Distributing 40 total queries across 9 buckets yields 4-6 queries per bucket. With each query replicated 3 times, that is at least 12 responses per bucket — sufficient for bucket-level pattern detection. At 12 buckets, some would drop to 2-3 queries, becoming vulnerable to noise.
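The arithmetic behind this convergence, made explicit (constants from the text; the helper name is illustrative):

```python
# Floor(40 / n_buckets) queries per bucket, each replicated 3 times.
TOTAL_QUERIES = 40
REPLICATIONS = 3

def min_responses_per_bucket(n_buckets: int) -> int:
    return (TOTAL_QUERIES // n_buckets) * REPLICATIONS
```

At 9 buckets this gives at least 4 queries and 12 responses per bucket; at 12 buckets the floor drops to 3 queries and 9 responses, leaving bucket-level scores noisier.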


Connection to the Metric System

The 9-Bucket framework does not stand alone. It integrates with WICHI’s metric system to produce per-bucket GEO Scores.

```mermaid
graph TD
    A[40 Queries<br/>9-Bucket Classification] --> B[AI Engine Response Collection<br/>3 reps x 3 engines per query]
    B --> C[LLM Judge Evaluation<br/>6 dimensions, 1-5 scale]
    C --> D[Metric Calculation<br/>Inclusion + Prominence + Quality]
    D --> E[Per-Bucket GEO Score<br/>Where is strong, where is weak]
    D --> F[Per-Zone GEO Score<br/>Owned vs Battleground vs Competitive]
    D --> G[Overall GEO Score<br/>Composite score]
    style E fill:#2563eb,color:#fff
```

Because each bucket produces its own GEO Score, the following interpretations become possible:

| Pattern | Interpretation | Action |
| --- | --- | --- |
| High Owned + Low Battleground | AI knows the brand but does not voluntarily recommend it | Content optimization, expand external citations |
| Low Owned | AI has inaccurate basic brand information | Correct brand information sources first |
| High Battleground C + Low D | Visible in general exploration but weak for specific personas | Reinforce persona-targeted content |
| High Competitive G | Appears well as an alternative when competitors face complaints | Strengthen conversion-focused content |
| High I + Low C | Shows up in trend/ranking lists but absent from specific recommendations | Expand product differentiation content |

This per-bucket diagnosis is the 9-Bucket framework’s practical value. An overall GEO Score alone cannot provide insight beyond “the score is 60.” Per-bucket scores enable specific actions like “Battleground C is low, so we need to improve content to get AI to recommend us at the category level.”
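Two of these diagnosis rules can be sketched as code. The thresholds (scores on a 0-100 scale, "low" below 40, "high" at 70 or above) and the `diagnose` helper are assumptions for illustration; the article does not specify cutoff values.

```python
# Illustrative diagnosis rules from the pattern table (thresholds assumed).

def diagnose(zone_scores: dict[str, float],
             low: float = 40, high: float = 70) -> list[str]:
    """Map zone-level GEO score patterns to recommended actions."""
    actions = []
    if zone_scores["owned"] < low:
        # Low Owned: AI lacks accurate basic brand information.
        actions.append("Correct brand information sources first")
    elif zone_scores["owned"] >= high and zone_scores["battleground"] < low:
        # High Owned + Low Battleground: known but never recommended.
        actions.append("Content optimization, expand external citations")
    return actions
```

Bucket-level rules (such as High C + Low D, or High I + Low C) would extend the same dictionary-in, actions-out shape.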


Design Process Summary

| Decision Point | Choice Made | Rationale |
| --- | --- | --- |
| Classification axis | Intent-based → Brand presence | Brand presence is a more fundamental distinction in AI search |
| Zone count | 2 → 3 | Competitor queries needed separate treatment |
| Bucket count | 5 → 7 → 12 → 9 | Balance of interpretability + statistical reliability |
| Core principle | Brand Name Exclusion | GEO's essence = AI's voluntary recommendation |
| Query format | Keywords → Conversations | Reflects actual AI search user behavior |
| Query generation | Arbitrary → Signal-based | Reflects real user interests |
| Metadata | Simultaneous with generation → Post-hoc classification | Prevents classification from biasing generation |