GEO Paper Review: Optimization Approaches and Vertical Applications

MJ · 14 min read

Review of the AutoGEO (quality-preserving auto-optimization) and E-GEO (e-commerce vertical benchmark) papers, analyzing how GEO optimization seeks a Pareto optimum between visibility and utility.

Review Context

GEO research has been branching rapidly since Aggarwal et al. defined the concept at KDD 2024. While early work focused on proving “does GEO work?”, 2025 preprints shifted the questions to “how do we optimize?” and “which domains do we apply it to?” This transition signals that the GEO field is moving beyond proof of concept into the engineering phase.

Looking at SEO’s evolution as a reference, this branching was predictable. SEO also started from a single question — “how does a search engine rank crawled content?” — before splitting along domain and technique axes into technical SEO, on-page SEO, local SEO, e-commerce SEO, and more. GEO is following the same trajectory.

The two papers covered here represent two different axes of that branching.

| Paper | Authors | Core Question | Approach |
| --- | --- | --- | --- |
| AutoGEO | Wu et al. (2025) | Can we automate optimization while preserving quality? | General-purpose framework |
| E-GEO | Bagga et al. (2025) | How do we measure GEO in e-commerce? | Vertical-specific benchmark |

Wu et al.’s AutoGEO tackles the challenge of general-purpose optimization automation with quality preservation, while Bagga et al.’s E-GEO addresses the challenge of building a benchmark for the e-commerce vertical. This review analyzes each paper in order, followed by cross-comparison and unresolved issues.

The Flow of GEO Optimization Research

Understanding where these two papers sit requires grasping the broader flow of GEO research. The table below organizes major GEO studies from 2024-2025 chronologically.

| Period | Paper/Study | Core Contribution | Research Stage |
| --- | --- | --- | --- |
| 2024 Q2 | Aggarwal et al. (KDD) | GEO concept definition, GEO-Bench | Definition |
| 2025 Q1 | Chen et al. | Empirical behavior analysis, PAWC metric | Measurement |
| 2025 Q1 | Wu et al. (AutoGEO) | Quality-preserving auto-optimization | Optimization |
| 2025 Q1 | Bagga et al. (E-GEO) | E-commerce-specific benchmark | Vertical application |
| 2025-2026 | Kim et al. (SAGEO Arena) | Full pipeline evaluation | Evaluation framework |
| 2025-2026 | Jin et al. (CORE) | Ranking manipulation risk demonstration | Security |

AutoGEO and E-GEO address research questions that naturally follow the definition and measurement stages: “how do we execute?” and “where do we apply?” Once the definition stage established “what GEO is” and the measurement stage established “how to quantify GEO,” the remaining questions are about execution and application.


AutoGEO: Automated Optimization Without Sacrificing Quality

Wu et al.’s AutoGEO (2025 preprint) directly addresses a core concern in GEO research: Does GEO optimization degrade content quality? This question is intimately tied to patterns that have repeated throughout SEO history. Just as keyword stuffing and link farms — strategies that compromise content integrity to boost search rankings — polluted the SEO ecosystem, the same concern exists for GEO.

Problem Definition: The GEO Dilemma

The specific problem AutoGEO aims to solve is as follows.

Most existing GEO optimization attempts are either manual or tend to compromise the content’s original purpose — delivering useful information to users — during optimization: inserting unnecessary statistics to raise citation probability, for example, or adding authoritative citations so heavily that the content’s natural flow is disrupted.

Wu et al. decompose this problem along three axes:

| Axis | Problem | Limitation of Existing Approaches |
| --- | --- | --- |
| Automation | Manual optimization cannot scale | Rule-based automation lacks precision |
| Quality preservation | Optimization degrades usefulness | Assumes a visibility-quality tradeoff |
| Generalization | Only effective for specific engines/queries | Performance drops on domain transfer |

AutoGEO sets the goal of solving all three simultaneously. Its core claim is that visibility and quality are not zero-sum but can exist in a cooperative relationship.

AutoGEO Architecture

AutoGEO’s pipeline consists of three phases, each designed as an independent module.

```mermaid
flowchart TD
    A[Input content] --> B[Phase 1: Automatic preference rule extraction]
    B --> C[Rule set R]
    C --> D[Phase 2: Rule-based rewriting]
    D --> E[Rewritten content]
    E --> F[Phase 3: Quality preservation verification]
    F -->|Pass| G[Optimized content]
    F -->|Fail| H[Rule adjustment]
    H --> D

    style B fill:#e8f4fd,stroke:#333
    style D fill:#e8f4fd,stroke:#333
    style F fill:#fde8e8,stroke:#333
```

Phase 1: Preference Rule Extraction

In this phase, AutoGEO automatically analyzes which content characteristics generative engines prefer. Specifically, the process involves:

  1. Collecting generative engine responses for various queries
  2. Comparing cited sources against uncited sources
  3. Extracting content characteristics commonly found in cited sources — structural elements, information density, writing style, citation patterns — as rules

This process is similar to reverse engineering a search engine’s ranking factors in traditional SEO, but differs methodologically because the target is not a traditional search algorithm but LLM behavioral patterns. Since LLMs operate on probabilistic patterns based on training data and prompts rather than explicit ranking algorithms, rule extraction uses the LLM itself as an analytical tool rather than statistical analysis.
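Neither the extraction pipeline nor its rules are published. As a rough illustration, the cited-vs-uncited comparison in steps 1–3 can be sketched as a feature-lift calculation — a statistical stand-in for the LLM-as-analyst approach the paper actually describes, with feature labels assumed to come from an upstream labeling step:

```python
from collections import Counter

def extract_preference_rules(cited_docs, uncited_docs, min_lift=1.5):
    """Sketch of Phase 1: find features over-represented in cited sources.

    cited_docs / uncited_docs are lists of feature sets, e.g. produced by
    an LLM prompted to tag structure, information density, style, citations.
    """
    cited = Counter(f for doc in cited_docs for f in doc)
    n_cited, n_uncited = len(cited_docs), len(uncited_docs)
    uncited = Counter(f for doc in uncited_docs for f in doc)
    rules = []
    for feature, count in cited.items():
        p_cited = count / n_cited
        # Add-one smoothing so features absent from uncited docs don't divide by zero.
        p_uncited = (uncited.get(feature, 0) + 1) / (n_uncited + 1)
        lift = p_cited / p_uncited
        if lift >= min_lift:  # keep only features clearly favored by citation
            rules.append((feature, round(lift, 2)))
    return sorted(rules, key=lambda r: -r[1])
```

A feature that appears in every cited source but few uncited ones surfaces with high lift and becomes a candidate rule; the real system would feed such candidates back to an LLM for interpretation.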

Phase 2: Rule-Based Rewriting

The extracted rule set is applied to existing content for automatic rewriting. The key to this phase is the precision of rule application. Rather than applying all rules uniformly, appropriate rule subsets are selectively applied based on the content’s domain and type.

Phase 3: Quality Preservation Verification

This phase evaluates whether the rewritten content maintains the original’s utility. Wu et al. measure utility across four dimensions:

| Utility Dimension | Definition | Measurement Method |
| --- | --- | --- |
| Completeness | Are the core information elements from the original preserved? | Retention rate of key information elements |
| Accuracy | Have factual errors been introduced during rewriting? | Fact-check-based verification |
| Readability | Is the rewritten content’s readability maintained or improved? | Readability metrics |
| Naturalness | Is the writing style natural, without mechanical optimization artifacts? | Human evaluation + automated evaluation |

Content that fails the verification phase is fed back to Phase 2 with rule adjustments. This iterative loop implements AutoGEO’s “cooperative” character. Rather than simply maximizing visibility, the system searches for the Pareto optimal point between visibility and quality.
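The four dimensions could be operationalized as a simple scorecard. The sketch below uses deliberately naive proxies — token retention for completeness, sentence length for readability, stubbed constants where the paper relies on fact-checkers and human raters — so every threshold and formula here is an illustrative assumption, not the paper’s metric:

```python
def utility_scores(original: str, rewritten: str) -> dict:
    """Naive proxies for the four utility dimensions (illustrative only)."""
    orig_tokens, new_tokens = set(original.split()), set(rewritten.split())
    # Completeness: how many of the original's tokens survive the rewrite.
    completeness = len(orig_tokens & new_tokens) / max(len(orig_tokens), 1)
    # Readability: penalize very long sentences (the 25-word cap is assumed).
    sentences = max(rewritten.count("."), 1)
    avg_len = len(rewritten.split()) / sentences
    readability = 1.0 if avg_len <= 25 else 25 / avg_len
    # Accuracy and naturalness require fact-checking and human/LLM raters in
    # the paper; stubbed as constants to keep this sketch self-contained.
    return {"completeness": completeness, "accuracy": 1.0,
            "readability": readability, "naturalness": 1.0}

def passes(scores: dict, threshold: float = 0.8) -> bool:
    """Phase 3 gate: every dimension must clear the bar, or the content
    is sent back to Phase 2 with adjusted rules."""
    return all(v >= threshold for v in scores.values())
```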

Multi-Agent Cooperative Framework

The technical core of AutoGEO lies in its multi-agent cooperative framework. Agents responsible for each phase operate independently while collaborating under a shared objective function — visibility improvement + quality maintenance.

```mermaid
flowchart LR
    subgraph Analyzer["Analyzer Agent"]
        A1[Query analysis]
        A2[Engine response collection]
        A3[Rule extraction]
    end

    subgraph Optimizer["Optimizer Agent"]
        O1[Rule selection]
        O2[Content rewriting]
        O3[Change tracking]
    end

    subgraph Validator["Validator Agent"]
        V1[Utility evaluation]
        V2[Quality score calculation]
        V3[Pass/reject decision]
    end

    Analyzer -->|Rule set| Optimizer
    Optimizer -->|Rewriting results| Validator
    Validator -->|Feedback| Optimizer
    Validator -->|Rule adjustment request| Analyzer
```

A notable aspect of this structure is that the Validator agent does more than simply pass or reject — it analyzes the failure cause and provides feedback to both the Optimizer and Analyzer agents. This enables iterative improvement of the entire system.

This design aligns with the multi-agent patterns actively being researched in LLM-based systems, and shares context with findings that multiple agents with separated roles, collaborating on a complex task, outperform a single LLM handling everything.
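A stripped-down sketch of the Analyzer → Optimizer → Validator control flow, with agent internals replaced by plug-in callables (all names are illustrative; the paper does not publish an API):

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class GeoPipeline:
    """Minimal sketch of the three-agent cooperative loop."""
    analyze: Callable[[str], List[str]]          # content -> extracted rule set
    optimize: Callable[[str, List[str]], str]    # content + rules -> rewrite
    validate: Callable[[str, str], Tuple[bool, List[str]]]  # -> (ok, bad rules)

    def run(self, content: str, max_rounds: int = 3) -> str:
        rules = self.analyze(content)
        for _ in range(max_rounds):
            rewritten = self.optimize(content, rules)
            ok, bad_rules = self.validate(content, rewritten)
            if ok:
                return rewritten
            # Validator feedback flows back upstream: here it simply prunes
            # offending rules, standing in for the richer failure-cause
            # analysis the paper attributes to the Validator agent.
            rules = [r for r in rules if r not in bad_rules]
        return content  # no candidate passed; keep the original
```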

Experimental Design

Wu et al.’s experimental design is summarized below.

| Experimental Element | Details |
| --- | --- |
| Target engines | Multiple generative search engines (see paper for specific engine names) |
| Query set | Information-seeking queries across diverse domains |
| Comparison groups | Original content (baseline), naive optimization, AutoGEO |
| Evaluation metrics | GEO visibility metrics + content utility metrics (dual evaluation) |
| Key results | Average GEO metric +35.99%, utility metrics no degradation |

Particularly notable is the inclusion of “naive optimization” as a separate comparison group. This ensures the conclusion is not the trivial “optimization improves performance,” but rather “AutoGEO’s cooperative approach preserves quality better than naive optimization.”

Key Result Analysis

AutoGEO achieved an average 35.99% improvement on GEO visibility metrics, and this improvement was achieved without statistically significant degradation in content utility.

This result can be interpreted on two levels.

First, the quantitative significance. 35.99% is meaningful as an absolute number, but what matters more is that it was achieved without quality degradation. Compared to naive optimization, AutoGEO shows similar visibility improvement magnitudes but significantly less utility degradation. This suggests the cooperative framework’s feedback loop is actually working.

Second, the structural significance. The result challenges the implicit assumption that visibility and quality are locked in a tradeoff. This bears on the legitimacy of the entire GEO field: if GEO optimization inevitably degrades content quality, GEO becomes a technology that harms user experience. AutoGEO’s results provide a counterargument to this concern.

However, several caveats apply when interpreting these results:

  • 35.99% is an “average.” Variance by domain and query type likely exists.
  • “No utility degradation” is measured against criteria that Wu et al. themselves established. External validation is needed.
  • The results are tied to the generative engine version at the time of experimentation and require separate verification for continued validity after engine updates.

Theoretical Implications of Cooperative GEO

The concept of “cooperative GEO” proposed by Wu et al. extends beyond a mere technical approach to carry implications as an ethical framework for the GEO field.

Throughout SEO’s history, Google repeatedly emphasized the principle of “create content for users,” yet strategies that harmed user experience in pursuit of higher rankings remained pervasive. The same pattern could easily repeat in GEO: “GEO spam” — inserting unnecessary elements to increase the probability of AI engine citation, or distorting a piece’s arguments — could emerge.

AutoGEO’s cooperative framework presents a technical solution to this problem. By embedding quality verification into the optimization process, quality-degrading optimization is blocked at the system level. If this approach spreads industry-wide, it could preemptively prevent the formation of a perception that “optimized content equals low-quality content.”


E-GEO: An E-Commerce-Specific Benchmark

Bagga et al.’s E-GEO (2025 preprint) takes a different direction from AutoGEO. In a landscape where most GEO research focuses on informational queries, E-GEO builds the first systematic benchmark for queries with commercial intent.

Why E-Commerce GEO Requires Separate Research

E-commerce queries are fundamentally different in nature from information-seeking queries. This difference is not merely “the query content differs” — it means the optimization target itself is different.

| Dimension | Information-Seeking Queries | E-Commerce Queries |
| --- | --- | --- |
| User intent | Understanding, learning | Purchase decision |
| Expected response format | Explanations, analysis | Comparisons, recommendations, specs |
| Distance to conversion | Indirect (awareness → interest) | Direct (comparison → purchase) |
| Trust basis | Expertise, sources | Price, reviews, usage experience |
| Time sensitivity | Relatively low | High (price fluctuations, stock) |
| Optimization goal | Increase citation probability | Citation + purchase conversion |

“Best wireless earbuds recommendation” and “fundamental principles of quantum mechanics” are both search queries, but the structure of the AI engine’s response, citation patterns, and the type of information users expect are completely different. A general-purpose GEO benchmark cannot capture this difference.

E-GEO Benchmark Composition

The scale and composition of the benchmark E-GEO built is as follows:

| Item | Details |
| --- | --- |
| Query scale | 7,000+ realistic product queries |
| Query source | Based on actual e-commerce search logs |
| Rewriting strategies | 15 heuristic-based rewriting approaches |
| Optimization method | Iterative Prompt Optimization (IPO) |
| Application targets | E-commerce product descriptions, reviews, comparison content |
| Evaluation engines | Multiple generative search engines |

The 7,000+ query scale is considerably larger than existing GEO benchmarks. Considering that Aggarwal et al.’s GEO-Bench used query sets in the hundreds, E-GEO represents a substantial effort to capture the diversity of the e-commerce domain at scale.

Product Query Taxonomy

One of E-GEO’s key contributions is proposing a taxonomy that systematically classifies e-commerce queries. This classification empirically demonstrates that different optimization strategies are needed for each query type.

```mermaid
flowchart TD
    Q[E-commerce Query] --> C1[Product Discovery]
    Q --> C2[Product Comparison]
    Q --> C3[Purchase Decision]
    Q --> C4[Usage & Troubleshooting]

    C1 --> C1a["'Best wireless earbuds'"]
    C1 --> C1b["'Running shoes under $100'"]

    C2 --> C2a["'AirPods vs Galaxy Buds'"]
    C2 --> C2b["'Dyson V15 vs V12 differences'"]

    C3 --> C3a["'AirPods Pro 2 price'"]
    C3 --> C3b["'Galaxy S25 pre-order'"]

    C4 --> C4a["'AirPods one side not working'"]
    C4 --> C4b["'Dyson filter replacement cycle'"]
```

Each query type elicits different AI engine response patterns, and therefore requires different optimization strategies. E-GEO’s experimental results by query type are summarized below.

| Query Type | AI Engine Response Characteristics | Effective Optimization Strategy | Ineffective Strategy |
| --- | --- | --- | --- |
| Product Discovery | List-format, category-based | Structured spec tables, category tags | Simple keyword insertion |
| Product Comparison | Comparison tables, pros/cons analysis | Clear comparison framework, numerical data | One-sided recommendations |
| Purchase Decision | Price/stock info, purchase links | Price history, discount info, buying guide | Generic product descriptions |
| Usage/Troubleshooting | Step-by-step guides, FAQ | Structured resolution steps, visual guides | Long narrative explanations |

15 Rewriting Heuristics

The 15 rewriting heuristics designed by E-GEO are optimization strategies specialized for e-commerce content. These strategies differ in character from the general-purpose GEO strategies applied to academic or news content. Below is a category-based classification:

| Category | Strategy Examples | Applicable Types |
| --- | --- | --- |
| Structural optimization | Spec table insertion, comparison matrix addition, FAQ structuring | All types |
| Data enrichment | Price info addition, user review summary insertion, benchmark figures | Discovery, comparison |
| Trust signals | Expert citations, verified data source attribution, test results | Purchase decision |
| Format optimization | Pros/cons lists, rating summaries, explicit recommendation reasoning | Comparison, discovery |
| Intent matching | Buying guide tone, problem-solving step sequencing, usage scenarios | Purchase/usage |

The effectiveness of these 15 strategies was not uniform. The gap between the most and least effective strategies in E-GEO’s experiments was large, meaning an approach of “just apply any strategy to e-commerce content” is not viable.

Iterative Prompt Optimization (IPO)

Another methodological contribution from E-GEO is Iterative Prompt Optimization (IPO). Rather than performing optimization with a single prompt, this approach gradually improves prompts through multiple iterations.

E-GEO’s iterative prompt optimization showed significant improvement in visibility metrics with multiple iterations compared to a single attempt. The iteration effect was most pronounced for comparison-type queries.

This methodology contrasts with AutoGEO’s rule-based approach. Where AutoGEO extracts explicit rules and applies them, E-GEO’s IPO explores optimal results through iterative prompt adjustments without explicitly defining rules.
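E-GEO describes IPO only at a high level; a greedy hill-climbing loop is one plausible reading of it. In the sketch below, `visibility` and `mutate_prompt` are hypothetical stand-ins for the scoring and prompt-revision LLM calls, not functions the paper defines:

```python
def iterative_prompt_optimization(seed_prompt, content, visibility,
                                  mutate_prompt, n_iters=5):
    """Greedy IPO sketch: keep whichever prompt scores best on visibility.

    visibility(prompt, content) -> float score of the content rewritten
    under that prompt; mutate_prompt(prompt, i) -> new candidate prompt.
    Both stand in for LLM calls in the actual system.
    """
    best_prompt = seed_prompt
    best_score = visibility(seed_prompt, content)
    for i in range(n_iters):
        candidate = mutate_prompt(best_prompt, i)
        score = visibility(candidate, content)
        if score > best_score:  # greedy: only accept strict improvements
            best_prompt, best_score = candidate, score
    return best_prompt, best_score
```

The contrast with AutoGEO shows up directly in the signature: no explicit rule set appears anywhere — the search happens entirely in prompt space.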

E-Commerce-Specific Findings

Here are the findings unique to the e-commerce domain from E-GEO.

General-purpose GEO strategies achieve only about 40-60% effectiveness on e-commerce queries. Visibility improvement becomes significantly higher when e-commerce-specific strategies are applied.

This finding directly contradicts the assumption that “GEO strategies are universally applicable.” When the domain changes, the optimization strategy must change too.

Specifically, the elements effective at driving AI engine citations in e-commerce content are:

  1. Price comparison data: Citation probability increases when content includes tables quantitatively comparing prices across competing products.
  2. Spec tables: Presenting key specifications in structured tables increases the probability of AI engines citing that source when generating comparison responses.
  3. Real-use review summaries: Content summarizing real usage experiences by category is preferred over simple star ratings.
  4. Purchase decision trees: Conditional recommendation structures like “if your budget is under $X, choose A; if above, choose B” are favorable for citation.

Conversely, strategies effective for academic content — such as authoritative citations and statistical data density — showed relatively lower effectiveness in e-commerce contexts.


Cross-Analysis: General-Purpose vs Vertical-Specific

Placing the two papers side by side reveals the two clear directions in which GEO research is branching.

Comparison Framework

| Dimension | AutoGEO | E-GEO |
| --- | --- | --- |
| Core question | Can we optimize while maintaining quality? | Can we build a benchmark for a specific domain? |
| Approach direction | General-purpose | Vertical-specific |
| Methodology | Rule extraction + auto-rewriting + quality verification | Heuristic design + iterative prompt optimization |
| Agent architecture | Multi-agent cooperative | Single optimization loop |
| Query type focus | Information-seeking queries | Commercial-intent queries |
| Scale | Diverse domain query sets | 7,000+ e-commerce queries |
| Evaluation criteria | GEO metrics + utility dual measurement | E-commerce visibility metrics |
| Automation level | High (from rule extraction to rewriting) | Medium (heuristics manual, application automatic) |
| Practical implications | Justification for content teams adopting GEO | Strategy basis for e-commerce operators in AI search |

Convergence Points

Though superficially different, the two approaches converge at several points.

First, the importance of structured content. In both AutoGEO’s preference rule extraction results and E-GEO’s rewriting heuristics, structured content (tables, lists, step-by-step guides) is commonly found to have higher AI engine citation probability than unstructured narrative content. This aligns with the technical characteristic that LLMs encode structured information more effectively from their training data and find it easier to reference structured sources during response generation.

Second, the need for content-type differentiation. Whether it’s AutoGEO selectively applying rules based on content domain or E-GEO measuring different optimization strategy effects by query type, both arrive at the conclusion that “a single GEO strategy does not work for all content.”

Third, the validity of iterative optimization. Both AutoGEO’s feedback loops and E-GEO’s iterative prompt optimization share the conclusion that multiple iterations produce better results than a single optimization pass.

Combination Potential

```mermaid
flowchart TD
    subgraph Combined["Combined Framework (Hypothesis)"]
        A[AutoGEO's rule extraction] --> B[Domain-specific rule filtering]
        B --> C[Merge with E-GEO's e-commerce heuristics]
        C --> D[Integrated rewriting]
        D --> E[AutoGEO's quality verification]
        E -->|Pass| F[Optimization complete]
        E -->|Fail| G[Adjust via E-GEO's IPO]
        G --> D
    end
```

The two approaches are not mutually exclusive. In fact, they could become more powerful when combined. Applying AutoGEO’s cooperative GEO framework (rule extraction + quality verification) to E-GEO’s e-commerce domain is the natural next step. Specifically:

  1. Use e-commerce domain queries as training data in AutoGEO’s rule extraction phase
  2. Compare extracted rules with E-GEO’s 15 heuristics to strengthen domain-specific rules
  3. Add e-commerce-specific utility metrics (price accuracy, spec completeness, etc.) to AutoGEO’s quality verification phase
  4. Integrate E-GEO’s iterative prompt optimization into AutoGEO’s feedback loop

A model that performs commerce-specific optimization while maintaining quality would be the most practically useful.


Unresolved Issues (Gap Analysis)

Both papers share the structural limitations of GEO research at this stage. These limitations are not weaknesses of individual papers but rather indicators that the GEO field is still in its early phase.

Limitation 1: Generative Engine Volatility

Both papers experiment based on AI engine responses at a specific point in time. However, LLM-based search engines undergo frequent model updates, prompt changes, and ranking logic modifications. An optimization strategy effective today could be invalidated by the next model update.

This problem existed in SEO too, but to a different degree. Google’s search algorithm updates (Panda, Penguin, BERT, etc.) occurred on an annual basis, with continuity in ranking logic between updates. The volatility of LLM-based engines is far greater. The model architecture itself can change, and even the same model can produce very different response patterns depending on prompt engineering.

Limitation 2: No Standardized Evaluation Metrics

Whether AutoGEO and E-GEO use the same GEO metric standards, and whether cross-comparison is possible, remains unclear. A unified benchmark across all GEO research does not yet exist.

| Paper | Metrics Used | Cross-Comparability |
| --- | --- | --- |
| Aggarwal et al. (2024) | GEO-Bench proprietary metrics | Serves as reference point (de facto standard) |
| Chen et al. (2025) | PAWC metric | Partially compatible |
| Wu et al. (AutoGEO) | GEO visibility + utility | Aggarwal-based but extended |
| Bagga et al. (E-GEO) | E-commerce visibility | Independent metrics, compatibility unclear |

This situation is analogous to the NLP field before GLUE/SuperGLUE emerged. When each study uses its own benchmark and metrics, cross-study comparison becomes impossible and measuring overall field progress becomes difficult.

Limitation 3: Missing Business KPI Linkage

Neither paper addresses how a 35.99% GEO metric improvement affects actual traffic, conversion rates, or revenue. The gap between academic benchmarks and business KPIs is an area that future research must fill.

In the causal chain from GEO visibility improvement to actual traffic inflow to conversion to revenue, current research only covers the first step.

This missing linkage is particularly pronounced in E-GEO’s e-commerce benchmark. In e-commerce, GEO’s value must ultimately be measured by revenue contribution, but E-GEO stops at visibility metrics.

Limitation 4: No Multilingual/Multicultural Validation

Both papers conduct English-centric experiments. Whether GEO optimization strategies work identically in non-English languages like Korean, Japanese, and Chinese has not been verified. Since LLM training data distributions differ by language and content consumption patterns differ by culture, direct transfer of English-language results is risky.

Limitation 5: No Multimodal Content Consideration

Current GEO research is focused on text content. However, in e-commerce, the share of multimodal content — product images, video reviews, infographics — is substantial. If multimodal AI search becomes widespread, text-based GEO strategies alone will be insufficient.

Future Research Directions

Synthesizing the above limitations, the directions that future GEO optimization research must address are:

| Research Direction | Necessity | Difficulty |
| --- | --- | --- |
| Unified benchmark construction | Enable cross-study comparability | High (requires community consensus) |
| Business KPI linkage | Prove GEO’s practical value | High (requires A/B testing) |
| Engine volatility response | Sustainable optimization strategies | Medium (monitoring systems) |
| Multilingual expansion | Validate non-English applicability | Medium (dataset construction) |
| Vertical expansion | Domains beyond e-commerce | Medium (requires domain expertise) |
| Multimodal GEO | Optimization including images/video | High (new methodology needed) |

Practitioner-Oriented Implications

Here are the practically actionable takeaways from both papers.

GEO Optimization Does Not Presuppose Quality Degradation

AutoGEO’s cooperative GEO results suggest it is possible to move beyond the “optimization vs. quality” dichotomy. There is academic basis to dispel the concern that “optimizing will make content worse” when content teams adopt GEO. However, this does not mean “any kind of optimization preserves quality.” This conclusion is only valid under a systematic framework with embedded quality verification.

Domain-Specific GEO Strategies Are Necessary

What E-GEO demonstrates is the limitation of general-purpose GEO strategies. Strategies effective in e-commerce differ from those effective for academic content. This implies that benchmarks and strategies specialized for each vertical — e-commerce, healthcare, finance, travel — are needed.

Structure Is Key

Both papers converge on the conclusion that structured content (tables, lists, step-by-step guides) is more favorable for AI engine citations than unstructured narrative. This is an immediately actionable strategy. Simply adding structural elements to existing content can improve AI search visibility.
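As a concrete example of “adding structural elements,” here is a small helper that renders product attributes as a markdown comparison table — the kind of structured block both papers associate with higher citation likelihood. The attribute names are made-up examples, not fields from either benchmark:

```python
def to_spec_table(products):
    """Render a list of product dicts as a markdown comparison table.

    Each dict is one product; column order is taken from the first product.
    """
    if not products:
        return ""
    cols = list(products[0])
    header = "| " + " | ".join(cols) + " |"
    divider = "| " + " | ".join("---" for _ in cols) + " |"
    rows = ["| " + " | ".join(str(p.get(c, "")) for c in cols) + " |"
            for p in products]
    return "\n".join([header, divider] + rows)
```

Running it on two earbud entries with `Model` and `Battery` columns yields a four-line markdown table that an AI engine can lift directly into a comparison answer, where the same facts buried in prose might be skipped.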

Understand the Realistic Level of Automation

AutoGEO’s automatic rewriting is rule-based, not fully autonomous. E-GEO’s heuristics are also human-designed. At this point, GEO optimization is most realistic as a tool-assisted approach. The expectation that “AI will handle the optimization automatically” is premature.

Monitoring Systems Are Essential

Given the volatility of generative engines, a single round of optimization is not the end — continuous monitoring and re-optimization are necessary. This is the same context in which rank monitoring tools are essential in SEO. GEO also needs AI engine response monitoring systems, and this area is still lacking in tooling.
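The core monitoring check is simple to sketch: periodically re-issue tracked queries and diff which sources the engine cites against the last snapshot. Here `ask_engine` is a hypothetical client returning cited source URLs — no such standard API exists today, which is precisely the tooling gap:

```python
def citation_drift(tracked_queries, our_domain, ask_engine):
    """Report which tracked queries gained or lost citations of our_domain.

    tracked_queries maps query -> bool (were we cited at the last check?);
    ask_engine(query) -> list of source URLs the engine cited this time
    (a hypothetical client; snapshots would normally live in a store).
    """
    report = {}
    for query, previously_cited in tracked_queries.items():
        cited_now = any(our_domain in url for url in ask_engine(query))
        if cited_now != previously_cited:
            report[query] = "gained" if cited_now else "lost"
    return report
```

A nonempty report after an engine update is the signal to re-run optimization, closing the loop the section argues is necessary.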


References

  • Wu, Z. et al. (2025). AutoGEO: Automating Generative Engine Optimization with Cooperative Content Rewriting. Preprint.
  • Bagga, N. et al. (2025). E-GEO: A Testbed for Generative Engine Optimization in E-Commerce. Preprint.
  • Aggarwal, P. et al. (2024). GEO: Generative Engine Optimization. Proceedings of KDD 2024.
  • Chen, J. et al. (2025). Generative Engine Optimization: How to Dominate AI Search. Preprint.