Review of AutoGEO (quality-preserving auto-optimization) and E-GEO (e-commerce vertical benchmark) papers, analyzing how GEO optimization seeks a Pareto-optimal balance between visibility and utility.
Review Context
GEO research has been branching rapidly since Aggarwal et al. defined the concept at KDD 2024. While early work focused on proving “does GEO work?”, 2025 preprints shifted the questions to “how do we optimize?” and “which domains do we apply it to?” This transition signals that the GEO field is moving beyond proof of concept into the engineering phase.
Looking at SEO’s evolution as a reference, this branching was predictable. SEO also started from a single question — “how does a search engine rank crawled content?” — before splitting along domain and technique axes into technical SEO, on-page SEO, local SEO, e-commerce SEO, and more. GEO is following the same trajectory.
The two papers covered here represent two different axes of that branching.
| Paper | Authors | Core Question | Approach |
|---|---|---|---|
| AutoGEO | Wu et al. (2025) | Can we automate optimization while preserving quality? | General-purpose framework |
| E-GEO | Bagga et al. (2025) | How do we measure GEO in e-commerce? | Vertical-specific benchmark |
Wu et al.’s AutoGEO tackles the challenge of general-purpose optimization automation with quality preservation, while Bagga et al.’s E-GEO addresses the challenge of building a benchmark for the e-commerce vertical. This review analyzes each paper in order, followed by cross-comparison and unresolved issues.
The Flow of GEO Optimization Research
Understanding where these two papers sit requires grasping the broader flow of GEO research. The table below organizes major GEO studies from 2024-2025 chronologically.
| Period | Paper/Study | Core Contribution | Research Stage |
|---|---|---|---|
| 2024 Q2 | Aggarwal et al. (KDD) | GEO concept definition, GEO-Bench | Definition |
| 2025 Q1 | Chen et al. | Empirical behavior analysis, PAWC metric | Measurement |
| 2025 Q1 | Wu et al. (AutoGEO) | Quality-preserving auto-optimization | Optimization |
| 2025 Q1 | Bagga et al. (E-GEO) | E-commerce-specific benchmark | Vertical application |
| 2025-2026 | Kim et al. (SAGEO Arena) | Full pipeline evaluation | Evaluation framework |
| 2025-2026 | Jin et al. (CORE) | Ranking manipulation risk demonstration | Security |
AutoGEO and E-GEO address research questions that naturally follow the definition and measurement stages: “how do we execute?” and “where do we apply?” Once the definition stage established “what GEO is” and the measurement stage established “how to quantify GEO,” the remaining questions are about execution and application.
AutoGEO: Automated Optimization Without Sacrificing Quality
Wu et al.’s AutoGEO (2025 preprint) directly addresses a core concern in GEO research: Does GEO optimization degrade content quality? This question is intimately tied to patterns that have repeated throughout SEO history. Just as keyword stuffing and link farms — strategies that compromise content integrity to boost search rankings — polluted the SEO ecosystem, the same concern exists for GEO.
Problem Definition: The GEO Dilemma
The specific problem AutoGEO aims to solve is as follows.
Most existing GEO optimization attempts have been either manual or tend to compromise the content’s original purpose — delivering useful information to users — during the optimization process. For example, inserting unnecessary statistics to increase citation probability, or excessively adding authoritative citations to the point of disrupting the content’s natural flow.
Wu et al. decompose this problem along three axes:
| Axis | Problem | Limitation of Existing Approaches |
|---|---|---|
| Automation | Manual optimization cannot scale | Rule-based automation lacks precision |
| Quality preservation | Optimization degrades usefulness | Assumes a visibility-quality tradeoff |
| Generalization | Only effective for specific engines/queries | Performance drops on domain transfer |
AutoGEO sets the goal of solving all three simultaneously. Its core claim is that visibility and quality are not zero-sum but can exist in a cooperative relationship.
AutoGEO Architecture
AutoGEO’s pipeline consists of three phases, each designed as an independent module.
```mermaid
flowchart TD
    A[Input content] --> B[Phase 1: Automatic preference rule extraction]
    B --> C[Rule set R]
    C --> D[Phase 2: Rule-based rewriting]
    D --> E[Rewritten content]
    E --> F[Phase 3: Quality preservation verification]
    F -->|Pass| G[Optimized content]
    F -->|Fail| H[Rule adjustment]
    H --> D
    style B fill:#e8f4fd,stroke:#333
    style D fill:#e8f4fd,stroke:#333
    style F fill:#fde8e8,stroke:#333
```
Phase 1: Preference Rule Extraction
In this phase, AutoGEO automatically analyzes which content characteristics generative engines prefer. Specifically, the process involves:
- Collecting generative engine responses for various queries
- Comparing cited sources against uncited sources
- Extracting content characteristics commonly found in cited sources — structural elements, information density, writing style, citation patterns — as rules
This process is similar to reverse engineering a search engine’s ranking factors in traditional SEO, but differs methodologically because the target is not a traditional search algorithm but LLM behavioral patterns. Since LLMs operate on probabilistic patterns based on training data and prompts rather than explicit ranking algorithms, rule extraction uses the LLM itself as an analytical tool rather than statistical analysis.
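The cited-vs-uncited comparison above can be sketched as a small feature-lift analysis. The features, lift threshold, and sample documents below are illustrative assumptions, not the paper's actual extraction method, which uses the LLM itself as the analyzer:

```python
# Hypothetical sketch of Phase 1: derive preference rules by comparing
# structural features of cited vs. uncited sources. Feature names and the
# lift threshold are illustrative assumptions, not taken from the paper.

def extract_features(doc: str) -> dict:
    """Crude structural features of a document (illustrative only)."""
    lines = doc.splitlines()
    return {
        "has_table": any("|" in ln for ln in lines),
        "has_list": any(ln.lstrip().startswith(("-", "*")) for ln in lines),
        "has_numbers": any(ch.isdigit() for ch in doc),
    }

def extract_rules(cited: list[str], uncited: list[str],
                  min_lift: float = 1.5) -> list[str]:
    """Keep features that are markedly more frequent in cited sources."""
    def rate(docs, feat):
        return sum(extract_features(d)[feat] for d in docs) / max(len(docs), 1)
    rules = []
    for feat in ("has_table", "has_list", "has_numbers"):
        c, u = rate(cited, feat), rate(uncited, feat)
        if c > 0 and c >= min_lift * max(u, 1e-9):
            rules.append(f"prefer:{feat}")
    return rules

cited = ["| spec | value |\n- bullet point\nRated 4.5", "- item one\n- item two"]
uncited = ["A long narrative paragraph without structure."]
print(extract_rules(cited, uncited))
```

A production version would replace the hand-coded features with LLM-driven analysis of the engine's citation behavior, as the paper describes.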
Phase 2: Rule-Based Rewriting
The extracted rule set is applied to existing content for automatic rewriting. The key to this phase is the precision of rule application. Rather than applying all rules uniformly, appropriate rule subsets are selectively applied based on the content’s domain and type.
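The selective application described above could look like the following. The rule-to-domain mapping and the rewrite transformations are hypothetical, since the paper does not publish its rule format:

```python
# Hypothetical sketch of Phase 2: apply only the rule subset relevant to
# the content's domain. Rule names, domain tags, and rewrite functions
# are illustrative assumptions.

RULE_DOMAINS = {
    "prefer:spec_table": {"ecommerce"},
    "prefer:citations": {"academic", "news"},
    "prefer:step_list": {"ecommerce", "academic"},
}

REWRITES = {
    "prefer:spec_table": lambda text: text + "\n| spec | value |",
    "prefer:citations": lambda text: text + "\n[source: n]",
    "prefer:step_list": lambda text: "Steps:\n" + text,
}

def rewrite(text: str, rules: list[str], domain: str) -> str:
    """Apply only rules whose domain set includes this content's domain."""
    for rule in rules:
        if domain in RULE_DOMAINS.get(rule, set()):
            text = REWRITES[rule](text)
    return text

out = rewrite("Product X overview.",
              ["prefer:spec_table", "prefer:citations"], "ecommerce")
print(out)  # citation rule is skipped: it is not mapped to e-commerce
```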
Phase 3: Quality Preservation Verification
This phase evaluates whether the rewritten content maintains the original’s utility. Wu et al. measure utility across four dimensions:
| Utility Dimension | Definition | Measurement Method |
|---|---|---|
| Completeness | Are the core information elements from the original preserved? | Retention rate of key information elements |
| Accuracy | Have factual errors been introduced during rewriting? | Fact-check-based verification |
| Readability | Is the rewritten content’s readability maintained or improved? | Readability metrics |
| Naturalness | Is the writing style natural, without mechanical optimization artifacts? | Human evaluation + automated evaluation |
Content that fails verification is fed back to Phase 2 with adjusted rules. This iterative loop is what makes AutoGEO "cooperative": rather than simply maximizing visibility, the system searches for a Pareto-optimal point between visibility and quality.
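The verify-or-adjust loop can be sketched as follows. The scoring functions, the 0.8 threshold, and the "drop the last rule" adjustment are illustrative stand-ins for the paper's verifiers:

```python
# Hypothetical sketch of the Phase 3 loop: rewrite, score utility on the
# four dimensions, and retry with a reduced rule set on failure.

def utility_scores(original: str, rewritten: str) -> dict:
    """Stand-in checks; the paper uses fact-checking, readability metrics, etc."""
    kept = sum(w in rewritten for w in original.split())
    return {
        "completeness": kept / max(len(original.split()), 1),
        "accuracy": 1.0,      # placeholder for fact-check verification
        "readability": 1.0,   # placeholder for a readability metric
        "naturalness": 1.0,   # placeholder for human/automated evaluation
    }

def optimize(original: str, rules: list[str],
             threshold: float = 0.8, max_rounds: int = 3) -> str:
    active = list(rules)
    for _ in range(max_rounds):
        # Toy rewrite: append a marker per active rule
        candidate = original + "".join(f"\n[{r}]" for r in active)
        if min(utility_scores(original, candidate).values()) >= threshold:
            return candidate  # passed verification
        active = active[:-1]  # drop the most aggressive rule and retry
    return original  # fall back to the unmodified content

print(optimize("Key facts about the product.", ["add_table", "add_stats"]))
```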
Multi-Agent Cooperative Framework
The technical core of AutoGEO lies in its multi-agent cooperative framework. Agents responsible for each phase operate independently while collaborating under a shared objective function — visibility improvement + quality maintenance.
```mermaid
flowchart LR
    subgraph Analyzer["Analyzer Agent"]
        A1[Query analysis]
        A2[Engine response collection]
        A3[Rule extraction]
    end
    subgraph Optimizer["Optimizer Agent"]
        O1[Rule selection]
        O2[Content rewriting]
        O3[Change tracking]
    end
    subgraph Validator["Validator Agent"]
        V1[Utility evaluation]
        V2[Quality score calculation]
        V3[Pass/reject decision]
    end
    Analyzer -->|Rule set| Optimizer
    Optimizer -->|Rewriting results| Validator
    Validator -->|Feedback| Optimizer
    Validator -->|Rule adjustment request| Analyzer
```
A notable aspect of this structure is that the Validator agent does more than simply pass or reject — it analyzes the failure cause and provides feedback to both the Optimizer and Analyzer agents. This enables iterative improvement of the entire system.
This design aligns with the multi-agent patterns actively being researched in LLM-based systems; it echoes findings that multiple role-separated agents collaborating on a complex task outperform a single LLM handling everything.
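The Validator's cause-directed feedback could be routed along these lines. The diagnosis rule (low naturalness implies a rewriting problem, otherwise a rule problem) and the thresholds are illustrative assumptions:

```python
# Minimal sketch of feedback routing: the Validator returns not just
# pass/fail but a diagnosed cause, which decides whether the Optimizer
# (bad rewrite) or the Analyzer (bad rule set) should act next.
# The diagnosis heuristic and 0.8 thresholds are illustrative.

def validator(result: dict) -> tuple[bool, str]:
    if result["utility"] >= 0.8:
        return True, "ok"
    # Low naturalness suggests the rewriting step is at fault;
    # otherwise suspect the extracted rules themselves.
    cause = "optimizer" if result["naturalness"] < 0.8 else "analyzer"
    return False, cause

def route(result: dict) -> str:
    passed, cause = validator(result)
    return "done" if passed else f"feedback->{cause}"

print(route({"utility": 0.6, "naturalness": 0.5}))
print(route({"utility": 0.6, "naturalness": 0.9}))
print(route({"utility": 0.9, "naturalness": 0.9}))
```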
Experimental Design
Wu et al.’s experimental design is summarized below.
| Experimental Element | Details |
|---|---|
| Target engines | Multiple generative search engines (see paper for specific engine names) |
| Query set | Information-seeking queries across diverse domains |
| Comparison groups | Original content (baseline), naive optimization, AutoGEO |
| Evaluation metrics | GEO visibility metrics + content utility metrics (dual evaluation) |
| Key results | Average GEO metric +35.99%, utility metrics no degradation |
Particularly notable in the experimental design is the inclusion of "naive optimization" as a separate baseline. This ensures the conclusion is not the trivial "optimization improves performance," but rather "AutoGEO's cooperative approach preserves quality better than naive optimization."
Key Result Analysis
AutoGEO achieved an average 35.99% improvement on GEO visibility metrics, and this improvement was achieved without statistically significant degradation in content utility.
This result can be interpreted on two levels.
First, the quantitative significance. 35.99% is meaningful as an absolute number, but what matters more is that it was achieved without quality degradation. Compared to naive optimization, AutoGEO shows similar visibility improvement magnitudes but significantly less utility degradation. This suggests the cooperative framework’s feedback loop is actually working.
Second, the structural significance. It challenges the implicit assumption that visibility and quality are locked in a tradeoff. This bears on the legitimacy of the entire GEO field: if GEO optimization inevitably degrades content quality, GEO becomes a technology that harms user experience. AutoGEO's results provide a counterargument to this concern.
However, several caveats apply when interpreting these results:
- 35.99% is an “average.” Variance by domain and query type likely exists.
- “No utility degradation” is measured against criteria that Wu et al. themselves established. External validation is needed.
- The results are tied to the generative engine version at the time of experimentation and require separate verification for continued validity after engine updates.
Theoretical Implications of Cooperative GEO
The concept of “cooperative GEO” proposed by Wu et al. extends beyond a mere technical approach to carry implications as an ethical framework for the GEO field.
Throughout SEO’s history, Google repeatedly emphasized the principle of “create content for users,” but in practice, strategies that harmed user experience for higher search rankings were pervasive. The same pattern could easily repeat in GEO: inserting unnecessary elements to increase the probability of AI engine citations, or distorting content’s arguments — “GEO spam” — could emerge.
AutoGEO’s cooperative framework presents a technical solution to this problem. By embedding quality verification into the optimization process, quality-degrading optimization is blocked at the system level. If this approach spreads industry-wide, it could preemptively prevent the formation of a perception that “optimized content equals low-quality content.”
E-GEO: An E-Commerce-Specific Benchmark
Bagga et al.’s E-GEO (2025 preprint) takes a different direction from AutoGEO. In a landscape where most GEO research focuses on informational queries, E-GEO builds the first systematic benchmark for queries with commercial intent.
Why E-Commerce GEO Requires Separate Research
E-commerce queries are fundamentally different in nature from information-seeking queries. This difference is not merely “the query content differs” — it means the optimization target itself is different.
| Dimension | Information-Seeking Queries | E-Commerce Queries |
|---|---|---|
| User intent | Understanding, learning | Purchase decision |
| Expected response format | Explanations, analysis | Comparisons, recommendations, specs |
| Distance to conversion | Indirect (awareness → interest) | Direct (comparison → purchase) |
| Trust basis | Expertise, sources | Price, reviews, usage experience |
| Time sensitivity | Relatively low | High (price fluctuations, stock) |
| Optimization goal | Increase citation probability | Citation + purchase conversion |
“Best wireless earbuds recommendation” and “fundamental principles of quantum mechanics” are both search queries, but the structure of the AI engine’s response, citation patterns, and the type of information users expect are completely different. A general-purpose GEO benchmark cannot capture this difference.
E-GEO Benchmark Composition
The scale and composition of the benchmark E-GEO built is as follows:
| Item | Details |
|---|---|
| Query scale | 7,000+ realistic product queries |
| Query source | Based on actual e-commerce search logs |
| Rewriting strategies | 15 heuristic-based rewriting approaches |
| Optimization method | Iterative Prompt Optimization (IPO) |
| Application targets | E-commerce product descriptions, reviews, comparison content |
| Evaluation engines | Multiple generative search engines |
The 7,000+ query scale is considerably larger than existing GEO benchmarks. Considering that Aggarwal et al.'s GEO-Bench used query sets in the hundreds, E-GEO represents a substantial effort to capture the diversity of the e-commerce domain at scale.
Product Query Taxonomy
One of E-GEO’s key contributions is proposing a taxonomy that systematically classifies e-commerce queries. This classification empirically demonstrates that different optimization strategies are needed for each query type.
```mermaid
flowchart TD
    Q[E-commerce Query] --> C1[Product Discovery]
    Q --> C2[Product Comparison]
    Q --> C3[Purchase Decision]
    Q --> C4[Usage & Troubleshooting]
    C1 --> C1a["'Best wireless earbuds'"]
    C1 --> C1b["'Running shoes under $100'"]
    C2 --> C2a["'AirPods vs Galaxy Buds'"]
    C2 --> C2b["'Dyson V15 vs V12 differences'"]
    C3 --> C3a["'AirPods Pro 2 price'"]
    C3 --> C3b["'Galaxy S25 pre-order'"]
    C4 --> C4a["'AirPods one side not working'"]
    C4 --> C4b["'Dyson filter replacement cycle'"]
```
Each query type elicits different AI engine response patterns, and therefore requires different optimization strategies. E-GEO’s experimental results by query type are summarized below.
| Query Type | AI Engine Response Characteristics | Effective Optimization Strategy | Ineffective Strategy |
|---|---|---|---|
| Product Discovery | List-format, category-based | Structured spec tables, category tags | Simple keyword insertion |
| Product Comparison | Comparison tables, pros/cons analysis | Clear comparison framework, numerical data | One-sided recommendations |
| Purchase Decision | Price/stock info, purchase links | Price history, discount info, buying guide | Generic product descriptions |
| Usage/Troubleshooting | Step-by-step guides, FAQ | Structured resolution steps, visual guides | Long narrative explanations |
15 Rewriting Heuristics
The 15 rewriting heuristics designed by E-GEO are optimization strategies specialized for e-commerce content. These strategies differ in character from the general-purpose GEO strategies applied to academic or news content. Below is a category-based classification:
| Category | Strategy Examples | Applicable Types |
|---|---|---|
| Structural optimization | Spec table insertion, comparison matrix addition, FAQ structuring | All types |
| Data enrichment | Price info addition, user review summary insertion, benchmark figures | Discovery, comparison |
| Trust signals | Expert citations, verified data source attribution, test results | Purchase decision |
| Format optimization | Pros/cons lists, rating summaries, explicit recommendation reasoning | Comparison, discovery |
| Intent matching | Buying guide tone, problem-solving step sequencing, usage scenarios | Purchase/usage |
The effectiveness of these 15 strategies was not uniform. The gap between the most and least effective strategies in E-GEO’s experiments was large, meaning an approach of “just apply any strategy to e-commerce content” is not viable.
Iterative Prompt Optimization (IPO)
Another methodological contribution from E-GEO is Iterative Prompt Optimization (IPO). Rather than performing optimization with a single prompt, this approach gradually improves prompts through multiple iterations.
E-GEO’s iterative prompt optimization showed significant improvement in visibility metrics with multiple iterations compared to a single attempt. The iteration effect was most pronounced for comparison-type queries.
This methodology contrasts with AutoGEO’s rule-based approach. Where AutoGEO extracts explicit rules and applies them, E-GEO’s IPO explores optimal results through iterative prompt adjustments without explicitly defining rules.
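The iterate-and-keep-the-best character of IPO can be sketched as below. In E-GEO the score would come from measured engine visibility; here a toy length-based scorer and a random mutation step stand in, so everything in this snippet is an illustrative assumption:

```python
# Hypothetical sketch of iterative prompt optimization: refine the
# rewriting prompt over several rounds, retaining the best-scoring variant.
import random

def visibility_score(prompt: str) -> float:
    # Toy stand-in: longer, more specific prompts score higher here.
    # A real system would measure actual citation outcomes instead.
    return min(len(prompt) / 100.0, 1.0)

def mutate(prompt: str, rng: random.Random) -> str:
    additions = ["Add a comparison table.", "Include price data.",
                 "Summarize reviews."]
    return prompt + " " + rng.choice(additions)

def ipo(seed_prompt: str, rounds: int = 5, seed: int = 0) -> str:
    rng = random.Random(seed)
    best, best_score = seed_prompt, visibility_score(seed_prompt)
    for _ in range(rounds):
        candidate = mutate(best, rng)
        score = visibility_score(candidate)
        if score > best_score:          # keep only improving variants
            best, best_score = candidate, score
    return best

print(ipo("Rewrite this product page for AI search."))
```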
E-Commerce-Specific Findings
Here are the findings unique to the e-commerce domain from E-GEO.
General-purpose GEO strategies achieve only about 40-60% effectiveness on e-commerce queries. Visibility improvement becomes significantly higher when e-commerce-specific strategies are applied.
This finding directly contradicts the assumption that “GEO strategies are universally applicable.” When the domain changes, the optimization strategy must change too.
Specifically, the elements effective at driving AI engine citations in e-commerce content are:
- Price comparison data: Citation probability increases when content includes tables quantitatively comparing prices across competing products.
- Spec tables: Presenting key specifications in structured tables increases the probability of AI engines citing that source when generating comparison responses.
- Real-use review summaries: Content summarizing real usage experiences by category is preferred over simple star ratings.
- Purchase decision trees: Conditional recommendation structures like “if your budget is under $X, choose A; if above, choose B” are favorable for citation.
Conversely, strategies effective for academic content — such as authoritative citations and statistical data density — showed relatively lower effectiveness in e-commerce contexts.
Cross-Analysis: General-Purpose vs Vertical-Specific
Placing the two papers side by side reveals the two clear directions in which GEO research is branching.
Comparison Framework
| Dimension | AutoGEO | E-GEO |
|---|---|---|
| Core question | Can we optimize while maintaining quality? | Can we build a benchmark for a specific domain? |
| Approach direction | General-purpose | Vertical-specific |
| Methodology | Rule extraction + auto-rewriting + quality verification | Heuristic design + iterative prompt optimization |
| Agent architecture | Multi-agent cooperative | Single optimization loop |
| Query type focus | Information-seeking queries | Commercial-intent queries |
| Scale | Diverse domain query sets | 7,000+ e-commerce queries |
| Evaluation criteria | GEO metrics + utility dual measurement | E-commerce visibility metrics |
| Automation level | High (from rule extraction to rewriting) | Medium (heuristics manual, application automatic) |
| Practical implications | Justification for content teams adopting GEO | Strategy basis for e-commerce operators in AI search |
Convergence Points
Though superficially different, the two approaches converge at several points.
First, the importance of structured content. In both AutoGEO’s preference rule extraction results and E-GEO’s rewriting heuristics, structured content (tables, lists, step-by-step guides) is commonly found to have higher AI engine citation probability than unstructured narrative content. This aligns with the technical characteristic that LLMs encode structured information more effectively from their training data and find it easier to reference structured sources during response generation.
Second, the need for content-type differentiation. Whether it’s AutoGEO selectively applying rules based on content domain or E-GEO measuring different optimization strategy effects by query type, both arrive at the conclusion that “a single GEO strategy does not work for all content.”
Third, the validity of iterative optimization. Both AutoGEO’s feedback loops and E-GEO’s iterative prompt optimization share the conclusion that multiple iterations produce better results than a single optimization pass.
Combination Potential
```mermaid
flowchart TD
    subgraph Combined["Combined Framework (Hypothesis)"]
        A[AutoGEO's rule extraction] --> B[Domain-specific rule filtering]
        B --> C[Merge with E-GEO's e-commerce heuristics]
        C --> D[Integrated rewriting]
        D --> E[AutoGEO's quality verification]
        E -->|Pass| F[Optimization complete]
        E -->|Fail| G[Adjust via E-GEO's IPO]
        G --> D
    end
The two approaches are not mutually exclusive. In fact, they could become more powerful when combined. Applying AutoGEO’s cooperative GEO framework (rule extraction + quality verification) to E-GEO’s e-commerce domain is the natural next step. Specifically:
- Use e-commerce domain queries as training data in AutoGEO’s rule extraction phase
- Compare extracted rules with E-GEO’s 15 heuristics to strengthen domain-specific rules
- Add e-commerce-specific utility metrics (price accuracy, spec completeness, etc.) to AutoGEO’s quality verification phase
- Integrate E-GEO’s iterative prompt optimization into AutoGEO’s feedback loop
A model that performs commerce-specific optimization while maintaining quality would be the most practically useful.
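The second integration step, comparing extracted rules with E-GEO's heuristics to strengthen domain-specific rules, could be sketched as a weighted merge. All rule names, weights, and the agreement boost are illustrative assumptions:

```python
# Hypothetical sketch: merge AutoGEO-style extracted rules with
# E-GEO-style heuristics, amplifying rules both sources agree on.

def merge_rules(extracted: dict, heuristics: dict,
                agree_boost: float = 1.5) -> dict:
    """Both inputs map rule name -> weight; agreement is amplified."""
    merged = {}
    for rule in set(extracted) | set(heuristics):
        w = extracted.get(rule, 0.0) + heuristics.get(rule, 0.0)
        if rule in extracted and rule in heuristics:
            w *= agree_boost  # both sources agree: strengthen the rule
        merged[rule] = round(w, 2)
    return merged

auto_rules = {"spec_table": 0.6, "citations": 0.8}
egeo_heuristics = {"spec_table": 0.7, "price_history": 0.9}
print(merge_rules(auto_rules, egeo_heuristics))
```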
Unresolved Issues (Gap Analysis)
Both papers share the structural limitations of GEO research at this stage. These limitations are not weaknesses of individual papers but rather indicators that the GEO field is still in its early phase.
Limitation 1: Generative Engine Volatility
Both papers experiment based on AI engine responses at a specific point in time. However, LLM-based search engines undergo frequent model updates, prompt changes, and ranking logic modifications. An optimization strategy effective today could be invalidated by the next model update.
This problem existed in SEO too, but to a different degree. Google’s search algorithm updates (Panda, Penguin, BERT, etc.) occurred on an annual basis, with continuity in ranking logic between updates. The volatility of LLM-based engines is far greater. The model architecture itself can change, and even the same model can produce very different response patterns depending on prompt engineering.
Limitation 2: No Standardized Evaluation Metrics
Whether AutoGEO and E-GEO use the same GEO metric standards, and whether cross-comparison is possible, remains unclear. A unified benchmark across all GEO research does not yet exist.
| Paper | Metrics Used | Cross-Comparability |
|---|---|---|
| Aggarwal et al. (2024) | GEO-Bench proprietary metrics | Serves as reference point (de facto standard) |
| Chen et al. (2025) | PAWC metric | Partially compatible |
| Wu et al. (AutoGEO) | GEO visibility + utility | Aggarwal-based but extended |
| Bagga et al. (E-GEO) | E-commerce visibility | Independent metrics, compatibility unclear |
This situation is analogous to the NLP field before GLUE/SuperGLUE emerged. When each study uses its own benchmark and metrics, cross-study comparison becomes impossible and measuring overall field progress becomes difficult.
Limitation 3: Missing Business KPI Linkage
Neither paper addresses how a 35.99% GEO metric improvement affects actual traffic, conversion rates, or revenue. The gap between academic benchmarks and business KPIs is an area that future research must fill.
In the causal chain from GEO visibility improvement to actual traffic inflow to conversion to revenue, current research only covers the first step.
This missing linkage is particularly pronounced in E-GEO’s e-commerce benchmark. In e-commerce, GEO’s value must ultimately be measured by revenue contribution, but E-GEO stops at visibility metrics.
Limitation 4: No Multilingual/Multicultural Validation
Both papers conduct English-centric experiments. Whether GEO optimization strategies work identically in non-English languages like Korean, Japanese, and Chinese has not been verified. Since LLM training data distributions differ by language and content consumption patterns differ by culture, direct transfer of English-language results is risky.
Limitation 5: No Multimodal Content Consideration
Current GEO research is focused on text content. However, in e-commerce, the share of multimodal content — product images, video reviews, infographics — is substantial. If multimodal AI search becomes widespread, text-based GEO strategies alone will be insufficient.
Future Research Directions
Synthesizing the above limitations, the directions that future GEO optimization research must address are:
| Research Direction | Necessity | Difficulty |
|---|---|---|
| Unified benchmark construction | Enable cross-study comparability | High (requires community consensus) |
| Business KPI linkage | Prove GEO’s practical value | High (requires A/B testing) |
| Engine volatility response | Sustainable optimization strategies | Medium (monitoring systems) |
| Multilingual expansion | Validate non-English applicability | Medium (dataset construction) |
| Vertical expansion | Domains beyond e-commerce | Medium (requires domain expertise) |
| Multimodal GEO | Optimization including images/video | High (new methodology needed) |
Practitioner-Oriented Implications
Here are the practically actionable takeaways from both papers.
GEO Optimization Does Not Presuppose Quality Degradation
AutoGEO’s cooperative GEO results suggest it is possible to move beyond the “optimization vs. quality” dichotomy. There is academic basis to dispel the concern that “optimizing will make content worse” when content teams adopt GEO. However, this does not mean “any kind of optimization preserves quality.” This conclusion is only valid under a systematic framework with embedded quality verification.
Domain-Specific GEO Strategies Are Necessary
What E-GEO demonstrates is the limitation of general-purpose GEO strategies. Strategies effective in e-commerce differ from those effective for academic content. This implies that benchmarks and strategies specialized for each vertical — e-commerce, healthcare, finance, travel — are needed.
Structure Is Key
Both papers converge on the conclusion that structured content (tables, lists, step-by-step guides) is more favorable for AI engine citations than unstructured narrative. This is an immediately actionable strategy. Simply adding structural elements to existing content can improve AI search visibility.
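As a concrete instance of that takeaway, key/value facts already present in prose can be restructured into a markdown spec table; the field names below are illustrative:

```python
# Minimal sketch of the "add structure" takeaway: render product facts
# as a markdown spec table, a format both papers find more citable.

def to_spec_table(specs: dict[str, str]) -> str:
    rows = ["| Spec | Value |", "|---|---|"]
    rows += [f"| {k} | {v} |" for k, v in specs.items()]
    return "\n".join(rows)

print(to_spec_table({"Battery life": "6 h", "Weight": "5.3 g", "ANC": "Yes"}))
```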
Understand the Realistic Level of Automation
AutoGEO’s automatic rewriting is rule-based, not fully autonomous. E-GEO’s heuristics are also human-designed. At this point, GEO optimization is most realistic as a tool-assisted approach. The expectation that “AI will handle the optimization automatically” is premature.
Monitoring Systems Are Essential
Given the volatility of generative engines, a single round of optimization is not the end — continuous monitoring and re-optimization are necessary. This is the same context in which rank monitoring tools are essential in SEO. GEO also needs AI engine response monitoring systems, and this area is still lacking in tooling.
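A monitoring check along these lines could trigger re-optimization; the data source and the 20% drop threshold are illustrative assumptions:

```python
# Hypothetical sketch of a monitoring check: track a page's citation
# rate over time and flag re-optimization on a significant drop.

def needs_reoptimization(history: list[float],
                         drop_threshold: float = 0.2) -> bool:
    """history: periodic citation rates for a page, oldest first."""
    if len(history) < 2:
        return False
    baseline = max(history[:-1])  # best rate observed before this period
    current = history[-1]
    return current < baseline * (1 - drop_threshold)

print(needs_reoptimization([0.30, 0.32, 0.31, 0.22]))
print(needs_reoptimization([0.30, 0.31, 0.33]))
```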
References
- Wu, Z. et al. (2025). AutoGEO: Automating Generative Engine Optimization with Cooperative Content Rewriting. Preprint.
- Bagga, N. et al. (2025). E-GEO: A Testbed for Generative Engine Optimization in E-Commerce. Preprint.
- Aggarwal, P. et al. (2024). GEO: Generative Engine Optimization. Proceedings of KDD 2024.
- Chen, J. et al. (2025). Generative Engine Optimization: How to Dominate AI Search. Preprint.