Analysis of multi-engine architecture design principles that treat response variance as a signal, covering parallel collection structures and extensibility via the adapter pattern.
Why Multiple Engines
When designing a GEO (Generative Engine Optimization) monitoring system, the first decision is which AI search engines to analyze.
A single-engine approach is tempting. You only need one parser, maintenance burden for response format changes stays small, and API costs remain low. WICHI’s initial prototype targeted just one engine.
But when you send the same query to different AI search engines, the results diverge significantly. This is not a matter of surface-level phrasing; the differences are structural.
Why Engine Responses Differ
Each AI search engine uses different training data, different search indexes, different ranking algorithms, and different response generation strategies. These differences directly affect which brands get mentioned, in what order they appear, and which sources are cited.
| Differentiating Factor | Impact on Results |
|---|---|
| Training data composition and cutoff | Whether a brand’s latest information is reflected |
| Search index scope | Types and range of web sources referenced |
| Source preference | Weighting toward encyclopedic, community, official sources |
| Response generation strategy | List-based, comparative, narrative formats |
| Citation method | Inline citations, footnote lists, direct URL exposure |
For example, sending “best project management tools in 2026” to three AI search engines might yield: Engine A recommends Brand X first, citing the official site and review outlets. Engine B omits Brand X entirely and centers on Brand Y with community feedback citations. Engine C mentions both X and Y but in different order and context.
The Structural Limitation of Single-Engine Analysis
Reporting single-engine results as “brand visibility in AI search” is misleading — it represents visibility on that particular engine, not across the AI search ecosystem as a whole. Since you cannot control which engine end users choose, and market share in AI search is shifting rapidly, single-engine analysis is fundamentally incomplete.
```mermaid
graph TD
    Q[Same Search Query] --> E1[Engine A]
    Q --> E2[Engine B]
    Q --> E3[Engine C]
    E1 --> R1["Brand X: Ranked #1<br/>Brand Y: Not mentioned<br/>Brand Z: Ranked #3"]
    E2 --> R2["Brand X: Not mentioned<br/>Brand Y: Ranked #1<br/>Brand Z: Ranked #2"]
    E3 --> R3["Brand X: Ranked #2<br/>Brand Y: Ranked #3<br/>Brand Z: Ranked #1"]
    R1 --> AN[Cross-Engine Analysis]
    R2 --> AN
    R3 --> AN
    AN --> INS["Brand Z: Mentioned by all 3 → Strong signal<br/>Brand X: 2/3 engines → Moderate signal<br/>Brand Y: 2/3 engines, high variance → Possible source bias"]
```
This diagram illustrates a critical point: analyzing any single engine would have led to an entirely different conclusion. Looking at Engine A alone, you would conclude “Brand X has the highest visibility.” But aggregating all three engines reveals “Brand Z has the most stable visibility” — a fundamentally different takeaway.
Design Principle: AI search visibility should be measured not by ranking on any single engine, but by the consistency and quality of brand mentions across multiple engines.
Design Principles
Here are the core principles applied when designing the multi-engine collection system. These principles are not specific to WICHI — they apply broadly to any multi-LLM system.
Principle 1: Response Variance Is Signal, Not Noise
Initially, we viewed cross-engine variance as “a problem to be unified.” In practice, that variance turned out to be the most valuable insight.
When all engines consistently recommend a particular brand, it means that brand has strong online presence across diverse source types. Conversely, a brand recommended by only one engine likely has content concentrated in the source types that engine favors.
| Pattern | Meaning | Diagnosis |
|---|---|---|
| Consistently high mentions across all engines | Strong presence across diverse sources | Brand visibility is healthy |
| Consistently low mentions across all engines | Overall lack of online presence | Full content strategy review needed |
| High mentions on specific engine(s) only | Concentrated in certain source types | Reinforce content on source types the underperforming engines prefer |
| Large rank variance across engines | Brand positioning perceived differently by source type | Develop separate content strategies per source type |
Rather than dismissing variance as “each engine is different,” a GEO report displays engine-by-engine scores side by side; that comparison is the report’s core structure.
Principle 2: Consensus and Divergence
The most useful framework for interpreting multi-engine data is “consensus vs. divergence.”
```mermaid
graph LR
    subgraph Consensus Pattern
        CA[Engine A: Brand X #1] --> CS[Strong Signal]
        CB[Engine B: Brand X #1] --> CS
        CC[Engine C: Brand X #1-2] --> CS
    end
    subgraph Divergence Pattern
        DA[Engine A: Brand Y #1] --> DS[Weak Signal + Needs Diagnosis]
        DB[Engine B: Brand Y Not Mentioned] --> DS
        DC[Engine C: Brand Y #4] --> DS
    end
```
Consensus: Multiple engines mention the same brand at similar positions. This is a strong signal that the brand exists consistently across diverse information sources. Higher consensus means higher confidence in the brand’s GEO score.
Divergence: Engines produce significantly different results. This is itself a diagnostic target. Divergence triggers the question “Why does this engine omit this brand?” — and the answer to that question points directly to content strategy gaps.
Design Principle: Consensus increases score confidence. Divergence reveals improvement opportunities. Both are meaningful data.
Principle 3: Partial Results Are Valid
If one of three engines fails, the results from the remaining two are still valid. Delivering partial results with explicit notation is better than withholding all data while waiting for perfection. The key requirement: always disclose which engine’s data is missing. This principle runs through the entire error-handling strategy.
Principle 4: Engines Are Plugins
The AI search market is evolving rapidly. New engines emerge, market share shifts, and API specifications change. Each engine should be designed as an independent module implementing a common interface, so that adding or removing engines does not affect the rest of the pipeline.
Parallel Collection Architecture
After committing to multi-engine collection, the next design choice was the collection method.
Sequential vs. Parallel Collection
| Factor | Sequential | Parallel |
|---|---|---|
| Implementation complexity | Low | Higher (async control required) |
| Total elapsed time | Engine A + B + C summed | max(A, B, C) |
| Error handling | Straightforward (sequential try-catch) | Independent per-engine error handling needed |
| Debugging | Easy (trace in order) | Concurrent log separation required |
| Rate limit management | Natural spacing | Burst control needed |
| User-perceived speed | Slow (tens of seconds to minutes) | Fast (limited by slowest engine) |
| Scalability | Linear increase per engine added | Near-constant wait time regardless of engine count |
AI search engines respond slowly compared to typical REST APIs — model inference alone takes seconds to tens of seconds. Sequential processing of three engines makes a single query analysis prohibitively long; repeating this for dozens of queries pushes the total pipeline into minutes.
Parallel collection makes the slowest engine’s response time the total elapsed time. Whether there are three engines or five, latency converges to the single slowest response. Since WICHI is a SaaS where users press an analysis button and wait for results, perceived speed is a direct usability metric.
We chose parallel.
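A minimal asyncio sketch of the fan-out (the engine names and the stub call are placeholders for real adapter calls):

```python
import asyncio

async def query_engine(engine: str, query: str) -> dict:
    """Stand-in for a real engine adapter call (a real one hits the engine's API)."""
    await asyncio.sleep(0.01)  # simulate model inference latency, compressed for the demo
    return {"engine": engine, "ok": True}

async def collect_all(engines: list[str], query: str) -> dict[str, object]:
    # All engines are queried at once, so total wait is ~max(latencies), not
    # their sum. return_exceptions=True isolates failures: one engine's error
    # does not cancel the others, so partial results stay valid.
    results = await asyncio.gather(
        *(query_engine(e, query) for e in engines),
        return_exceptions=True,
    )
    return dict(zip(engines, results))

collected = asyncio.run(
    collect_all(["engine_a", "engine_b", "engine_c"], "best project management tools")
)
```

With `return_exceptions=True`, a failed engine shows up in `collected` as an exception object rather than crashing the batch, which is exactly the partial-results posture of Principle 3.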
Async Parallel Collection Flow
```mermaid
sequenceDiagram
    participant U as User
    participant API as API Server
    participant EA as Engine A Adapter
    participant EB as Engine B Adapter
    participant EC as Engine C Adapter
    participant DB as Database
    U->>API: Analysis request (query list)
    API->>API: Prepare query list
    par Parallel Collection
        API->>EA: Send all queries (async)
        API->>EB: Send all queries (async)
        API->>EC: Send all queries (async)
    end
    Note over EA,EC: Each engine also runs<br/>queries concurrently<br/>(with concurrency limits)
    EA-->>API: Engine A results (or partial failure)
    EB-->>API: Engine B results (or partial failure)
    EC-->>API: Engine C results (or partial failure)
    API->>API: Aggregate + normalize results
    API->>DB: Store normalized responses
    API->>U: Progress status update
```
The key design points are as follows.
Inter-engine parallelism: All three engines receive requests simultaneously. Each engine’s collection proceeds independently. If one engine runs slowly, it does not affect collection from the others.
Intra-engine concurrency control: Within each engine, multiple queries are processed concurrently. However, unlimited concurrent requests would trigger rate limits, so a per-engine semaphore caps concurrent request counts. This cap is configurable per engine.
Inter-request delay: Beyond concurrency limits, short delays between requests prevent burst patterns from hitting the engine’s instantaneous rate limits.
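The semaphore-plus-delay combination might be sketched like this (the cap values and stub API call are illustrative):

```python
import asyncio

CONCURRENCY = {"engine_a": 2}   # per-engine concurrent request cap (illustrative value)
REQUEST_DELAY = 0.01            # minimum spacing between requests, to avoid bursts

async def call_api(engine: str, query: str) -> str:
    await asyncio.sleep(0.01)   # stand-in for the real engine API call
    return f"{engine}:{query}"

async def run_queries(engine: str, queries: list[str]) -> list[str]:
    # The semaphore caps how many requests are in flight for this engine
    # at once; the short sleep keeps requests from arriving as a burst.
    sem = asyncio.Semaphore(CONCURRENCY.get(engine, 2))

    async def one(q: str) -> str:
        async with sem:
            await asyncio.sleep(REQUEST_DELAY)
            return await call_api(engine, q)

    return await asyncio.gather(*(one(q) for q in queries))

out = asyncio.run(run_queries("engine_a", ["q1", "q2", "q3", "q4"]))
```

`asyncio.gather` preserves input order, so results line up with the query list even though completion order varies.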
The Adapter Pattern
Each engine is isolated behind an adapter implementing a common interface. The common interface defines:
- Input: Query text, system prompt
- Output: Raw response, list of mentioned brands, list of citations
- Errors: Standardized error types for timeout, rate limit, authentication failure, etc.
In practice, each adapter constructs requests matching that engine’s API specification and converts responses into the common format. Adding a new engine requires only implementing a new adapter — the rest of the pipeline (evaluation, metric calculation, insight generation) remains unchanged.
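One way to express that interface is a `Protocol` plus a common response dataclass; the names and the stub adapter below are hypothetical, not WICHI’s actual code:

```python
import asyncio
from dataclasses import dataclass, field
from typing import Protocol

@dataclass
class EngineResponse:
    raw_text: str                    # raw response text, preserved as-is
    brands: list[str]                # brand mentions extracted by the adapter
    citations: list[str] = field(default_factory=list)  # cited source URLs

class EngineAdapter(Protocol):
    name: str
    async def query(self, text: str, system_prompt: str) -> EngineResponse: ...

class EngineAAdapter:
    """Adapter for a hypothetical Engine A: builds that engine's request
    and maps its reply into the common EngineResponse shape."""
    name = "engine_a"

    async def query(self, text: str, system_prompt: str) -> EngineResponse:
        # A real adapter would call Engine A's API and parse its format here.
        return EngineResponse(
            raw_text=f"stub answer for {text!r}",
            brands=["Brand X"],
            citations=["https://example.com"],
        )

resp = asyncio.run(EngineAAdapter().query("best project management tools", ""))
```

Because the pipeline only sees `EngineResponse`, a new engine is one new class implementing `EngineAdapter`, with no change to evaluation or metric code.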
```mermaid
graph TB
    subgraph Pipeline
        QE[Query Engine] --> RC[Response Collector]
        RC --> JE[Evaluation Engine]
        JE --> MC[Metric Calculator]
        MC --> IG[Insight Generator]
    end
    subgraph Adapter Layer
        RC --> IF{Common Interface}
        IF --> AA[Adapter A]
        IF --> AB[Adapter B]
        IF --> AC[Adapter C]
        IF -.-> AD[Adapter D — Future Extension]
    end
    AA --> EPA[Engine A API]
    AB --> EPB[Engine B API]
    AC --> EPC[Engine C API]
    AD -.-> EPD[Engine D API]
```
Design Principle: Engine additions and removals should occur exclusively within the adapter layer, without affecting pipeline logic.
Response Normalization
Each engine returns raw responses in different structures. Normalization is required for consistent downstream processing.
| Normalization Target | Description | Cross-Engine Variance Example |
|---|---|---|
| Brand mention extraction | Detecting brand names in response text | Official name vs. abbreviation vs. mixed-language variants |
| Citation parsing | Extracting source URLs and domains | Inline markdown links vs. footnote-style numbering vs. API field |
| Mention position | Relative location of brand within response | First paragraph vs. mid-list vs. conclusion |
| Response length | Token count of raw response | Default response lengths vary by engine |
| Format | Markdown structure | Heading usage, list style, dividers |
The core principle of normalization is “preserve the raw response while extracting metadata into a unified schema.” Raw text is stored as-is, but parsed data — brand mentions, citations, positions — is stored in an engine-agnostic schema. This allows the downstream Judge engine and metric calculation logic to operate without per-engine branching.
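Brand mention extraction is a good concrete case: the same brand may appear as its official name, an abbreviation, or a language variant. A toy normalizer (the alias table and dataclass are illustrative) maps every variant to one canonical record:

```python
from dataclasses import dataclass

# Illustrative alias table: each surface form maps to a canonical brand name.
BRAND_ALIASES = {
    "brand x": "Brand X",
    "brandx": "Brand X",
}

@dataclass
class NormalizedMention:
    brand: str      # canonical brand name, engine-agnostic
    position: int   # character offset of the earliest mention in the raw text

def normalize_mentions(raw_text: str) -> list[NormalizedMention]:
    """Scan raw text, map every known alias to its canonical brand,
    and keep only the earliest position per brand."""
    lowered = raw_text.lower()
    found: dict[str, int] = {}
    for alias, canonical in BRAND_ALIASES.items():
        pos = lowered.find(alias)
        if pos != -1 and (canonical not in found or pos < found[canonical]):
            found[canonical] = pos
    return [NormalizedMention(b, p) for b, p in sorted(found.items(), key=lambda kv: kv[1])]

mentions = normalize_mentions("Many teams pick BrandX; reviewers also rate Brand X highly.")
```

Both surface forms collapse into a single `Brand X` mention, so downstream metrics never see engine- or spelling-specific variants.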
Response Replication and Reliability
Why Send the Same Query Multiple Times
AI model responses are stochastic. The same query sent to the same engine twice may yield different results. Especially with temperature settings above zero, different brands may be mentioned or their order may change.
Because of this stochastic nature, visibility measurement based on a single response is merely “a snapshot of that moment.” Brand A being recommended first in one response does not guarantee the same outcome in the next.
To address this, the same query is sent to each engine multiple times, and statistical results across multiple responses are used. A brand mentioned in 3 out of 3 runs has different visibility stability than one mentioned in 1 out of 3.
| Replication Count | Advantages | Disadvantages |
|---|---|---|
| 1 | Minimum cost, maximum speed | Vulnerable to stochastic variation, low confidence |
| 3 | Reasonable stability, acceptable cost | 3x request volume |
| 5+ | High statistical confidence | Cost and time increase sharply, diminishing returns |
WICHI chose 3 replications per query. While not statistically perfect, this represents a reasonable balance of cost, time, and stability. Increasing to 5 showed marginal improvement over 3, while cost and time scaled proportionally.
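The replication statistic itself is simple: the fraction of runs in which a brand appears at all (the data below is made up for illustration).

```python
def mention_rate(replications: list[list[str]], brand: str) -> float:
    """Fraction of replicated runs in which the brand was mentioned at all."""
    hits = sum(1 for brands in replications if brand in brands)
    return hits / len(replications)

# Three replications of the same query against one engine (illustrative data):
runs = [["Brand X", "Brand Z"], ["Brand Z"], ["Brand X", "Brand Z"]]
```

Here `Brand Z` scores 3/3 and `Brand X` scores 2/3, which is exactly the stability distinction the replication scheme is designed to capture.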
Total Request Volume
Queries × replications × engines = total API calls. For WICHI: approximately 40 queries × 3 replications × 3 engines ≈ 360 API calls per analysis run. Handling this volume efficiently requires parallel collection and concurrency control.
Operational Challenges
Multi-engine parallel collection has clear design intent, but ongoing operational difficulties persist.
1. Response Format Unification
Each engine returns responses in different structures. Parsing logic for detecting brand mentions and extracting context must be maintained separately for each engine.
Specific differences include:
- Citation handling: One engine inserts `[1]`, `[2]` style numbered citations inline and lists URLs at the bottom. Another uses inline markdown links. A third returns citation lists in a separate API response field.
- List structure: One engine presents recommendations as numbered lists, another uses heading-plus-paragraph format, and a third responds with comparison tables.
- Language handling: Engines differ in their use of Korean brand names, English brand names, or mixed representations.
When an engine changes its response format, the corresponding parser must be updated. Such changes often happen without prior notice, requiring continuous monitoring.
2. Rate Limit Management
Sending many requests in parallel increases the risk of hitting per-engine rate limits.
| Limit Type | Description | Mitigation |
|---|---|---|
| RPM (Requests Per Minute) | Per-minute request cap | Concurrency limits + inter-request delay |
| TPM (Tokens Per Minute) | Per-minute token cap | Response length monitoring |
| Daily limit | Daily total request/token cap | Usage tracking + queue when limits approach |
| Burst limit | Instantaneous spike blocking | Minimum inter-request delay |
Each engine has different rate policies, and some specifics are not publicly documented, requiring empirical discovery of safe thresholds. Since a rate limit on one engine should not halt the entire collection, independent per-engine limit management is essential.
Retry strategy for 429 (Too Many Requests) responses is also critical. Immediate retries fail against still-active limits, so exponential backoff is applied — short wait for the first retry, progressively longer waits, giving the engine time to release restrictions.
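A minimal backoff wrapper might look like this (the error type, retry counts, and delays are illustrative; jitter is added to keep retries from synchronizing):

```python
import asyncio
import random

class RateLimitError(Exception):
    """Stand-in for an HTTP 429 (Too Many Requests) from an engine API."""

async def with_backoff(call, max_retries: int = 3, base_delay: float = 0.01):
    """Retry an async callable with exponential backoff plus jitter."""
    for attempt in range(max_retries + 1):
        try:
            return await call()
        except RateLimitError:
            if attempt == max_retries:
                raise                              # bounded retries: give up, report failure
            delay = base_delay * (2 ** attempt)    # 1x, 2x, 4x, ... the base delay
            await asyncio.sleep(delay + random.uniform(0, base_delay))

# Demo: fail twice with a rate limit, succeed on the third attempt.
attempts = {"n": 0}

async def flaky_call():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RateLimitError
    return "ok"

result = asyncio.run(with_backoff(flaky_call))
```

The bounded retry count matters as much as the backoff itself: a persistently down engine should fall through to the partial-results path, not loop forever.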
3. Partial Failure Handling
When one of three engines returns a timeout or error, the question is what to do. Three options were considered during design.
Option A: Full retry. Re-collect from all three engines if any fails. Ensures data completeness but wastes cost and time on already-successful engines. Also risks infinite retries if the failing engine is persistently down.
Option B: Failed engine retry only. Retry only the failed engine, preserving successful results. Reasonable, but still needs retry limits and a final-failure strategy.
Option C: Accept partial results. Generate the report from successful engines, explicitly noting which engine’s data is missing.
WICHI combines B and C: retry the failed engine a limited number of times; if it still fails, generate the report from remaining engines with explicit missing-engine notation.
```mermaid
flowchart TD
    START[Start Parallel Collection] --> PA[Engine A Collection]
    START --> PB[Engine B Collection]
    START --> PC[Engine C Collection]
    PA --> |Success| SA[Store Result A]
    PB --> |Failure| RB{Retry limit exceeded?}
    PC --> |Success| SC[Store Result C]
    RB --> |No| RBR[Backoff then retry]
    RBR --> |Success| SB[Store Result B]
    RBR --> |Failure| RB
    RB --> |Yes| SKIP[Skip Engine B — Record as missing]
    SA --> MERGE[Aggregate Results]
    SB --> MERGE
    SC --> MERGE
    SKIP --> MERGE
    MERGE --> REPORT[Generate Report<br/>Note missing engines]
```
4. Latency Management
Even with parallel collection, total elapsed time is determined by the “slowest engine.” When response speed varies significantly across engines, fast engines sit idle while waiting for the slow one.
Strategies for managing this:
Timeout settings: Each engine gets a maximum wait time. Exceeding it treats that engine’s collection as failed and proceeds with partial results. Too-short timeouts miss legitimate but slow responses; too-long timeouts tie the entire pipeline to one slow engine.
Progress feedback: Rather than waiting for everything to finish, per-engine collection status is communicated to the user in real time. Updates like “Engine A complete, Engine B in progress, Engine C in progress” reduce perceived wait time.
5. Extensibility for Adding/Removing Engines
The AI search market is shifting rapidly. When new engines appear or existing engines’ market share changes, the collection targets need adjustment.
Each engine addition follows a cycle:
1. API integration: Understand the engine’s API spec, set up authentication, implement request/response formats
2. Adapter development: Write an adapter implementing the common interface
3. Parser development: Build a parser for extracting brand mentions, citations, etc. from that engine’s responses
4. Normalization verification: Confirm parsed output follows the same schema as existing engines
5. Rate limit exploration: Discover the engine’s rate limits and adjust concurrency settings
6. Integration testing: Verify the full pipeline works correctly when the new engine runs alongside existing ones
With the adapter pattern, steps 1-3 are contained within the adapter layer, and pipeline code remains untouched. Registering a new engine in the configuration automatically includes it in parallel collection.
Design Principle: The number of files that need modification when adding an engine should be minimized. Ideally: one adapter file + one config file.
Finding the Right Number of Engines
How Many Is Enough
Intuitively, “more is better” — but engine count has a diminishing returns threshold.
| Engine Count | Advantages | Disadvantages |
|---|---|---|
| 1 | Simple implementation, minimal cost | Biased results, incomplete visibility measurement |
| 2 | Minimal cross-check possible | When two engines disagree, no tiebreaker |
| 3 | Consensus/divergence judgment possible (2 vs. 1) | Moderate operational complexity |
| 4-5 | Finer pattern detection | Proportional cost and maintenance increase, diminishing new insights |
| 6+ | Statistical robustness | Sharply rising cost and complexity, declining ROI |
Three is the minimum unit for judging “consensus” and “divergence.” With only two engines, there is no way to determine which is more representative when results differ. With three, a “2 vs. 1” majority structure emerges. While majority rule is not always correct, it provides the minimum basis for identifying divergence patterns.
WICHI currently uses 3 engines. A fourth could be added as the AI search market evolves, but at this stage, deepening analysis across the existing three delivers more value than adding another engine.
Engine Selection Criteria
The criteria for choosing which engines to include:
| Criterion | Description |
|---|---|
| Market share | Engines with more actual users take priority |
| Source diversity | Engines referencing different source types than existing ones add more cross-check value |
| API stability | Engines with stable APIs and infrequent breaking changes |
| Response quality | Engines that include meaningful brand recommendations and citations |
| Cost efficiency | Engines whose API costs are reasonable relative to analytical value |
The ideal engine combination is one where each engine has strengths in different source types, maximizing the value of cross-checking. Two engines that reference similar sources provide less diagnostic value than two engines with complementary source-type strengths.
Generalizable Patterns
The multi-engine architecture yields patterns applicable to multi-LLM systems in general.
Pattern 1: Fan-Out / Fan-In
Send the same input to multiple LLMs simultaneously (Fan-Out) and integrate all responses after collection (Fan-In). Beyond GEO monitoring, this applies to:
- Quality verification: Send the same question to multiple models and check answer consistency
- Diversity: Collect responses from multiple models to the same prompt and select the best
- Hallucination detection: If multiple models agree, the answer is more likely factual; disagreement flags verification needs
The key is the Fan-In stage: “how to integrate.” Options include simple majority vote, weighted average, or a separate Judge model evaluating responses.
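The simplest of those Fan-In options, majority vote, fits in a few lines (the model names and answers below are made-up demo data):

```python
from collections import Counter

def majority_fan_in(answers: dict[str, str]) -> tuple[str, float]:
    """Integrate fan-out responses by simple majority vote.

    answers maps model name -> that model's answer. Returns the winning
    answer and the agreement ratio, which doubles as a crude confidence
    signal (low agreement flags the answer for verification).
    """
    counts = Counter(answers.values())
    winner, votes = counts.most_common(1)[0]
    return winner, votes / len(answers)

answer, agreement = majority_fan_in(
    {"model_a": "Paris", "model_b": "Paris", "model_c": "Lyon"}
)
```

A 2/3 agreement ratio like this one is exactly the kind of signal that, in the hallucination-detection use case, would trigger a verification pass rather than blind acceptance.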
Pattern 2: Graceful Degradation
A design where partial engine failures do not halt the entire system. Partial results are accepted, with missing portions transparently indicated.
Core principles:
- One engine’s failure must not affect other engines’ collection (isolation)
- Confidence of partial results must be stated (transparency)
- Retry counts must be bounded (prevent infinite loops)
- Total failure (all engines fail) requires separate handling
Pattern 3: Adapter-Based Extension
The adapter pattern hides each LLM behind a common interface. Because the LLM market changes rapidly, tight coupling to any specific model or provider means model replacement impacts the entire system.
With the adapter pattern:
- Model replacement requires only adapter changes
- New model addition requires no existing code changes
- A/B testing (running two models side by side) becomes natural
- Transitioning from single-model to multi-model can happen incrementally
Pattern 4: Async Pipeline
LLM API calls have long, unpredictable response times. Synchronous processing ties the entire pipeline to the slowest call. An asynchronous pipeline addresses this structurally.
Why async design is especially important in multi-LLM systems:
- Response time variance across LLMs is large (depends on model, server state, input length)
- Rate-limit-induced waits are frequent
- Retries must not block other requests
- Users need mid-process progress updates
Pattern 5: Response Normalization Layer
To process responses from different LLMs uniformly, a layer that separates raw text from metadata and normalizes metadata into a unified schema is essential. Without this layer, every piece of downstream logic must branch on “which engine produced this response,” and each new engine multiplies those branches, so maintenance complexity compounds with every addition.
| Layer | Input | Output | Role |
|---|---|---|---|
| Adapter | Engine-specific API response | Common response object | API spec abstraction |
| Normalization | Common response object | Normalized metadata | Metadata schema unification |
| Analysis | Normalized metadata | Metrics, insights | Engine-agnostic logic |
Current Limitations and Future Work
The current architecture has unsolved problems.
Temporal response drift: The same query to the same engine may produce different results at different times, due to model updates, training data changes, and index refreshes. Currently, WICHI provides single-point-in-time snapshots. Time-series tracking is a future priority.
Engine weighting: Currently, all three engines’ results are treated equally. In reality, engines have different market shares, and weighting by share would produce more realistic visibility measurements. However, AI search market share data itself remains uncertain, so weighting is on hold.
Regional variation: Even the same engine may produce different results for Korean-language vs. English-language queries, and results may vary based on user location settings. The current system is specialized for Korean queries; multilingual support requires separate design work.
Summary
Multi-engine architecture is not simply about “collecting more data.” It is a design that leverages cross-engine response variance as the core analytical signal. This required design choices including async parallel collection, adapter-based extension, response normalization, and partial failure tolerance — each carrying implementation complexity and operational cost.
The reason for maintaining this structure despite those costs is straightforward: accurately measuring brand visibility in AI search requires multi-engine analysis as a non-negotiable prerequisite. A single engine’s results cannot reveal the full picture, and without variance, diagnosis is impossible.