Multi-Engine Architecture — Parallel Collection from 3 AI Search Engines


MJ · 12 min read

Analysis of multi-engine architecture design principles that leverage response variance as signals, featuring parallel collection structures and scalability via the adapter pattern.

Why Multiple Engines

When designing a GEO (Generative Engine Optimization) monitoring system, the first decision is which AI search engines to analyze.

A single-engine approach is tempting. You only need one parser, maintenance burden for response format changes stays small, and API costs remain low. WICHI’s initial prototype targeted just one engine.

But when you send the same query to different AI search engines, the results diverge significantly. This is not a matter of surface-level phrasing; the differences are structural.

Why Engine Responses Differ

Each AI search engine uses different training data, different search indexes, different ranking algorithms, and different response generation strategies. These differences directly affect which brands get mentioned, in what order they appear, and which sources are cited.

| Differentiating Factor | Impact on Results |
| --- | --- |
| Training data composition and cutoff | Whether a brand's latest information is reflected |
| Search index scope | Types and range of web sources referenced |
| Source preference | Weighting toward encyclopedic, community, or official sources |
| Response generation strategy | List-based, comparative, or narrative formats |
| Citation method | Inline citations, footnote lists, direct URL exposure |

For example, sending “best project management tools in 2026” to three AI search engines might yield: Engine A recommends Brand X first, citing the official site and review outlets. Engine B omits Brand X entirely and centers on Brand Y with community feedback citations. Engine C mentions both X and Y but in different order and context.

The Structural Limitation of Single-Engine Analysis

Reporting single-engine results as “brand visibility in AI search” is misleading — it represents visibility on that particular engine, not across the AI search ecosystem as a whole. Since you cannot control which engine end users choose, and market share in AI search is shifting rapidly, single-engine analysis is fundamentally incomplete.

graph TD
    Q[Same Search Query] --> E1[Engine A]
    Q --> E2[Engine B]
    Q --> E3[Engine C]

    E1 --> R1["Brand X: Ranked #1<br/>Brand Y: Not mentioned<br/>Brand Z: Ranked #3"]
    E2 --> R2["Brand X: Not mentioned<br/>Brand Y: Ranked #1<br/>Brand Z: Ranked #2"]
    E3 --> R3["Brand X: Ranked #2<br/>Brand Y: Ranked #3<br/>Brand Z: Ranked #1"]

    R1 --> AN[Cross-Engine Analysis]
    R2 --> AN
    R3 --> AN
    AN --> INS["Brand Z: Mentioned by all 3 → Strong signal<br/>Brand X: 2/3 engines → Moderate signal<br/>Brand Y: 2/3 engines, high variance → Possible source bias"]

This diagram illustrates a critical point: analyzing any single engine would have led to an entirely different conclusion. Looking at Engine A alone, you would conclude “Brand X has the highest visibility.” But aggregating all three engines reveals “Brand Z has the most stable visibility” — a fundamentally different takeaway.

Design Principle: AI search visibility should be measured not by ranking on any single engine, but by the consistency and quality of brand mentions across multiple engines.


Design Principles

Here are the core principles applied when designing the multi-engine collection system. These principles are not specific to WICHI — they apply broadly to any multi-LLM system.

Principle 1: Response Variance Is Signal, Not Noise

Initially, we viewed cross-engine variance as “a problem to be unified.” In practice, that variance turned out to be the most valuable insight.

When all engines consistently recommend a particular brand, it means that brand has strong online presence across diverse source types. Conversely, a brand recommended by only one engine likely has content concentrated in the source types that engine favors.

| Pattern | Meaning | Diagnosis |
| --- | --- | --- |
| Consistently high mentions across all engines | Strong presence across diverse sources | Brand visibility is healthy |
| Consistently low mentions across all engines | Overall lack of online presence | Full content strategy review needed |
| High mentions on specific engine(s) only | Concentrated in certain source types | Reinforce content on source types the underperforming engines prefer |
| Large rank variance across engines | Brand positioning perceived differently by source type | Develop separate content strategies per source type |

Rather than dismissing variance with "each engine is different," a GEO report should present engine-by-engine scores side by side; that comparison is the report's core structure.

Principle 2: Consensus and Divergence

The most useful framework for interpreting multi-engine data is “consensus vs. divergence.”

graph LR
    subgraph Consensus Pattern
        CA[Engine A: Brand X #1] --> CS[Strong Signal]
        CB[Engine B: Brand X #1] --> CS
        CC[Engine C: Brand X #1-2] --> CS
    end

    subgraph Divergence Pattern
        DA[Engine A: Brand Y #1] --> DS[Weak Signal + Needs Diagnosis]
        DB[Engine B: Brand Y Not Mentioned] --> DS
        DC[Engine C: Brand Y #4] --> DS
    end

Consensus: Multiple engines mention the same brand at similar positions. This is a strong signal that the brand exists consistently across diverse information sources. Higher consensus means higher confidence in the brand’s GEO score.

Divergence: Engines produce significantly different results. This is itself a diagnostic target. Divergence triggers the question “Why does this engine omit this brand?” — and the answer to that question points directly to content strategy gaps.

Design Principle: Consensus increases score confidence. Divergence reveals improvement opportunities. Both are meaningful data.

Principle 3: Partial Results Are Valid

If one of three engines fails, the results from the remaining two are still valid. Delivering partial results with explicit notation is better than withholding all data while waiting for perfection. The key requirement: always disclose which engine’s data is missing. This principle runs through the entire error-handling strategy.

Principle 4: Engines Are Plugins

The AI search market is evolving rapidly. New engines emerge, market share shifts, and API specifications change. Each engine should be designed as an independent module implementing a common interface, so that adding or removing engines does not affect the rest of the pipeline.


Parallel Collection Architecture

After committing to multi-engine collection, the next design choice was the collection method.

Sequential vs. Parallel Collection

| Factor | Sequential | Parallel |
| --- | --- | --- |
| Implementation complexity | Low | Higher (async control required) |
| Total elapsed time | Sum of Engines A + B + C | max(A, B, C) |
| Error handling | Straightforward (sequential try-catch) | Independent per-engine error handling needed |
| Debugging | Easy (trace in order) | Concurrent log separation required |
| Rate limit management | Natural spacing | Burst control needed |
| User-perceived speed | Slow (tens of seconds to minutes) | Fast (limited by slowest engine) |
| Scalability | Linear increase per engine added | Near-constant wait time regardless of engine count |

AI search engines respond slowly compared to typical REST APIs — model inference alone takes seconds to tens of seconds. Sequential processing of three engines makes a single query analysis prohibitively long; repeating this for dozens of queries pushes the total pipeline into minutes.

Parallel collection makes the slowest engine’s response time the total elapsed time. Whether there are three engines or five, latency converges to the single slowest response. Since WICHI is a SaaS where users press an analysis button and wait for results, perceived speed is a direct usability metric.

We chose parallel.
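The latency argument can be sketched with Python's `asyncio`. This is an illustrative model, not WICHI's actual code; the engine names and delays are placeholders and `asyncio.sleep` stands in for real API latency.

```python
import asyncio

# Stand-in for a per-engine collection call; the sleep models API latency.
async def collect(engine: str, latency: float) -> str:
    await asyncio.sleep(latency)
    return f"{engine} results"

async def collect_all() -> list:
    # All three coroutines start together, so total elapsed time is
    # max(latencies), not their sum. return_exceptions=True keeps one
    # engine's failure from cancelling the others.
    return await asyncio.gather(
        collect("engine_a", 0.03),
        collect("engine_b", 0.01),
        collect("engine_c", 0.02),
        return_exceptions=True,
    )

results = asyncio.run(collect_all())
```

With sequential awaits the run would take the sum of the three delays; with `gather`, it converges to the slowest one.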

Async Parallel Collection Flow

sequenceDiagram
    participant U as User
    participant API as API Server
    participant EA as Engine A Adapter
    participant EB as Engine B Adapter
    participant EC as Engine C Adapter
    participant DB as Database

    U->>API: Analysis request (query list)
    API->>API: Prepare query list

    par Parallel Collection
        API->>EA: Send all queries (async)
        API->>EB: Send all queries (async)
        API->>EC: Send all queries (async)
    end

    Note over EA,EC: Each engine also runs<br/>queries concurrently<br/>(with concurrency limits)

    EA-->>API: Engine A results (or partial failure)
    EB-->>API: Engine B results (or partial failure)
    EC-->>API: Engine C results (or partial failure)

    API->>API: Aggregate + normalize results
    API->>DB: Store normalized responses
    API->>U: Progress status update

The key design points are as follows.

Inter-engine parallelism: All three engines receive requests simultaneously. Each engine’s collection proceeds independently. If one engine runs slowly, it does not affect collection from the others.

Intra-engine concurrency control: Within each engine, multiple queries are processed concurrently. However, unlimited concurrent requests would trigger rate limits, so a per-engine semaphore caps concurrent request counts. This cap is configurable per engine.

Inter-request delay: Beyond concurrency limits, short delays between requests prevent burst patterns from hitting the engine’s instantaneous rate limits.
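The two intra-engine controls, a semaphore cap and a minimum inter-request delay, can be sketched as follows. The concurrency value and delay are illustrative, not WICHI's actual limits, and the sleep inside the semaphore stands in for the API call.

```python
import asyncio

MAX_CONCURRENCY = 3   # per-engine cap on in-flight requests (illustrative)
REQUEST_DELAY = 0.01  # minimum spacing between requests, in seconds
completed = []

async def run_engine(queries):
    # The semaphore must be created inside the running event loop.
    sem = asyncio.Semaphore(MAX_CONCURRENCY)

    async def send_query(query):
        # At most MAX_CONCURRENCY queries hold the semaphore at once;
        # the delay spreads request starts to avoid burst patterns.
        async with sem:
            await asyncio.sleep(REQUEST_DELAY)  # stands in for the API call
            completed.append(query)

    await asyncio.gather(*(send_query(q) for q in queries))

asyncio.run(run_engine([f"q{i}" for i in range(10)]))
```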

The Adapter Pattern

Each engine is isolated behind an adapter implementing a common interface. The common interface defines:

  • Input: Query text, system prompt
  • Output: Raw response, list of mentioned brands, list of citations
  • Errors: Standardized error types for timeout, rate limit, authentication failure, etc.

In practice, each adapter constructs requests matching that engine’s API specification and converts responses into the common format. Adding a new engine requires only implementing a new adapter — the rest of the pipeline (evaluation, metric calculation, insight generation) remains unchanged.
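A minimal sketch of that common interface in Python, using `typing.Protocol`. The class and field names here are hypothetical, and the stub adapter returns a canned response where a real one would call Engine A's API and parse its format.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Protocol

@dataclass
class EngineResponse:
    # Common response object every adapter must produce.
    raw_text: str
    brands: list = field(default_factory=list)      # mentioned brands
    citations: list = field(default_factory=list)   # cited source URLs

class EngineAdapter(Protocol):
    name: str
    async def query(self, text: str, system_prompt: str) -> EngineResponse: ...

class EngineAAdapter:
    name = "engine_a"
    async def query(self, text: str, system_prompt: str) -> EngineResponse:
        # A real adapter would build Engine A's API request here and
        # convert its response into the common format.
        return EngineResponse(raw_text=f"stub answer for: {text}")

resp = asyncio.run(EngineAAdapter().query("best project tools", "neutral assistant"))
```

Because the pipeline depends only on `EngineAdapter` and `EngineResponse`, swapping or adding engines never touches evaluation or metric code.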

graph TB
    subgraph Pipeline
        QE[Query Engine] --> RC[Response Collector]
        RC --> JE[Evaluation Engine]
        JE --> MC[Metric Calculator]
        MC --> IG[Insight Generator]
    end

    subgraph Adapter Layer
        RC --> IF{Common Interface}
        IF --> AA[Adapter A]
        IF --> AB[Adapter B]
        IF --> AC[Adapter C]
        IF -.-> AD[Adapter D — Future Extension]
    end

    AA --> EPA[Engine A API]
    AB --> EPB[Engine B API]
    AC --> EPC[Engine C API]
    AD -.-> EPD[Engine D API]

Design Principle: Engine additions and removals should occur exclusively within the adapter layer, without affecting pipeline logic.

Response Normalization

Each engine returns raw responses in different structures. Normalization is required for consistent downstream processing.

| Normalization Target | Description | Cross-Engine Variance Example |
| --- | --- | --- |
| Brand mention extraction | Detecting brand names in response text | Official name vs. abbreviation vs. mixed-language variants |
| Citation parsing | Extracting source URLs and domains | Inline markdown links vs. footnote-style numbering vs. API field |
| Mention position | Relative location of brand within response | First paragraph vs. mid-list vs. conclusion |
| Response length | Token count of raw response | Default response lengths vary by engine |
| Format | Markdown structure | Heading usage, list style, dividers |

The core principle of normalization is “preserve the raw response while extracting metadata into a unified schema.” Raw text is stored as-is, but parsed data — brand mentions, citations, positions — is stored in an engine-agnostic schema. This allows the downstream Judge engine and metric calculation logic to operate without per-engine branching.
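The "preserve raw, normalize metadata" split can be sketched like this. The schema fields and the naive substring-based extractor are illustrative assumptions, not WICHI's production parser, which would need per-engine citation parsing and more robust brand matching.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BrandMention:
    brand: str
    position: int   # order of appearance in the response text
    engine: str

@dataclass(frozen=True)
class NormalizedResponse:
    engine: str
    raw_text: str             # preserved verbatim for audit and re-parsing
    mentions: tuple           # engine-agnostic metadata
    citation_domains: tuple   # left empty here; a real parser fills this

def normalize(engine: str, raw_text: str, known_brands: list) -> NormalizedResponse:
    # Naive extraction for illustration: substring match on known brand
    # names, ordered by where each first appears in the text.
    lowered = raw_text.lower()
    found = sorted(
        (b for b in known_brands if b.lower() in lowered),
        key=lambda b: lowered.find(b.lower()),
    )
    mentions = tuple(
        BrandMention(brand=b, position=i, engine=engine)
        for i, b in enumerate(found)
    )
    return NormalizedResponse(engine, raw_text, mentions, ())

n = normalize("engine_a", "We recommend Brand X over Brand Y.",
              ["Brand Y", "Brand X", "Brand Z"])
```

Downstream logic then reads `mentions` without caring which engine produced the text.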


Response Replication and Reliability

Why Send the Same Query Multiple Times

AI model responses are stochastic. The same query sent to the same engine twice may yield different results. Especially with temperature settings above zero, different brands may be mentioned or their order may change.

Because of this stochastic nature, visibility measurement based on a single response is merely “a snapshot of that moment.” Brand A being recommended first in one response does not guarantee the same outcome in the next.

To address this, the same query is sent to each engine multiple times, and statistical results across multiple responses are used. A brand mentioned in 3 out of 3 runs has different visibility stability than one mentioned in 1 out of 3.
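A minimal sketch of that replication statistic, with made-up run data for illustration:

```python
from collections import Counter

def mention_rate(runs):
    """Fraction of replicated runs in which each brand was mentioned."""
    # set(run) counts a brand at most once per run.
    counts = Counter(brand for run in runs for brand in set(run))
    return {brand: n / len(runs) for brand, n in counts.items()}

# Three replications of the same query on one engine (illustrative data).
runs = [
    ["Brand Z", "Brand X"],
    ["Brand Z", "Brand Y"],
    ["Brand Z", "Brand X"],
]
rates = mention_rate(runs)
# Brand Z appears in 3/3 runs (stable), Brand X in 2/3, Brand Y in 1/3.
```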

| Replication Count | Advantages | Disadvantages |
| --- | --- | --- |
| 1 | Minimum cost, maximum speed | Vulnerable to stochastic variation, low confidence |
| 3 | Reasonable stability, acceptable cost | 3x request volume |
| 5+ | High statistical confidence | Cost and time increase sharply, diminishing returns |

WICHI chose 3 replications per query. While not statistically perfect, this represents a reasonable balance of cost, time, and stability. Increasing to 5 showed marginal improvement over 3, while cost and time scaled proportionally.

Total Request Volume

Queries × replications × engines = total API calls. For WICHI: approximately 40 queries × 3 replications × 3 engines ≈ 360 API calls per analysis run. Efficiently handling this volume requires parallel collection and concurrency control.


Operational Challenges

Multi-engine parallel collection has clear design intent, but ongoing operational difficulties persist.

1. Response Format Unification

Each engine returns responses in different structures. Parsing logic for detecting brand mentions and extracting context must be maintained separately for each engine.

Specific differences include:

  • Citation handling: One engine inserts [1], [2] style numbered citations inline and lists URLs at the bottom. Another uses inline markdown links. A third returns citation lists in a separate API response field.
  • List structure: One engine presents recommendations as numbered lists, another uses heading-plus-paragraph format, and a third responds with comparison tables.
  • Language handling: Engines differ in their use of Korean brand names, English brand names, or mixed representations.

When an engine changes its response format, the corresponding parser must be updated. Such changes often happen without prior notice, requiring continuous monitoring.

2. Rate Limit Management

Sending many requests in parallel increases the risk of hitting per-engine rate limits.

| Limit Type | Description | Mitigation |
| --- | --- | --- |
| RPM (Requests Per Minute) | Per-minute request cap | Concurrency limits + inter-request delay |
| TPM (Tokens Per Minute) | Per-minute token cap | Response length monitoring |
| Daily limit | Daily total request/token cap | Usage tracking + queue when limits approach |
| Burst limit | Instantaneous spike blocking | Minimum inter-request delay |

Each engine has different rate policies, and some specifics are not publicly documented, requiring empirical discovery of safe thresholds. Since a rate limit on one engine should not halt the entire collection, independent per-engine limit management is essential.

Retry strategy for 429 (Too Many Requests) responses is also critical. Immediate retries fail against still-active limits, so exponential backoff is applied — short wait for the first retry, progressively longer waits, giving the engine time to release restrictions.
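Exponential backoff can be sketched as a small wrapper. The error type, retry count, and base delay are illustrative; the jitter factor keeps concurrent workers from retrying in lockstep.

```python
import asyncio
import random

class RateLimitError(Exception):
    """Stand-in for an HTTP 429 Too Many Requests response."""

async def with_backoff(call, max_retries=4, base_delay=0.01):
    # Wait base, 2*base, 4*base, ... between retries, plus random jitter.
    for attempt in range(max_retries + 1):
        try:
            return await call()
        except RateLimitError:
            if attempt == max_retries:
                raise
            delay = base_delay * (2 ** attempt) * (1 + random.random())
            await asyncio.sleep(delay)

# Demo: a call that hits the rate limit twice, then succeeds.
attempts = 0
async def flaky_call():
    global attempts
    attempts += 1
    if attempts < 3:
        raise RateLimitError("429 Too Many Requests")
    return "ok"

result = asyncio.run(with_backoff(flaky_call))
```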

3. Partial Failure Handling

When one of three engines returns a timeout or error, the question is what to do. Three options were considered during design.

Option A: Full retry. Re-collect from all three engines if any fails. Ensures data completeness but wastes cost and time on already-successful engines. Also risks infinite retries if the failing engine is persistently down.

Option B: Failed engine retry only. Retry only the failed engine, preserving successful results. Reasonable, but still needs retry limits and a final-failure strategy.

Option C: Accept partial results. Generate the report from successful engines, explicitly noting which engine’s data is missing.

WICHI combines B and C: retry the failed engine a limited number of times; if it still fails, generate the report from remaining engines with explicit missing-engine notation.

flowchart TD
    START[Start Parallel Collection] --> PA[Engine A Collection]
    START --> PB[Engine B Collection]
    START --> PC[Engine C Collection]

    PA --> |Success| SA[Store Result A]
    PB --> |Failure| RB{Retry limit exceeded?}
    PC --> |Success| SC[Store Result C]

    RB --> |No| RBR[Backoff then retry]
    RBR --> |Success| SB[Store Result B]
    RBR --> |Failure| RB
    RB --> |Yes| SKIP[Skip Engine B — Record as missing]

    SA --> MERGE[Aggregate Results]
    SB --> MERGE
    SC --> MERGE
    SKIP --> MERGE

    MERGE --> REPORT[Generate Report<br/>Note missing engines]
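The flow above, as a compact sketch combining Options B and C. The engine functions, retry counts, and backoff delays are hypothetical; the point is bounded retries per engine followed by explicit missing-engine notation.

```python
import asyncio

async def collect_with_fallback(engines, max_retries=2):
    """Run all engines in parallel; retry each failure a bounded number
    of times, then record persistently failing engines as missing."""
    async def run_one(name, fetch):
        for attempt in range(max_retries + 1):
            try:
                return name, await fetch(), None
            except Exception as exc:
                if attempt == max_retries:
                    return name, None, exc       # give up: record as missing
                await asyncio.sleep(0.01 * (2 ** attempt))  # backoff
    outcomes = await asyncio.gather(*(run_one(n, f) for n, f in engines.items()))
    return {
        "results": {n: r for n, r, e in outcomes if e is None},
        "missing": [n for n, r, e in outcomes if e is not None],
    }

async def ok():
    return "data"

async def broken():
    raise TimeoutError("engine down")

report = asyncio.run(collect_with_fallback(
    {"engine_a": ok, "engine_b": broken, "engine_c": ok}))
```

Engine B's persistent failure ends up in `missing`, while the report still carries A's and C's results.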

4. Latency Management

Even with parallel collection, total elapsed time is determined by the “slowest engine.” When response speed varies significantly across engines, fast engines sit idle while waiting for the slow one.

Strategies for managing this:

Timeout settings: Each engine gets a maximum wait time. Exceeding it treats that engine’s collection as failed and proceeds with partial results. Too-short timeouts miss legitimate but slow responses; too-long timeouts tie the entire pipeline to one slow engine.

Progress feedback: Rather than waiting for everything to finish, per-engine collection status is communicated to the user in real time. Updates like “Engine A complete, Engine B in progress, Engine C in progress” reduce perceived wait time.

5. Extensibility for Adding/Removing Engines

The AI search market is shifting rapidly. When new engines appear or existing engines’ market share changes, the collection targets need adjustment.

Each engine addition follows a cycle:

  1. API integration: Understand the engine’s API spec, set up authentication, implement request/response formats
  2. Adapter development: Write an adapter implementing the common interface
  3. Parser development: Build a parser for extracting brand mentions, citations, etc. from that engine’s responses
  4. Normalization verification: Confirm parsed output follows the same schema as existing engines
  5. Rate limit exploration: Discover the engine’s rate limits and adjust concurrency settings
  6. Integration testing: Verify the full pipeline works correctly when the new engine runs alongside existing ones

With the adapter pattern, steps 1-3 are contained within the adapter layer, and pipeline code remains untouched. Registering a new engine in the configuration automatically includes it in parallel collection.

Design Principle: The number of files that need modification when adding an engine should be minimized. Ideally: one adapter file + one config file.
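One common way to realize this, sketched here with a hypothetical decorator-based registry (not necessarily how WICHI implements it): the pipeline only ever iterates the registry, so adding Engine D means one adapter class plus one registration line.

```python
REGISTRY = {}

def register(name):
    """Class decorator: adds an adapter to the engine registry."""
    def decorator(cls):
        REGISTRY[name] = cls
        return cls
    return decorator

@register("engine_a")
class EngineAAdapter:
    pass

@register("engine_b")
class EngineBAdapter:
    pass

# The parallel collector includes exactly the registered engines.
active = sorted(REGISTRY)
```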


Finding the Right Number of Engines

How Many Is Enough

Intuitively, “more is better” — but engine count has a diminishing returns threshold.

| Engine Count | Advantages | Disadvantages |
| --- | --- | --- |
| 1 | Simple implementation, minimal cost | Biased results, incomplete visibility measurement |
| 2 | Minimal cross-check possible | When two engines disagree, no tiebreaker |
| 3 | Consensus/divergence judgment possible (2 vs. 1) | Moderate operational complexity |
| 4-5 | Finer pattern detection | Proportional cost and maintenance increase, diminishing new insights |
| 6+ | Statistical robustness | Sharply rising cost and complexity, declining ROI |

Three is the minimum unit for judging “consensus” and “divergence.” With only two engines, there is no way to determine which is more representative when results differ. With three, a “2 vs. 1” majority structure emerges. While majority rule is not always correct, it provides the minimum basis for identifying divergence patterns.

WICHI currently uses 3 engines. A fourth could be added as the AI search market evolves, but at this stage, deepening analysis across the existing three delivers more value than adding another engine.

Engine Selection Criteria

The criteria for choosing which engines to include:

| Criterion | Description |
| --- | --- |
| Market share | Engines with more actual users take priority |
| Source diversity | Engines referencing different source types than existing ones add more cross-check value |
| API stability | Engines with stable APIs and infrequent breaking changes |
| Response quality | Engines that include meaningful brand recommendations and citations |
| Cost efficiency | Engines whose API costs are reasonable relative to analytical value |

The ideal engine combination is one where each engine has strengths in different source types, maximizing the value of cross-checking. Two engines that reference similar sources provide less diagnostic value than two engines with complementary source-type strengths.


Generalizable Patterns

The multi-engine architecture yields patterns applicable to multi-LLM systems in general.

Pattern 1: Fan-Out / Fan-In

Send the same input to multiple LLMs simultaneously (Fan-Out) and integrate all responses after collection (Fan-In). Beyond GEO monitoring, this applies to:

  • Quality verification: Send the same question to multiple models and check answer consistency
  • Diversity: Collect responses from multiple models to the same prompt and select the best
  • Hallucination detection: If multiple models agree, the answer is more likely factual; disagreement flags verification needs

The key is the Fan-In stage: “how to integrate.” Options include simple majority vote, weighted average, or a separate Judge model evaluating responses.
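The simplest of those Fan-In options, majority vote, can be sketched in a few lines (illustrative data, not a production aggregator):

```python
from collections import Counter

def majority_vote(answers):
    """Fan-in by simple majority: return the most common answer and its
    agreement ratio, usable as a rough confidence score."""
    counts = Counter(answers)
    answer, n = counts.most_common(1)[0]
    return answer, n / len(answers)

# Three engines answering the same question (made-up responses).
answer, agreement = majority_vote(["Brand Z", "Brand Z", "Brand X"])
# Two of three engines agree, so agreement is 2/3.
```

Weighted averaging or a Judge model replaces `majority_vote` when answers are not directly comparable strings.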

Pattern 2: Graceful Degradation

A design where partial engine failures do not halt the entire system. Partial results are accepted, with missing portions transparently indicated.

Core principles:

  • One engine’s failure must not affect other engines’ collection (isolation)
  • Confidence of partial results must be stated (transparency)
  • Retry counts must be bounded (prevent infinite loops)
  • Total failure (all engines fail) requires separate handling

Pattern 3: Adapter-Based Extension

The adapter pattern hides each LLM behind a common interface. Because the LLM market changes rapidly, tight coupling to any specific model or provider means model replacement impacts the entire system.

With the adapter pattern:

  • Model replacement requires only adapter changes
  • New model addition requires no existing code changes
  • A/B testing (running two models side by side) becomes natural
  • Transitioning from single-model to multi-model can happen incrementally

Pattern 4: Async Pipeline

LLM API calls have long, unpredictable response times. Synchronous processing ties the entire pipeline to the slowest call. An asynchronous pipeline addresses this structurally.

Why async design is especially important in multi-LLM systems:

  • Response time variance across LLMs is large (depends on model, server state, input length)
  • Rate-limit-induced waits are frequent
  • Retries must not block other requests
  • Users need mid-process progress updates

Pattern 5: Response Normalization Layer

To process responses from different LLMs uniformly, a layer that separates raw text from metadata and normalizes metadata into a unified schema is essential. Without this layer, every piece of downstream logic must branch on which engine produced the response, and each new engine multiplies those branches across the codebase.

| Layer | Input | Output | Role |
| --- | --- | --- | --- |
| Adapter | Engine-specific API response | Common response object | API spec abstraction |
| Normalization | Common response object | Normalized metadata | Metadata schema unification |
| Analysis | Normalized metadata | Metrics, insights | Engine-agnostic logic |

Current Limitations and Future Work

The current architecture has unsolved problems.

Temporal response drift: The same query to the same engine may produce different results at different times, due to model updates, training data changes, and index refreshes. Currently, WICHI provides single-point-in-time snapshots. Time-series tracking is a future priority.

Engine weighting: Currently, all three engines’ results are treated equally. In reality, engines have different market shares, and weighting by share would produce more realistic visibility measurements. However, AI search market share data itself remains uncertain, so weighting is on hold.

Regional variation: Even the same engine may produce different results for Korean-language vs. English-language queries, and results may vary based on user location settings. The current system is specialized for Korean queries; multilingual support requires separate design work.


Summary

Multi-engine architecture is not simply about “collecting more data.” It is a design that leverages cross-engine response variance as the core analytical signal. This required design choices including async parallel collection, adapter-based extension, response normalization, and partial failure tolerance — each carrying implementation complexity and operational cost.

The reason for maintaining this structure despite those costs is straightforward: accurately measuring brand visibility in AI search requires multi-engine analysis as a non-negotiable prerequisite. A single engine’s results cannot reveal the full picture, and without variance, diagnosis is impossible.
