Pushdown $match on keyHash into $queryStats stage to avoid full store scan


    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Query Integration

      Overview

      The $queryStats aggregation stage currently performs a full linear scan of the in-memory query stats store on every invocation, regardless of any downstream $match predicates. Pipelines of the form

      [{$queryStats: {}}, {$match: {keyHash: {$in: [...]}}}]

      re-serialize the representative shape for every entry in the store and then discard most of them in the filter, so the wasted CPU is proportional to store size, not result size.
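
The difference between the two paths can be illustrated with a toy model. This is a minimal Python sketch, not server code: the class `QueryStatsStoreModel`, its method names, and the assumption that the store can be looked up directly by keyHash are all hypothetical and exist only to show where the work goes.

```python
from typing import Dict, Iterable, List


class QueryStatsStoreModel:
    """Toy stand-in for the in-memory query stats store (hypothetical)."""

    def __init__(self, entries: Dict[str, dict]) -> None:
        self._entries = entries      # assumed layout: keyHash -> raw entry
        self.serialize_calls = 0     # counts the expensive per-entry step

    def _serialize(self, key_hash: str) -> dict:
        # Stands in for re-serializing the representative shape of one entry.
        self.serialize_calls += 1
        return {"keyHash": key_hash, **self._entries[key_hash]}

    def scan_then_match(self, wanted: Iterable[str]) -> List[dict]:
        # Behavior described above: serialize all N entries, then filter.
        wanted_set = set(wanted)
        docs = [self._serialize(h) for h in self._entries]
        return [d for d in docs if d["keyHash"] in wanted_set]

    def lookup_by_key_hash(self, wanted: Iterable[str]) -> List[dict]:
        # Pushed-down behavior: touch only the K entries the predicate names.
        unique = dict.fromkeys(wanted)  # de-duplicate, keep caller order
        return [self._serialize(h) for h in unique if h in self._entries]


store = QueryStatsStoreModel({f"hash{i}": {"execCount": i} for i in range(1000)})
wanted = ["hash3", "hash500", "missing"]

slow = store.scan_then_match(wanted)
slow_calls = store.serialize_calls

store.serialize_calls = 0
fast = store.lookup_by_key_hash(wanted)
fast_calls = store.serialize_calls
```

In this model both paths return the same two documents, but the scan serializes every entry in the store while the lookup serializes only the entries that match.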

      Background

      Production clusters with many distinct query shapes have been observed running this pipeline at average execution times around 30 seconds when filtering to ~100 specific keyHash values. The cost scales with total store size rather than with the size of the filter, which is the opposite of what callers expect for what is effectively an indexed-style lookup.

      Acceptance Criteria

      • Pipelines of the form $queryStats -> $match: {keyHash: $eq | $in} return the same results as the unoptimized path.

      • Execution time scales with the number of target keyHash values (K), not the size of the store (N).
      • The optimization can be disabled at runtime via a server parameter, for diagnostics and to confirm the pre-optimization behavior.
      • A benchmark exists that covers both the optimized and unoptimized paths and is checked in.
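
One way to read the first and third criteria together is as an eligibility check that runs before execution. The sketch below is an assumption-laden Python illustration, not the proposed implementation: the function name `extract_key_hash_targets` and the `pushdown_enabled` flag are hypothetical (the ticket does not name the server parameter), and treating implicit equality such as {keyHash: "abc"} as eligible is also an assumption.

```python
from typing import Any, List, Optional


def extract_key_hash_targets(
    pipeline: List[dict], pushdown_enabled: bool = True
) -> Optional[List[Any]]:
    """Return the keyHash values to look up, or None to fall back to the full scan."""
    # The flag models the runtime switch; when off, always take the old path.
    if not pushdown_enabled or len(pipeline) < 2:
        return None
    first, second = pipeline[0], pipeline[1]
    if "$queryStats" not in first or set(second) != {"$match"}:
        return None
    predicate = second["$match"]
    # Only a $match on keyHash alone is handled in this sketch.
    if not isinstance(predicate, dict) or set(predicate) != {"keyHash"}:
        return None
    cond = predicate["keyHash"]
    if isinstance(cond, dict):
        if set(cond) == {"$eq"}:
            return [cond["$eq"]]
        if set(cond) == {"$in"} and isinstance(cond["$in"], list):
            return list(cond["$in"])
        return None  # any other operator is not eligible for pushdown
    return [cond]  # implicit equality (assumed eligible)


STAGE = {"$queryStats": {}}
in_targets = extract_key_hash_targets([STAGE, {"$match": {"keyHash": {"$in": ["a", "b"]}}}])
eq_targets = extract_key_hash_targets([STAGE, {"$match": {"keyHash": {"$eq": "a"}}}])
range_targets = extract_key_hash_targets([STAGE, {"$match": {"keyHash": {"$gt": "a"}}}])
disabled = extract_key_hash_targets(
    [STAGE, {"$match": {"keyHash": {"$eq": "a"}}}], pushdown_enabled=False
)
```

Returning None for anything the check does not recognize keeps the unoptimized scan as the default, which is what makes a same-results comparison between the two paths straightforward.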

            Assignee:
            Unassigned
            Reporter:
            Arun Banala
            Votes:
            0
            Watchers:
            1
