Preserve prefix-selectivity ordering in CardinalityEstimator::clampZeroEstimates

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Query Optimization
    • None
    • None
    • None
    • None
    • None
    • None
    • None

      Overview

      CardinalityEstimator::clampZeroEstimates replaces every approximate-source zero CE with the same kMinCE = 1, regardless of how many predicates were absorbed into the index bounds. This violates the prefix-selectivity heuristic that a longer index prefix is at least as selective as a shorter one.

      Background

      SERVER-123070 introduced the clamp on approximate-source zero cardinality estimates so that structurally distinct plans no longer tie at the per-stage minCost. It works for the structurally distinct cases the ticket was scoped to fix (covered IXSCAN over compound index vs FETCH+IXSCAN with residual filter on disjoint fields), where the cost differential survives the uniform clamp.

      It does not preserve the prefix-selectivity heuristic. Consider the predicate {{ {a: 1, b: 1, c: 1} }} with indexes {{ {a: 1} }} and {{ {a: 1, b: 1} }} and no matching documents in the sample:

      • IXSCAN(a_1, a:[1,1]) — pushes only a==1 into the bounds, leaves b==1 AND c==1 as a residual filter. Approximate-source CE = 0, clamped to 1.
      • IXSCAN(a_1_b_1, a:[1,1], b:[1,1]) — pushes a==1 AND b==1 into the bounds, leaves c==1 as a residual filter. Approximate-source CE = 0, clamped to 1.

      After clamping both IXSCANs to CE = 1, the cost ordering is decided by small calibrated terms (the per-field IXSCAN cost differential) and CBR can pick the structurally weaker plan.

      Scope of Work

      The clamp must produce CEs that respect the invariant: an IXSCAN whose bounds absorb a superset of another IXSCAN's bound predicates on the same fields must have a clamped CE no greater than the other's.

      Possible approaches:

      • Scale the floor by a function of the number of equality bounds pushed into the index — for example kMinCE / 2^(numEqualityBounds). Preserves strict monotonicity for nested prefixes.
      • Detect the prefix-subset relationship across enumerated plans during clampZeroEstimates and assign a strictly ordered floor.

      Either way, the post-clamp invariant must be cardinalityEstimate(longer-prefix) <= cardinalityEstimate(shorter-prefix) for compatible candidate plans.

      Acceptance Criteria

      • For predicate {{ {a: 1, b: 1, c: 1} }} against indexes {{ {a: 1} }} and {{ {a: 1, b: 1} }} with no matching documents and the planner clamping all approximate zeros, CBR picks the compound index regardless of index creation order.
      • The commented-out assertion in jstests/noPassthroughWithMongod/query/cbr/cbr_sampling.js (inside runZeroCEScenarios, marked with a TODO referencing this ticket) is enabled and passes.
      • No regressions in the existing zero-CE assertions.

      Technical Notes

      • The clamp lives in CardinalityEstimator::clampZeroEstimates in src/mongo/db/query/compiler/optimizer/cost_based_ranker/cardinality_estimator.cpp.
      • The motivating scenario surfaced during review of SERVER-123070; see PR 52398 review thread on cbr_sampling.js:212.

            Assignee:
            Unassigned
            Reporter:
            Timour Katchaounov
            Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

              Created:
              Updated: