Query plan regression for $project before $count

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: 8.1.0, 8.2.0, 8.3.0
    • Component/s: None
    • Query Optimization
    • Minor Change

      Since the anti-materialization improvements in SERVER-83235, the SBE pipeline optimizer no longer eliminates a $project stage whose computed fields are unused by a subsequent $count stage. The regression was introduced between 8.0 and 8.3. On 8.0, the optimizer correctly identifies that $count consumes none of the projected values and removes the entire projection. On 8.3, the projection executes for every document. In the repro case, the generated type-checking bytecode adds ~87 ms of unnecessary work to a query over 100K documents.

      Repro

      Insert 100K documents with integer fields int0 through int7, then run:

      db.collection.aggregate([
        { $project: {
            k0: { $add: ["$int0", "$int1"] },
            k1: { $add: ["$int1", "$int2"] },
            k2: { $add: ["$int2", "$int3"] },
            k3: { $add: ["$int3", "$int4"] },
            k4: { $add: ["$int4", "$int5"] },
            k5: { $add: ["$int5", "$int6"] },
            k6: { $add: ["$int6", "$int7"] },
            k7: { $add: ["$int7", "$int0"] }
        }},
        { $count: "total" }
      ])
      

      Compare against bare $count:

      db.collection.aggregate([{ $count: "total" }])
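
The setup step and the repro pipeline can be sketched programmatically as follows. This is a minimal sketch, not the script used for the measurements: the helper names (makeDoc, buildProjectStage) are hypothetical, and the field values are an assumption, since the report only says the fields are integers.

```javascript
// Hypothetical helpers for reproducing the report; names are illustrative.
const FIELD_COUNT = 8;     // int0 .. int7, as in the report
const DOC_COUNT = 100000;  // 100K documents, as in the report

// Build one document with integer fields int0..int7.
// Assumption: the actual values are not given in the report; i + f is arbitrary.
function makeDoc(i) {
  const doc = {};
  for (let f = 0; f < FIELD_COUNT; f++) {
    doc[`int${f}`] = i + f;
  }
  return doc;
}

// Build the $project stage from the repro: k<j> = int<j> + int<(j + 1) mod 8>.
function buildProjectStage() {
  const spec = {};
  for (let j = 0; j < FIELD_COUNT; j++) {
    spec[`k${j}`] = { $add: [`$int${j}`, `$int${(j + 1) % FIELD_COUNT}`] };
  }
  return { $project: spec };
}

const pipeline = [buildProjectStage(), { $count: "total" }];

// In mongosh the helpers would be used roughly like this:
//   const docs = Array.from({ length: DOC_COUNT }, (_, i) => makeDoc(i));
//   db.collection.insertMany(docs);
//   db.collection.aggregate(pipeline);
```

Generating the stage in a loop makes it easy to vary the number of computed fields when checking how the overhead scales.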
      

      Expected vs Actual

      Version  Pipeline           Avg latency  Result
      8.0.9    $project + $count  19 ms        ✓ Same as $count-only; projection eliminated
      8.0.9    $count only        18 ms
      8.3.0    $project + $count  104 ms       ✗ 5.8× regression; projection executes needlessly
      8.3.0    $count only        17 ms
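
The per-version projection overhead can be derived from the latencies in the table. The sketch below is illustrative arithmetic only: the function names are hypothetical, and the slowdown ratio depends on which baseline row is chosen, which the report does not state.

```javascript
// Average latencies in milliseconds, copied from the table above.
const latencyMs = {
  "8.0.9": { projectCount: 19, countOnly: 18 },
  "8.3.0": { projectCount: 104, countOnly: 17 },
};

// Extra time attributable to the projection within a single version.
function projectionOverheadMs(version) {
  const v = latencyMs[version];
  return v.projectCount - v.countOnly;
}

// Ratio of a measured latency to a chosen baseline latency.
// Assumption: the caller picks the baseline; the report does not name one.
function slowdown(measuredMs, baselineMs) {
  return measuredMs / baselineMs;
}

console.log(projectionOverheadMs("8.0.9")); // 1
console.log(projectionOverheadMs("8.3.0")); // 87
console.log(slowdown(latencyMs["8.3.0"].projectCount, latencyMs["8.0.9"].projectCount));
```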

      Evidence from explain("executionStats")

      8.0.9 — projection optimized away:

      • SBE scan: scanFieldNames: [] (full document blob passed, no per-field extraction)
      • No intermediate project stage between scan and group
      • saveState: 1 (1 yield total)
      • Query hash: 7AE2E0A7

      8.3.0 — projection executes unnecessarily:

      • SBE scan: scanFieldNames: ["int0","int1","int2","int3","int4","int5","int6","int7"] (all fields extracted individually)
      • A project stage (planNodeId 2) between scan and group computes all 8 $add expressions
      • Each $add generates ~20 lines of SBE bytecode including typeMatch, isNumber, isDate, and fail() guards — none of whose outputs are consumed by the group/$count
      • saveState: 7 (7 yields vs 1)
      • Query hash changed: C3513B10
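
The two plan signals above (a project stage between scan and group, and a non-empty scanFieldNames list) can be checked mechanically. The following is a minimal sketch under stated assumptions: the key names for child stages (inputStage, innerStage, outerStage, inputStages) and the sample tree are hand-written stand-ins, not real explain output, and the location of the stage tree inside the explain document may differ by version.

```javascript
// Walk an explain stage tree and collect every stage node.
// Assumption: children hang off inputStage, innerStage, outerStage,
// or an inputStages array; adjust for the actual output shape.
function collectStages(node, out = []) {
  if (!node || typeof node !== "object") return out;
  out.push(node);
  for (const key of ["inputStage", "innerStage", "outerStage"]) {
    if (node[key]) collectStages(node[key], out);
  }
  for (const child of node.inputStages || []) collectStages(child, out);
  return out;
}

// Summarize the signals the report relies on.
function summarizePlan(root) {
  const stages = collectStages(root);
  const scan = stages.find((s) => s.stage === "scan");
  return {
    stageNames: stages.map((s) => s.stage),
    hasProjectStage: stages.some((s) => s.stage === "project"),
    scanFieldNames: scan ? scan.scanFieldNames || [] : [],
  };
}

// Hand-written stand-in for the 8.3.0 plan shape described above (not real output).
const sample830 = {
  stage: "group",
  inputStage: {
    stage: "project",
    planNodeId: 2,
    inputStage: { stage: "scan", scanFieldNames: ["int0", "int1"] },
  },
};

console.log(summarizePlan(sample830));
```

In mongosh, the root passed to summarizePlan would come from db.collection.explain("executionStats").aggregate(pipeline); which sub-document holds the stage tree is an assumption to verify against the actual output.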

      Impact

      This regression was identified as a contributing root cause of the many_long_queries_locust sys-perf failure (PERF-9001). Under the Atlas load test (16 open-loop LongQuery users at 1 rps over 100K docs), the intrinsic 5.8× CPU slowdown compounds into a ~264× latency degradation (38 ms → 10,165 ms): slower queries hold execution control tickets longer, so in-flight operations accumulate and starve the ticket pool.

      Versions

      • Regression not present: 8.0.9
      • Regression present: 8.3.0 (and master/9.0-alpha as of PERF-9001 waterfall run b8cffb9d)

            Assignee: Steve Tarzia
            Reporter: Steve Tarzia