Dependency graph canPathBeArray() returns false for N-style accumulator outputs ($minN, $maxN, $firstN, $lastN, $topN, $bottomN)


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Query Optimization
    • ALL

      The dependency graph's canPathBeArray() analysis incorrectly reports that fields produced by N-style accumulators ($minN, $maxN, $firstN, $lastN, $topN, $bottomN) cannot contain arrays. These accumulators always emit an array (of at most n elements), so any field they assign is always an array.
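To illustrate why the output shape never depends on the data, here is a rough plain-JavaScript model of $minN over a single group. The helper minNModel is hypothetical and is not server code; it assumes numeric inputs only (no BSON ordering) and skips null and missing values, as $minN is documented to do.

```javascript
// Hypothetical model of $minN over one group; NOT the server's implementation.
// Assumes numeric inputs only; null/undefined values are filtered out.
function minNModel(values, n) {
    if (!Number.isInteger(n) || n < 1) {
        throw new Error("n must be a positive integer");
    }
    return values
        .filter((v) => v !== null && v !== undefined)
        .sort((x, y) => x - y)  // ascending numeric order
        .slice(0, n);           // at most n elements, possibly fewer
}

console.log(Array.isArray(minNModel([1], 1)));        // true
console.log(Array.isArray(minNModel([], 3)));         // true (empty array)
console.log(Array.isArray(minNModel([null, 5], 2)));  // true
```

Whatever the group contains, the result is an array, which is why the output path should never be reported as non-array.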

      This was detected by the new aggregation_dependency_graph_validation_passthrough suite (SERVER-125083) running group_pbt.js. The suite inserts $_internalAssertDataAssumptions stages between pipeline stages to validate the analysis at runtime.

      Steps to Reproduce

      Run with internalEnableDependencyGraphValidation: true, featureFlagEnableTestingAggregateRewriteRules: true, featureFlagPathArrayness: true:

      db.coll.drop();
      db.coll.insertOne({_id: 0, a: 1});
      db.coll.aggregate([
          // $minN makes m an array.
          {$group: {_id: null, m: {$minN: {input: "$a", n: 1}}}},
          {$project: {"m.m2": "$a"}},
          {$unwind: {path: "$m"}},
          {$project: {_id: 0, a: 1}}
      ]);
      

      Expected Behavior

      The aggregation runs successfully. The dependency graph reports that m can be an array after $group.

      Actual Behavior
      Location12508302: Dependency graph arrayness validation failed: field 'm'
      contains an array but canPathBeArray() returned false.
      Document: {_id: null, m: [{}]}.
      This indicates a bug in the dependency graph analysis.

      The intermediate document {_id: null, m: [{}]} produced by $group has m as an array ($minN always returns an array, of up to n elements), but the dependency graph's arrayness tracker reports canPathBeArray("m") == false, causing the runtime validator to fire.
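The mismatch can be sketched in plain JavaScript as follows. This is only an illustration of the kind of check the validator performs, not the implementation of $_internalAssertDataAssumptions; assertArraynessAssumption and both predicates are hypothetical names, and only top-level fields are handled.

```javascript
// Hypothetical sketch of a runtime arrayness check; names are illustrative only.
// Handles top-level fields only, for brevity.
function assertArraynessAssumption(doc, path, canPathBeArray) {
    const value = doc[path];
    if (Array.isArray(value) && !canPathBeArray(path)) {
        throw new Error(
            `field '${path}' contains an array but canPathBeArray() returned false`);
    }
}

const intermediateDoc = {_id: null, m: [{}]};

// Analysis as described in this ticket: claims m can never be an array.
const buggyAnalysis = (path) => false;
// Analysis after the intended fix: m may be an array.
const fixedAnalysis = (path) => path === "m";

try {
    assertArraynessAssumption(intermediateDoc, "m", buggyAnalysis);
} catch (e) {
    console.log("validator fired:", e.message);
}

// Does not throw once the analysis admits that m can be an array.
assertArraynessAssumption(intermediateDoc, "m", fixedAnalysis);
```

The check is one-sided: it only fires when the data contains an array that the analysis ruled out, so an overly conservative answer of true never trips it.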

      Root cause. The arrayness analysis for accumulator output paths in $group does not account for accumulators whose output is unconditionally an array. Affected accumulators include:

      $minN, $maxN
      $firstN, $lastN
      $topN, $bottomN
      $push, $addToSet (also produce arrays — these may already be handled, but should be verified)

      The analysis should mark the output field of any such accumulator as potentially-array.

      Proposed fix. In the dependency graph's accumulator-output arrayness logic, set the produced path to canBeArray=true for the accumulators listed above. Add coverage in path_arrayness_test.cpp for each accumulator.
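As a minimal sketch of the proposed rule, written in JavaScript for consistency with the repro rather than in the server's C++, the following walks a $group specification and collects the output fields whose accumulator unconditionally produces an array. The names ARRAY_PRODUCING_ACCUMULATORS and fieldsAlwaysArray are hypothetical. The sketch says nothing about other accumulators (for example $first or $max), whose arrayness depends on their input.

```javascript
// Hypothetical sketch of the proposed rule; not the server's implementation.
const ARRAY_PRODUCING_ACCUMULATORS = new Set([
    "$minN", "$maxN",
    "$firstN", "$lastN",
    "$topN", "$bottomN",
    "$push", "$addToSet",
]);

// Returns the set of $group output fields that are always arrays.
function fieldsAlwaysArray(groupSpec) {
    const result = new Set();
    for (const [field, accumulatorSpec] of Object.entries(groupSpec)) {
        if (field === "_id") {
            continue;  // _id is the group key, not an accumulator output
        }
        // An accumulator spec is an object with a single operator key.
        const [accumulatorName] = Object.keys(accumulatorSpec);
        if (ARRAY_PRODUCING_ACCUMULATORS.has(accumulatorName)) {
            result.add(field);
        }
    }
    return result;
}

const groupSpec = {
    _id: null,
    m: {$minN: {input: "$a", n: 1}},
    total: {$sum: "$a"},
};
console.log(fieldsAlwaysArray(groupSpec).has("m"));      // true
console.log(fieldsAlwaysArray(groupSpec).has("total"));  // false
```

A table-driven test iterating over the same accumulator names would be one way to structure the per-accumulator coverage.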

            Assignee:
            Matt Olma
            Reporter:
            Matt Olma