PathArrayness incorrectly marks parent field as non-array when index uses positional path notation

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • Query Optimization
    • ALL

      PathArrayness incorrectly reports that a field cannot be an array when the only index information about that field comes from a positional path (e.g., e.0). This causes the dependency graph's canPathBeArray() analysis to return false for paths that actually can (and do) contain arrays, leading to failures in the aggregation_dependency_graph_validation_passthrough suite.

      Root Cause

      When a BTree index has a key path like "e.0" and the document has e: ["elem1"], the BtreeKeyGenerator uses positional array access to extract e[0] without generating multiple keys. This means the index is not marked as multikey at the e level (multikeyPaths for that key path is empty).

      PathArrayness::TrieNode::insertPath() then creates a trie node for "e" with canBeArray = false because multikeyPaths.count(0) == 0.

      When the dependency graph later calls canPathBeArray("e"), the trie lookup finds the "e" node and returns false. However, the field e is an array: the non-multikey status only means that positional access keeps the index from expanding the array into multiple keys, not that the field itself is not an array.
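
      The sequence described above can be illustrated with a small toy model. This is only a sketch in Python, not the server's C++ code: the names TrieNode, insert_path, and can_path_be_array are stand-ins chosen for illustration, the lookup semantics (any component on the path may be an array) are an assumption, and merging of information from several indexes is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    """One path component; a toy stand-in for PathArrayness::TrieNode."""
    can_be_array: bool
    children: dict = field(default_factory=dict)


def insert_path(root, path, multikey_components):
    """Record an index key path.

    multikey_components holds the 0-based depths at which the index is
    multikey for this key path (empty for the "e.0" case in this ticket).
    """
    node = root
    for depth, component in enumerate(path.split(".")):
        if component not in node.children:
            # Mirrors the reported behavior: not multikey => "cannot be array".
            node.children[component] = TrieNode(depth in multikey_components)
        # Merging flags from several indexes is deliberately not modeled here.
        node = node.children[component]


def can_path_be_array(root, path):
    """Assumed lookup: True if any component on the path may be an array."""
    node = root
    for component in path.split("."):
        node = node.children.get(component)
        if node is None:
            return True  # no index information: stay conservative
        if node.can_be_array:
            return True
    return False


root = TrieNode(can_be_array=False)
insert_path(root, "e.0", multikey_components=set())
print(can_path_be_array(root, "e"))  # False, although e is ["elem1"]
print(can_path_be_array(root, "z"))  # True, no index mentions z
```

      In this model the wrong answer for "e" comes entirely from treating "not multikey at depth 0" as proof that e holds no array.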

      Impact

      The $_internalValidateArrayness stage detects this mismatch at runtime and throws error 12508302:

      Dependency graph arrayness validation failed: field 'e' contains an array but canPathBeArray() returned false.
      

      This causes failures in:

      • jstests/aggregation/sources/project/remove_redundant_projects.js — the $project with $filter reads field e (which is ["elem1"]), with index {a: 1, "c.d": 1, "e.0": 1}
      • jstests/aggregation/sources/lookup/lookup_equijoin_semantics_inlj.js — similar interaction with positional indexed paths
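
      The runtime check behind error 12508302 can be pictured with the following sketch. It is a hypothetical Python illustration, not the implementation of $_internalValidateArrayness: the class and function names are invented, and only top-level fields are handled.

```python
class ArraynessValidationError(Exception):
    """Hypothetical stand-in for server error 12508302."""
    code = 12508302


def validate_arrayness(doc, field_name, predicted_can_be_array):
    """Raise if a top-level field holds an array that the analysis ruled out."""
    if isinstance(doc.get(field_name), list) and not predicted_can_be_array:
        raise ArraynessValidationError(
            "Dependency graph arrayness validation failed: "
            f"field '{field_name}' contains an array "
            "but canPathBeArray() returned false."
        )


doc = {"a": 1, "e": ["elem1"]}
validate_arrayness(doc, "a", predicted_can_be_array=False)  # passes: a is a scalar
try:
    validate_arrayness(doc, "e", predicted_can_be_array=False)
except ArraynessValidationError as err:
    print(err.code, err)
```

      A prediction of true never fails this check, which is why a conservative answer for e would be safe.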

      Reproduction

      buildscripts/resmoke.py run --force-excluded-tests \
        --suites=aggregation_dependency_graph_validation_passthrough \
        jstests/aggregation/sources/project/remove_redundant_projects.js
      

      Unit tests demonstrating the bug:

      bazel run +path_arrayness_test -- --gtest_filter="*Positional*"
      

      Fix

      PathArrayness should not conclude that a path prefix cannot be an array solely because an index with a positional (numeric) path component is not multikey at that depth. When a path like "e.0" has no multikey component at depth 0, the trie should either:

      • Not insert the parent node "e" at all (leaving it unknown, so lookups conservatively return true), or
      • Detect that the next component is numeric (a positional accessor) and conservatively mark the parent as possibly an array.
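
      The second option could look roughly like the sketch below. It is a toy Python model under stated assumptions, not a patch for the C++ code: TrieNode, insert_path_fixed, and can_path_be_array are hypothetical names, a component counts as positional when it consists only of ASCII digits, and the lookup treats a path as possibly-array if any component on it may be an array.

```python
from dataclasses import dataclass, field


@dataclass
class TrieNode:
    """One path component; a toy stand-in for PathArrayness::TrieNode."""
    can_be_array: bool
    children: dict = field(default_factory=dict)


def is_positional(component):
    """Assumed rule: a component made only of ASCII digits is positional."""
    return component.isascii() and component.isdigit()


def insert_path_fixed(root, path, multikey_components):
    """Insert a key path, marking the parent of a numeric component
    as possibly-array even when the index is not multikey there."""
    components = path.split(".")
    node = root
    for depth, component in enumerate(components):
        next_is_positional = (
            depth + 1 < len(components) and is_positional(components[depth + 1])
        )
        can_be_array = depth in multikey_components or next_is_positional
        if component not in node.children:
            node.children[component] = TrieNode(can_be_array)
        node = node.children[component]


def can_path_be_array(root, path):
    """Assumed lookup: True if any component on the path may be an array."""
    node = root
    for component in path.split("."):
        node = node.children.get(component)
        if node is None:
            return True  # no index information: stay conservative
        if node.can_be_array:
            return True
    return False


# Key paths of the index {a: 1, "c.d": 1, "e.0": 1}, none of them multikey.
root = TrieNode(can_be_array=False)
for key_path in ("a", "c.d", "e.0"):
    insert_path_fixed(root, key_path, multikey_components=set())

print(can_path_be_array(root, "e"))    # True: parent of a positional component
print(can_path_be_array(root, "a"))    # False: non-multikey, no positional child
print(can_path_be_array(root, "c.d"))  # False
```

      The result for "e" is deliberately conservative: a key path such as "e.0" can also address a field named "0" inside a subdocument, so true here means "possibly an array", not "definitely an array".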

      Unit Tests

      Added in src/mongo/db/query/compiler/metadata/path_arrayness_test.cpp:

      • PathArraynessTest.PositionalIndexPathShouldNotMarkParentAsNonArray
      • PathArraynessTest.CompoundIndexWithPositionalPathDoesNotAffectParentArrayness

            Assignee: Unassigned
            Reporter: Matt Olma
            Votes: 0
            Watchers: 3
