  1. Core Server
  2. SERVER-79580

Remove HostTypeRequirement::kPrimaryShard from $lookup

    • Query Execution
    • Fully Compatible
    • QE 2023-10-16, QE 2023-10-30, QE 2023-11-13, QE 2023-11-27, QE 2023-12-11, QE 2023-12-25
    • 157

      • Just remove the kPrimaryShard requirement and use kNone, since $lookup knows how to find the data wherever it may be placed. However, when all the data is on a single shard, we should move the $lookup to the merging part of the pipeline and have a way to target this specific shard.
      • Extend distributedPlanLogic() so that it can name the id of the shard that should be targeted for the merging pipeline. Alternatively we could introduce a host type requirement like kOwningShardPreferred as suggested by Bernard and require the caller to figure out which shard this corresponds to. My vote is for the former alternative.
      • Plumb the ShardId up from the pipeline splitting code so that it can make its way to dispatchMergingPipeline()
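
The three steps above can be sketched roughly as follows. This is a toy model, not the server's code: `DistributedPlanLogicSketch`, `planLookup()`, and `chooseMergingShard()` are hypothetical names, and the `defaultShard` fallback is an assumption standing in for whatever shard the dispatch code would otherwise choose.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>

// Hypothetical stand-in for the server's shard identifier type.
using ShardId = std::string;

// Hypothetical, simplified view of what the planning step could report.
struct DistributedPlanLogicSketch {
    // True when the stage should move to the merging part of the pipeline.
    bool runOnMergingHalf = false;
    // Set only when a single shard owns all of the foreign collection's data.
    std::optional<ShardId> mergeShardId;
};

// Decide where a $lookup should run, given the set of shards that own the
// foreign collection's data (assumed to be known to the caller).
DistributedPlanLogicSketch planLookup(const std::set<ShardId>& owningShards) {
    assert(!owningShards.empty());
    DistributedPlanLogicSketch logic;
    if (owningShards.size() == 1) {
        // All data lives on one shard: run in the merging part and name it.
        logic.runOnMergingHalf = true;
        logic.mergeShardId = *owningShards.begin();
    }
    // Otherwise no host requirement is expressed (the kNone case).
    return logic;
}

// Stand-in for the dispatch step: prefer the named shard if there is one,
// else fall back to the caller's default choice (an assumption).
ShardId chooseMergingShard(const DistributedPlanLogicSketch& logic,
                           const ShardId& defaultShard) {
    return logic.mergeShardId.value_or(defaultShard);
}
```

Carrying the shard id as an optional field mirrors the first alternative above: the planning step names the shard itself, so the caller does not have to work out which shard a kOwningShardPreferred-style hint would refer to.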

      There are quite a few scenarios we need to test in order to validate that $lookup works as expected. We may want to split the testing work into a few separate commits.

      Tests:

      • Inner collection is unsharded and not on the primary shard. Outer collection is sharded.
      • Outer collection is unsharded and not on the primary shard. Inner collection is sharded.
      • Neither collection is sharded, but both are collocated on the same shard. Test that we can do SBE $lookup pushdown in this case.
      • Neither collection is sharded and they are located on different shards.
      • A [$lookup(A), $lookup(B)] pipeline where both A and B are unsharded but reside on different shards.
      • A [$lookup(A), $lookup(B)] pipeline where A is sharded. B is unsharded but does not reside on the primary shard.
      • A [$lookup(A), $lookup(B)] pipeline where A is unsharded but does not reside on the primary shard. B is sharded.
      • Nested case where we have [$lookup(A, [$lookup(B)])]. A and B are unsharded and on different shards.
      • Nested case where we have [$lookup(A, [$lookup(B)])]. A sharded, B unsharded and not on the primary shard.
      • Nested case where we have [$lookup(A, [$lookup(B)])]. A unsharded and not on the primary shard, B sharded.
      • Should we also test three levels of nesting or is that overkill?
      • moveCollection() on the outer (unsharded) collection during execution. The moveCollection() can only commit during a yield, so when the query restores from yield it should fail with a QueryPlanKilled error.
      • moveCollection() on the inner (unsharded) collection during execution. As we execute new sub-queries, we should start targeting the inner collection’s new owner. The sub-query on the inner side may also fail with QueryPlanKilled if it detects during yield recovery that a moveCollection() happened.
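
The expected behavior in the two moveCollection() cases can be modeled with a small toy. Everything here is hypothetical: `PlacementTableSketch`, `targetSubquery()`, `restoreFromYield()`, and the version counter used to detect a move are assumptions made for illustration, not the server's actual detection mechanism.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>

using ShardId = std::string;

// Hypothetical error type standing in for a QueryPlanKilled failure.
struct QueryPlanKilledSketch : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Toy placement table: unsharded collection name -> owning shard, plus a
// counter that is bumped every time the collection is (re)placed.
class PlacementTableSketch {
public:
    // Models both the initial placement and a committed moveCollection().
    void place(const std::string& coll, const ShardId& shard) {
        auto& entry = _entries[coll];
        entry.owner = shard;
        ++entry.version;
    }
    const ShardId& owner(const std::string& coll) const {
        return _entries.at(coll).owner;
    }
    int version(const std::string& coll) const {
        return _entries.at(coll).version;
    }

private:
    struct Entry {
        ShardId owner;
        int version = 0;
    };
    std::map<std::string, Entry> _entries;
};

// Each new inner sub-query looks up the owner afresh, so sub-queries issued
// after a move target the new owner.
ShardId targetSubquery(const PlacementTableSketch& table,
                       const std::string& innerColl) {
    return table.owner(innerColl);
}

// A plan that yielded remembers the version it saw; restoring fails if the
// placement changed in the meantime.
void restoreFromYield(const PlacementTableSketch& table,
                      const std::string& coll, int versionAtYield) {
    if (table.version(coll) != versionAtYield) {
        throw QueryPlanKilledSketch("collection moved during yield");
    }
}
```

A test following this model would record the placement before the move, commit the move while the plan is yielded, and then assert both outcomes: new sub-queries go to the new owner, and the plan that was already running is killed on restore.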

            Assignee:
            mihai.andrei@mongodb.com Mihai Andrei
            Reporter:
            david.storch@mongodb.com David Storch
