-
Type: Task
-
Resolution: Fixed
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: None
-
Query Execution
-
Fully Compatible
-
QE 2023-10-16, QE 2023-10-30, QE 2023-11-13, QE 2023-11-27, QE 2023-12-11, QE 2023-12-25
-
157
- Just remove the kPrimaryShard requirement and use kNone, since $lookup knows how to find the data wherever it may be placed. However, when all the data is on a single shard, we should move the $lookup to the merging part of the pipeline and have a way to target this specific shard.
- Extend distributedPlanLogic() so that it can name the id of the shard that should be targeted for the merging pipeline. Alternatively we could introduce a host type requirement like kOwningShardPreferred as suggested by Bernard and require the caller to figure out which shard this corresponds to. My vote is for the former alternative.
- Plumb the ShardId up from the pipeline splitting code so that it can make its way to dispatchMergingPipeline()
There are quite a few scenarios we need to test in order to validate that $lookup works as expected. We may want to split the testing work into a few separate commits.
Tests:
- Inner collection is unsharded and not on the primary shard. Outer collection is sharded.
- Outer collection is unsharded and not on the primary shard. Inner collection is sharded.
- Neither collection is sharded, but both are collocated on the same shard. Test that we can do SBE $lookup pushdown in this case.
- Neither collection is sharded and they are located on different shards.
A [$lookup(A), $lookup(B)] pipeline where both A and B are unsharded but reside on different shards. - A [$lookup(A), $lookup(B)] pipeline where A is sharded. B is unsharded but does not reside on the primary shard.
- A [$lookup(A), $lookup(B)] pipeline where A is unsharded but does not reside on the primary shard. B is sharded.
- Nested case where we have [$lookup(A, [$lookup(B)])]. A and B are unsharded and on different shards.
- Nested case where we have [$lookup(A, [$lookup(B)])]. A sharded, B unsharded and not on the primary shard.
- Nested case where we have [$lookup(A, [$lookup(B)])]. A unsharded and not on the primary shard, B sharded.
- Should we also test three levels of nesting or is that overkill?
- moveCollection() on the outer (unsharded) collection during execution. The moveCollection() can only commit during a yield and when the query restores from yield, it should be fail with a QueryPlanKilled error.
- moveCollection() on the inner (unsharded) collection during execution. As we execute, new sub-queries we should start targeting the inner collection’s new owner. The subquery on the inner side may also fail with QueryPlanKilled if it detects a moveCollection() happened during yield recovery.
- is depended on by
-
SERVER-81335 Query operations that avoid going through the network when a shard is targeting only itself should create a fresh operation context
- Open
- related to
-
SERVER-79581 Remove HostTypeRequirement::kPrimaryShard from $graphLookup
- Closed