[SERVER-79580] Remove HostTypeRequirement::kPrimaryShard from $lookup Created: 01/Aug/23  Updated: 12/Dec/23  Resolved: 12/Dec/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.3.0-rc0

Type: Task Priority: Major - P3
Reporter: David Storch Assignee: Mihai Andrei
Resolution: Fixed Votes: 0
Labels: auto-reverted, pm3229-m1
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-81335 Query operations that avoid going thr... Open
Problem/Incident
Related
related to SERVER-79581 Remove HostTypeRequirement::kPrimaryS... Closed
Assigned Teams:
Query Execution
Backwards Compatibility: Fully Compatible
Sprint: QE 2023-10-16, QE 2023-10-30, QE 2023-11-13, QE 2023-11-27, QE 2023-12-11, QE 2023-12-25
Participants:
Linked BF Score: 157

 Description   
  • Just remove the kPrimaryShard requirement and use kNone, since $lookup knows how to find the data wherever it may be placed. However, when all the data is on a single shard, we should move the $lookup to the merging part of the pipeline and have a way to target this specific shard.
  • Extend distributedPlanLogic() so that it can name the id of the shard that should be targeted for the merging pipeline. Alternatively we could introduce a host type requirement like kOwningShardPreferred as suggested by Bernard and require the caller to figure out which shard this corresponds to. My vote is for the former alternative.
  • Plumb the ShardId up from the pipeline splitting code so that it can make its way to dispatchMergingPipeline()
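
The first alternative above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the server's code: `Placement`, `PlanLogicSketch`, `lookupPlanLogic()` and `chooseMergeShard()` are hypothetical stand-ins, and the "first stage that names a shard wins" rule is an assumption made only for the example. The later commit message on this ticket mentions `StageConstraints::mergeShardId`, so the real field may live somewhere else entirely.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; these are not the server's real types.
using ShardId = std::string;

enum class HostTypeRequirement { kNone, kPrimaryShard };

// Where one collection's data lives.
struct Placement {
    bool sharded;
    std::vector<ShardId> owningShards;
};

// What a stage reports to the pipeline splitter in this sketch.
struct PlanLogicSketch {
    HostTypeRequirement hostRequirement = HostTypeRequirement::kNone;
    // Set only when the stage would like the merging pipeline to run on one
    // specific shard.
    std::optional<ShardId> mergeShardId;
};

// $lookup no longer asks for the primary shard (kNone). It names a merge shard
// only when all of the foreign collection's data sits on a single shard.
PlanLogicSketch lookupPlanLogic(const Placement& foreign) {
    PlanLogicSketch logic;
    if (!foreign.sharded && foreign.owningShards.size() == 1) {
        logic.mergeShardId = foreign.owningShards.front();
    }
    return logic;
}

// Plumbing step: the splitter passes the chosen ShardId up to whatever
// dispatches the merging pipeline. Assumption: the first stage that names a
// shard wins; otherwise the caller's fallback is used.
ShardId chooseMergeShard(const std::vector<PlanLogicSketch>& stages,
                         const ShardId& fallback) {
    for (const auto& stage : stages) {
        if (stage.mergeShardId) {
            return *stage.mergeShardId;
        }
    }
    return fallback;
}
```

In this sketch the caller never has to work out which shard "owning shard" refers to, which is the main difference from the kOwningShardPreferred alternative.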

There are quite a few scenarios we need to test in order to validate that $lookup works as expected. We may want to split the testing work into a few separate commits.

Tests:

  • Inner collection is unsharded and not on the primary shard. Outer collection is sharded.
  • Outer collection is unsharded and not on the primary shard. Inner collection is sharded.
  • Neither collection is sharded, but both are collocated on the same shard. Test that we can do SBE $lookup pushdown in this case.
  • Neither collection is sharded and they are located on different shards.
  • A [$lookup(A), $lookup(B)] pipeline where both A and B are unsharded but reside on different shards.
  • A [$lookup(A), $lookup(B)] pipeline where A is sharded. B is unsharded but does not reside on the primary shard.
  • A [$lookup(A), $lookup(B)] pipeline where A is unsharded but does not reside on the primary shard. B is sharded.
  • Nested case where we have [$lookup(A, [$lookup(B)])]. A and B are unsharded and on different shards.
  • Nested case where we have [$lookup(A, [$lookup(B)])]. A sharded, B unsharded and not on the primary shard.
  • Nested case where we have [$lookup(A, [$lookup(B)])]. A unsharded and not on the primary shard, B sharded.
  • Should we also test three levels of nesting or is that overkill?
  • moveCollection() on the outer (unsharded) collection during execution. The moveCollection() can only commit during a yield, and when the query restores from yield, it should fail with a QueryPlanKilled error.
  • moveCollection() on the inner (unsharded) collection during execution. As we execute new sub-queries, we should start targeting the inner collection’s new owner. The subquery on the inner side may also fail with QueryPlanKilled if it detects during yield recovery that a moveCollection() happened.


 Comments   
Comment by Githook User [ 12/Dec/23 ]

Author:

{'name': 'Mihai Andrei', 'email': 'mihai.andrei@mongodb.com', 'username': 'mtandrei'}

Message: SERVER-79580 Add more $lookup targeting tests and support StageConstraints::mergeShardId in $facet

GitOrigin-RevId: 622d5428b61aaa7fe33ea9ae53ab1432b7a582c8
Branch: master
https://github.com/mongodb/mongo/commit/1b9fb9111c43d58a6884ef6f853ef9ff63231c63

Comment by Githook User [ 28/Nov/23 ]

Author:

{'name': 'Mihai Andrei', 'email': 'mihai.andrei@mongodb.com', 'username': 'mtandrei'}

Message: SERVER-79580 Remove HostTypeRequirement::kPrimaryShard from $lookup and add basic test coverage
Branch: master
https://github.com/mongodb/mongo/commit/2a960bb98fc1f05fbb27662af4452845b79294fa

Comment by Githook User [ 28/Nov/23 ]

Author:

{'name': 'auto-revert-processor', 'email': 'dev-prod-dag@mongodb.com', 'username': ''}

Message: Revert "SERVER-79580 Remove HostTypeRequirement::kPrimaryShard from $lookup and add basic test coverage"

This reverts commit 3d67df1b60f447fddf5ebbf6ac219d2fbcf6029c.
Branch: master
https://github.com/mongodb/mongo/commit/b22d630a1209035ca1025884174cdb83846a4983

Comment by Mihai Andrei [ 27/Nov/23 ]

Moving it back to open as there's one more planned PR for this ticket

Comment by Githook User [ 27/Nov/23 ]

Author:

{'name': 'Mihai Andrei', 'email': 'mihai.andrei@mongodb.com', 'username': 'mtandrei'}

Message: SERVER-79580 Remove HostTypeRequirement::kPrimaryShard from $lookup and add basic test coverage
Branch: master
https://github.com/mongodb/mongo/commit/3d67df1b60f447fddf5ebbf6ac219d2fbcf6029c

Comment by Githook User [ 16/Nov/23 ]

Author:

{'name': 'Mihai Andrei', 'email': 'mihai.andrei@mongodb.com', 'username': 'mtandrei'}

Message: SERVER-79580 Delete remaining references to 'featureFlagUnsplittableCollectionsOnNonPrimaryShard'
Branch: master
https://github.com/mongodb/mongo/commit/d7f7dba29a6dd86aaf2fc9db773665820481c11d

Generated at Thu Feb 08 06:41:20 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.