Type: Task
Resolution: Duplicate
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels: None
Assigned Teams: Query Execution
Sprint: QE 2023-12-11, QE 2023-12-25
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

(The scenario below is inspired by a linked test case, specifically its assertion that we will never push a nested $lookup down to the shards part of the pipeline.)

The targeting behavior of nested $lookup stages was recently changed by SERVER-79580 (specifically, by a linked code block). Suppose I have a nested $lookup:

db.a.aggregate([{$lookup: {from: 'b', pipeline: [{$lookup: {from: 'c', …}}], …}}])

where 'a', 'b' and 'c' are all sharded. At present, the top-level $lookup is executed in parallel on the shards due to logic in DocumentSourceLookUp::distributedPlanLogic(). This is true both before and after SERVER-79580. However, the targeting of the inner $lookup, that is

{$lookup: {from: 'c', …}}

against 'b', is now different:

Prior to SERVER-79580, this nested $lookup would be part of the merging half of the pipeline because the host type check would return kPrimaryShard (we are no longer on mongos once the top-level $lookup has been issued to the shards).

After SERVER-79580, however, this nested $lookup is pushed to the shards part of the pipeline because we always indicate a host type of kNone. As a result, we no longer have a reason to split the pipeline and put the $lookup on the merging half (previously, the kPrimaryShard host type requirement was enough to do this).
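The before/after difference can be pictured with a small toy model. This is only a sketch in plain JavaScript: the function names `nestedLookupHostType` and `placeNestedLookup` are hypothetical and do not correspond to server code, and only the values kPrimaryShard and kNone come from the ticket. It assumes the nested $lookup is being evaluated on a shard, not on mongos.

```javascript
// Toy model, not server code. Only the names kPrimaryShard and kNone
// come from the ticket; everything else is hypothetical.

// Host type requirement reported by the nested $lookup when it is
// evaluated on a shard (that is, not on mongos).
function nestedLookupHostType(afterServer79580) {
  return afterServer79580 ? "kNone" : "kPrimaryShard";
}

// Which half of the split pipeline the stage ends up in. Any requirement
// other than kNone is treated as a reason to split and use the merging half.
function placeNestedLookup(hostType) {
  return hostType === "kNone" ? "shards" : "merging";
}

const before = placeNestedLookup(nestedLookupHostType(false));
const after = placeNestedLookup(nestedLookupHostType(true));
console.log(`before: ${before}, after: ${after}`);
// prints "before: merging, after: shards"
```

The model deliberately ignores every other input to the real planner; it only captures the single condition the ticket describes.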

While this change in targeting behavior is not a correctness issue, it is arguably a regression (or at least undesirable) from a performance perspective: it is better to place the $lookup in the merging half of the pipeline and run it only once than to push it down to the shards part of the pipeline and execute one $lookup on each shard that owns a chunk for the collection we are reading from.
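The performance argument comes down to how many times the nested $lookup runs. A minimal sketch of that count, assuming a hypothetical helper `nestedLookupExecutions` and an illustrative figure of three shards owning chunks (neither appears in the ticket):

```javascript
// Toy cost model, not server code: counts how many times the nested
// $lookup is executed for a given placement.

// `placement` is "merging" or "shards"; `shardsOwningChunks` is the number
// of shards that own a chunk for the collection being read.
function nestedLookupExecutions(placement, shardsOwningChunks) {
  if (placement === "merging") {
    return 1; // a single $lookup on the merging half
  }
  return shardsOwningChunks; // one $lookup per shard that owns a chunk
}

console.log(nestedLookupExecutions("merging", 3)); // 1
console.log(nestedLookupExecutions("shards", 3)); // 3
```

This counts executions only; it says nothing about the relative cost of each execution or about network transfer.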

Now, to the point: this ticket proposes to 'correct' this behavior by changing DocumentSourceLookUp::constraints to designate the current shard as the merging shard for $lookup when targeting a sharded collection, so that we only execute one $lookup in this case.
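As a rough illustration of the proposal, the sketch below models the constraints as a plain object. The function name `proposedLookupConstraints`, the object shape, and the field names `hostRequirement` and `mergeShardId` are assumptions made for this example and are not taken from the server source; the shard name is likewise made up.

```javascript
// Toy model of the proposal, not the real DocumentSourceLookUp::constraints.
// The object shape and field names here are assumptions for illustration.
function proposedLookupConstraints({ fromIsSharded, currentShardId }) {
  // The host type stays kNone, as the ticket describes for current behavior.
  const constraints = { hostRequirement: "kNone", mergeShardId: null };
  if (fromIsSharded) {
    // Designate the shard we are already running on as the merging shard,
    // so the $lookup is executed once instead of once per shard.
    constraints.mergeShardId = currentShardId;
  }
  return constraints;
}

const example = proposedLookupConstraints({
  fromIsSharded: true,
  currentShardId: "shardA", // hypothetical shard name
});
console.log(example.mergeShardId); // shardA
```

The point of the sketch is only that the merging shard is chosen from information already available on the current shard, without reintroducing a kPrimaryShard requirement.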

depends on: SERVER-79581 Remove HostTypeRequirement::kPrimaryShard from $graphLookup (Closed)
related to: SERVER-84086 Complete TODO listed in SERVER-83860 (Closed)

Assignee: Mihai Andrei
Reporter: Mihai Andrei
Participants: Mihai Andrei
Votes: 0
Watchers: 2

Created: Dec 04 2023 07:14:57 PM UTC
Updated: Dec 12 2023 07:36:12 AM UTC
Resolved: Dec 11 2023 08:26:22 PM UTC
