  1. Core Server
  2. SERVER-83860

Designate a merging shard for nested $lookup stages which target a sharded collection

    • Type: Task
    • Resolution: Duplicate
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Labels:
      None
    • Query Execution
    • QE 2023-12-11, QE 2023-12-25

      (The scenario below is inspired by a test case, specifically its assertion that a nested $lookup is never pushed down to the shards part of the pipeline.)

      The targeting behavior of nested $lookup stages recently changed due to SERVER-79580 (specifically, the following block). Suppose I have a nested $lookup:

      db.a.aggregate([{$lookup: {from: 'b', pipeline: [{$lookup: {from: 'c', …}}], …}}])
      

      where 'a', 'b' and 'c' are all sharded. At present, the top-level $lookup is executed in parallel on the shards due to the following logic in DocumentSourceLookUp::distributedPlanLogic(). This is true both before and after SERVER-79580. However, the targeting of the inner $lookup against 'b', that is

      {$lookup: {from: 'c', …}}

      is now different:

      Prior to SERVER-79580, this nested $lookup would be part of the merging half of the pipeline because the following check would return kPrimaryShard (we are no longer on mongos once the top-level $lookup has been issued to the shards).

      However, after SERVER-79580, this nested $lookup is now pushed to the shards part of the pipeline because we always indicate a host type of 'kNone', meaning that we no longer have a reason to split the pipeline and put $lookup on the merging half (previously, the kPrimaryShard host type requirement was enough to do this).
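The before/after difference described above can be sketched as a toy model. This is not the server's C++ code: `HostType` and `placeStage` are hypothetical names, and the model assumes the host type requirement is the only signal that decides which half of the split pipeline the nested $lookup lands in.

```javascript
// Toy model only: these names are hypothetical and do not mirror the server's C++ API.
const HostType = Object.freeze({ kNone: "kNone", kPrimaryShard: "kPrimaryShard" });

// Decide which half of a split pipeline a stage lands in, based solely on
// its host type requirement (the only signal discussed in this ticket).
function placeStage(hostTypeRequirement) {
  // A primary-shard requirement forces a split and puts the stage on the merging half;
  // with no requirement there is no reason to split, so the stage goes to the shards part.
  return hostTypeRequirement === HostType.kPrimaryShard ? "merging" : "shards";
}

const before = placeStage(HostType.kPrimaryShard); // nested $lookup prior to SERVER-79580
const after = placeStage(HostType.kNone);          // nested $lookup after SERVER-79580
console.log({ before, after });
```

Under this simplification, the only thing that changed is the requirement reported for the nested $lookup, which is enough to move it from one half of the pipeline to the other.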

      While this change in targeting behavior is not a correctness issue, it is arguably a regression (or at least undesirable) from a performance perspective: it is likely better to place the $lookup in the merging half of the pipeline and run it only once than to push it down to the shards part of the pipeline and execute one $lookup on each shard that owns a chunk for the collection we are reading from.
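To make the performance argument concrete, here is a minimal sketch that counts how many times the nested $lookup would run under each placement. The function name `nestedLookupExecutions` and the shard count of 3 are assumptions chosen for illustration; they are not taken from the ticket or from the server.

```javascript
// Hypothetical helper: counts nested $lookup executions for a given placement.
// placement: "merging" (one $lookup on the merging half) or
//            "shards"  (one $lookup per shard owning a chunk of the collection being read).
function nestedLookupExecutions(placement, shardsOwningChunks) {
  if (placement === "merging") {
    return 1;
  }
  if (placement === "shards") {
    return shardsOwningChunks;
  }
  throw new Error("unknown placement: " + placement);
}

const shardsOwningChunks = 3; // assumed cluster layout, for illustration only
const onMergingHalf = nestedLookupExecutions("merging", shardsOwningChunks);
const onShardsPart = nestedLookupExecutions("shards", shardsOwningChunks);
console.log({ onMergingHalf, onShardsPart });
```

The gap grows linearly with the number of shards that own chunks, which is the cost this ticket aims to avoid.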

      Now, to the point: this ticket proposes to 'correct' this behavior by changing DocumentSourceLookUp::constraints to designate the current shard as the merging shard for $lookup when targeting a sharded collection, so that we only execute one $lookup in this case.

            Assignee:
            mihai.andrei@mongodb.com Mihai Andrei
            Reporter:
            mihai.andrei@mongodb.com Mihai Andrei
