Uploaded image for project: 'Documentation'
  1. Documentation
  2. DOCS-12019

Docs for SERVER-35904: Implement logic to detect if merging half of pipeline conserves but possibly renames the shard key

      Description

      Description:

      It's probably worth documenting the cases in which we will be able to use an exchange-based execution strategy. This ticket implements logic to detect when that is possible.

      I believe the description here is mostly accurate - but keep in mind one caveat - SERVER-36787.

      Engineering Ticket Description:

      For pipelines like the following: 

      db.teams.aggregate([
        {$unwind: "$users"},
        {$group: {_id: "$users.username"}, teams: {$push: "$_id"}},
        {$project: {username: "$_id", teams: 1, _id: 0}},
        {$out: {to: "users", mode: "replace", uniqueKey: {username: 1}}},
      ])
      

      The ideal execution plan would be to perform the $unwind and a partial group on each shard, then scatter the partial groups by _id to merge in parallel on each shard. Supposing "users" was sharded with the shard key pattern {username: 1}, we should distribute each partial group to the shard that would own that range of usernames.

      For example, suppose we had two shards: A and B. The "teams" collection is sharded and has some chunks on each shard. The "users" collection is sharded and has the chunk [MinKey, "example") on shard A and ["example", MaxKey] on shard B. Then we should execute this plan by having each shard perform a partial group on the teams collection, then send the partial groups with _id in the range [MinKey, "example") to shard A, and the partial groups with _id in the range ["example", MaxKey] to shard B. That way, in the absence of any chunk migrations, the final $out stage will perform a local write.

      In order to detect that partitioning by the shard key in this way is a good idea, we'll need to do some analysis of the merging half of the pipeline. It'll look something like this:

      1. Check if the last stage is a $out to a sharded collection
      2. If so, consult the chunk manager to find the shard key and the routing table.
      3. Work backwards through the merging half of the pipeline, tracking if any of the shard key fields are modified. If they are renamed, remember their new name.
      4. If we get to the beginning of the merging pipeline without anyone modifying the fields, output the routing table and which fields should be used to partition the data.

      Once we do this, we can take advantage of this machinery and the work in SERVER-35899 to set up an $exchange across all the shards.

      Scope of changes

      Impact to Other Docs

      MVP (Work and Date)

      Resources (Scope or Design Docs, Invision, etc.)

            Assignee:
            Unassigned Unassigned
            Reporter:
            kay.kim@mongodb.com Kay Kim (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

              Created:
              Updated:
              1 year, 24 weeks, 4 days ago