[SERVER-36881] When a pipeline has a $limit stage, attempt to push a limit to each shard in the shards part Created: 27/Aug/18 Updated: 29/Oct/23 Resolved: 09/Nov/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Aggregation Framework, Performance |
| Affects Version/s: | None |
| Fix Version/s: | 4.1.6 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Charlie Swanson | Assignee: | Justin Seyster |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||
| Backwards Compatibility: | Fully Compatible | ||||
| Operating System: | ALL | ||||
| Participants: | |||||
| Linked BF Score: | 0 | ||||
| Description |
|
Consider the pipeline
In this pipeline, the $skip stage forces a split in the pipeline (since it would be incorrect for each shard to apply the skip). Then we're left with this execution plan:
As this plan shows, each shard has no way of knowing that it only needs to produce at most 1001 documents to satisfy the query. In this case, and in others where the $limit comes after stages that do not change the number of documents in the pipeline, we should duplicate a $limit stage onto the shards part of the pipeline. |
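The original pipeline and split plan did not survive the export, but a hypothetical reconstruction of the shapes being described might look like the following (assuming `$skip: 1` and `$limit: 1000`, chosen only so that skip + limit matches the 1001 figure above):

```javascript
// Hypothetical pipeline shape matching the description: a $skip that
// forces a split, followed by a $limit.
const pipeline = [{ $skip: 1 }, { $limit: 1000 }];

// Without the optimization, the split leaves the shards part with no
// limit at all, so each shard streams its full result set to the merger:
const splitPlan = {
  shardsPart: [], // no hint that 1001 documents would suffice
  mergerPart: [{ $skip: 1 }, { $limit: 1000 }],
};

// With the proposed fix, each shard is told it never needs to produce
// more than skip + limit documents:
const optimizedShardsPart = [{ $limit: 1 + 1000 }];
console.log(optimizedShardsPart[0].$limit); // 1001
```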
| Comments |
| Comment by Githook User [ 09/Nov/18 ] |
|
Author: Justin Seyster <justin.seyster@mongodb.com> (jseyster)
Message: |

| Comment by Asya Kamsky [ 25/Oct/18 ] |
|
I agree we should look at it soon and try to fix it. |
| Comment by Charlie Swanson [ 15/Oct/18 ] |
|
Oh great, LGTM then. I guess the actions are the same, which I hadn't realized. |
| Comment by David Storch [ 15/Oct/18 ] |
|
charlie.swanson, oh, yeah I meant to keep it as a "SERVER ticket blocking BF" but not pull it into regular sprint work. It seems plausible that we could knock this out on a future Friday. |
| Comment by Charlie Swanson [ 15/Oct/18 ] |
|
david.storch are you sure? This is an 87% regression on what I imagine isn't a crazy use case. I would like to counter-propose that we should treat this as a "SERVER ticket blocking BF" and keep it around for BF Fridays. |
| Comment by David Storch [ 15/Oct/18 ] |
|
I think we should avoid scope creep and move this back to the backlog. justin.seyster, can you provide a link to your work in progress and return this to the backlog user? |
| Comment by Ian Whalen (Inactive) [ 12/Oct/18 ] |
|
Great, thanks Justin. david.storch, asya, do you want to pull this into the next sprint? |
| Comment by Justin Seyster [ 12/Oct/18 ] |
|
After a day of working on this, I feel pretty good about it. I can probably have a review ready after another day of work. My proposed solution is to add a function in splitPipeline() that scans through the "mergerPart" of the pipeline. For each stage,
If there is no $limit stage, the function also exits without applying any optimization. I have a rough implementation of this function that just needs to be cleaned up and tested. |
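The per-stage list in the comment above was lost in the export, but the described scan can be sketched as follows. This is a hedged JavaScript simulation, not the real C++ `splitPipeline()` code; the helper name and stage shapes are hypothetical:

```javascript
// Sketch of the proposed scan: walk the merger part of a split pipeline
// and, if a $limit is reachable through stages that preserve the number
// of documents, duplicate an equivalent $limit onto the shards part.
function pushLimitToShards(shardsPart, mergerPart) {
  let skipSum = 0; // $skip stages raise how many docs each shard must emit
  for (const stage of mergerPart) {
    if ('$skip' in stage) {
      skipSum += stage.$skip; // document count preserved, offset grows
    } else if ('$limit' in stage) {
      // Each shard only ever needs skipSum + limit documents.
      shardsPart.push({ $limit: skipSum + stage.$limit });
      return;
    } else {
      // A stage that may change the number of documents ($group,
      // $unwind, ...): pushing a limit past it would be incorrect.
      return;
    }
  }
  // No $limit stage found: exit without applying any optimization.
}

const shardsPart = [];
pushLimitToShards(shardsPart, [{ $skip: 1 }, { $limit: 1000 }]);
console.log(JSON.stringify(shardsPart)); // [{"$limit":1001}]
```

Note that the real implementation has to recognize which stages preserve document count; the sketch above conservatively bails out on anything other than `$skip` before the `$limit`.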
| Comment by Ian Whalen (Inactive) [ 11/Oct/18 ] |
|
Currently assigning to Justin to see how much progress he can make in a single day. Otherwise will circle back to pull this into a sprint or not based on feedback from Asya. |
| Comment by Charlie Swanson [ 27/Aug/18 ] |
|
This was discovered after a recent refactor exposed the fact that we were using a batch size of 101 to communicate between the shards and the merger. As part of this work, we should make sure we are never doing that. We certainly aren't for this pipeline after |