[SERVER-36881] When a pipeline has a $limit stage, attempt to push a limit to each shard in the shards part Created: 27/Aug/18  Updated: 29/Oct/23  Resolved: 09/Nov/18

Status: Closed
Project: Core Server
Component/s: Aggregation Framework, Performance
Affects Version/s: None
Fix Version/s: 4.1.6

Type: Bug Priority: Major - P3
Reporter: Charlie Swanson Assignee: Justin Seyster
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:
Linked BF Score: 0

 Description   

Consider the pipeline

db.example.aggregate([{$match: {x: 4}}, {$skip: 1000}, {$limit: 1}])

In this pipeline, the $skip stage forces a split in the pipeline (since it would be incorrect for each shard to apply the skip). Then we're left with this execution plan:

mongos> db.products.explain().aggregate([{$match: {x: 4}}, {$skip: 1000}, {$limit: 1}])
{
	"mergeType" : "mongos",
	"splitPipeline" : {
		"shardsPart" : [
			{
				"$match" : {
					"x" : {
						"$eq" : 4
					}
				}
			}
		],
		"mergerPart" : [
			{
				"$skip" : NumberLong(1000)
			},
			{
				"$limit" : NumberLong(1)
			}
		]
	},

So as you can see, each shard has no idea that it only needs to produce at most 1001 documents (1000 skipped plus 1 returned) to satisfy the query. In this case, and in other cases where the $limit comes after stages that do not change the number of documents in the pipeline, we should duplicate a $limit stage on the shards part of the pipeline.
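To illustrate what the fix aims for (this is a hand-written sketch, not actual explain output, and the exact plan shape may differ): since the merger applies $skip: 1000 followed by $limit: 1, each shard only ever needs to return 1000 + 1 = 1001 documents, so the shards part could carry a duplicated limit while the merger part stays unchanged:

	"shardsPart" : [
		{ "$match" : { "x" : { "$eq" : 4 } } },
		{ "$limit" : NumberLong(1001) }
	],
	"mergerPart" : [
		{ "$skip" : NumberLong(1000) },
		{ "$limit" : NumberLong(1) }
	]

The merger still applies the full skip and limit, so the results are identical; the shards just stop producing documents after 1001 instead of returning everything that matches.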



 Comments   
Comment by Githook User [ 09/Nov/18 ]

Author:

{'name': 'Justin Seyster', 'email': 'justin.seyster@mongodb.com', 'username': 'jseyster'}

Message: SERVER-36881 Push merge-pipeline $limit stage to shards.
Branch: master
https://github.com/mongodb/mongo/commit/cadd454ca9dcfad9883d98a2b2de78d5f129d4aa

Comment by Asya Kamsky [ 25/Oct/18 ]

I agree we should look at it soon and try to fix it.

Comment by Charlie Swanson [ 15/Oct/18 ]

Oh great, LGTM then. I guess the actions are the same, which I hadn't realized.

Comment by David Storch [ 15/Oct/18 ]

charlie.swanson, oh, yeah I meant to keep it as a "SERVER ticket blocking BF" but not pull it into regular sprint work. It seems plausible that we could knock this out on a future Friday.

Comment by Charlie Swanson [ 15/Oct/18 ]

david.storch are you sure? This is an 87% regression on what I imagine isn't a crazy use case. I would like to counter-propose that we should treat this as a "SERVER ticket blocking BF" and keep it around for BF Fridays.

Comment by David Storch [ 15/Oct/18 ]

I think we should avoid scope creep and move this back to the backlog. justin.seyster, can you provide a link to your WIP work and return this to the backlog user?

Comment by Ian Whalen (Inactive) [ 12/Oct/18 ]

Great, thanks Justin. david.storch, asya, do you want to pull this into the next sprint?

Comment by Justin Seyster [ 12/Oct/18 ]

After a day of working on this, I feel pretty good about it. I can probably have a review ready after another day of work.

My proposed solution is to add a function in splitPipeline() that scans through the "mergerPart" of the pipeline. For each stage,

  • if the stage is a $skip stage, add the number of skipped documents to the count of documents needed from each shard;
  • if the stage is a $limit stage, add the limit to the count of documents needed from each shard, add a new $limit stage to the end of the "shardsPart" pipeline using that count, and exit (having successfully applied the optimization);
  • if the stage is some other stage that may not output the same number of documents as it gets for input (anything other than $addFields, $project, and similar), exit without applying any optimization.

If there is no $limit stage, the function also exits without applying any optimization.

I have a rough implementation of this function that just needs to be cleaned up and tested.
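For readers following along, here is a minimal standalone sketch of that scan, written as shell-style JavaScript over plain stage documents. It is illustrative only: the real change lives in the server's C++ pipeline-splitting code, and the function name and the set of count-preserving stages below are assumptions, not the actual implementation. It returns the limit value that would be appended to "shardsPart" rather than mutating the pipeline.

// Illustrative only: computes how many documents each shard needs to
// produce, given the merger part of a split pipeline, or returns null if
// the optimization cannot be applied.
function computeShardsPartLimit(mergerPart) {
    // Stages treated here as not changing the document count ("$addFields,
    // $project, and similar" per the comment above; this list is a guess).
    var countPreserving = {"$addFields": true, "$project": true};
    var needed = 0;  // documents required from each shard so far
    for (var i = 0; i < mergerPart.length; i++) {
        var stage = mergerPart[i];
        var name = Object.keys(stage)[0];
        if (name === "$skip") {
            // Skipped documents still have to be produced by the shards.
            needed += stage["$skip"];
        } else if (name === "$limit") {
            // Found the limit: each shard needs at most skips-so-far + limit.
            return needed + stage["$limit"];
        } else if (!countPreserving[name]) {
            // This stage may change the number of documents; bail out.
            return null;
        }
    }
    return null;  // no $limit in the merger part, nothing to push down
}

// Example: computeShardsPartLimit([{$skip: 1000}, {$limit: 1}]) returns 1001,
// so the shards part could have {"$limit": 1001} appended.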

Comment by Ian Whalen (Inactive) [ 11/Oct/18 ]

Currently assigning to Justin to see how much progress he can make in a single day. Otherwise will circle back to pull this into a sprint or not based on feedback from Asya.

Comment by Charlie Swanson [ 27/Aug/18 ]

This was discovered after a recent refactor exposed the fact that we were using a batch size of 101 to communicate between the shards and the merger. As part of this work, we should make sure we are never doing that. We certainly aren't for this pipeline after SERVER-33323, but we may still be using a batch size of 101 for pipelines where the merging half of the pipeline is not entirely limits and skips.
