[SERVER-40909] push down $skip stage to query when possible Created: 30/Apr/19  Updated: 29/Oct/23  Resolved: 05/Oct/20

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: 4.9.0

Type: Improvement Priority: Major - P3
Reporter: Asya Kamsky Assignee: Nikita Lapkov (Inactive)
Resolution: Fixed Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File huge_projection_pipeline.js     File sample_document.json     Text File skip_push_down.patch    
Issue Links:
Problem/Incident
Related
related to SERVER-43297 Inefficient query on view due to $lim... Closed
Tested
Backwards Compatibility: Fully Compatible
Sprint: Query 2020-09-07, Query 2020-09-21, Query 2020-10-05, Query 2020-10-19
Participants:
Case:
Linked BF Score: 0

 Description   

Not sure why we don't do this:

db.skip.explain().aggregate([{$match:{a:{$gte:7}}},{$sort:{a:1}},{$skip:2}])
{
	"stages" : [
		{
			"$cursor" : {
				"query" : {
					"a" : {
						"$gte" : 7
					}
				},
				"sort" : {
					"a" : 1
				},
				"queryPlanner" : {
					"plannerVersion" : 1,
					"namespace" : "agg.skip",
					"indexFilterSet" : false,
					"parsedQuery" : {
						"a" : {
							"$gte" : 7
						}
					},
					"winningPlan" : {
						"stage" : "FETCH",
						"inputStage" : {
							"stage" : "IXSCAN",
							"keyPattern" : {
								"a" : 1
							},
							"indexName" : "a_1",
							"isMultiKey" : false,
							"multiKeyPaths" : {
								"a" : [ ]
							},
							"isUnique" : false,
							"isSparse" : false,
							"isPartial" : false,
							"indexVersion" : 2,
							"direction" : "forward",
							"indexBounds" : {
								"a" : [
									"[7.0, inf.0]"
								]
							}
						}
					},
					"rejectedPlans" : [ ]
				}
			}
		},
		{
			"$skip" : NumberLong(2)
		}
	],
	"ok" : 1
}

> db.skip.explain().aggregate([{$match:{a:{$gte:7}}},{$sort:{a:1}},{$skip:2},{$limit:1}])
{
	"stages" : [
		{
			"$cursor" : {
				"query" : {
					"a" : {
						"$gte" : 7
					}
				},
				"sort" : {
					"a" : 1
				},
				"limit" : NumberLong(3),
				"queryPlanner" : {
					"plannerVersion" : 1,
					"namespace" : "agg.skip",
					"indexFilterSet" : false,
					"parsedQuery" : {
						"a" : {
							"$gte" : 7
						}
					},
					"winningPlan" : {
						"stage" : "FETCH",
						"inputStage" : {
							"stage" : "IXSCAN",
							"keyPattern" : {
								"a" : 1
							},
							"indexName" : "a_1",
							"isMultiKey" : false,
							"multiKeyPaths" : {
								"a" : [ ]
							},
							"isUnique" : false,
							"isSparse" : false,
							"isPartial" : false,
							"indexVersion" : 2,
							"direction" : "forward",
							"indexBounds" : {
								"a" : [
									"[7.0, inf.0]"
								]
							}
						}
					},
					"rejectedPlans" : [ ]
				}
			}
		},
		{
			"$skip" : NumberLong(2)
		}
	],
	"ok" : 1
}

In the first case we don't push anything down, in the second case we push down limit(3) - sum of skip and limit values - but in both cases we then do the skip in agg in the next stage.



 Comments   
Comment by Nikita Lapkov (Inactive) [ 06/Oct/20 ]

Benchmarks results arrived. I have picked the closest tested commit after this change was merged. The results are:

  • Aggregation.SortProjectSkipWithBigDocuments: +8.3%
  • Aggregation.SortSkipProjectWithBigDocuments: +8.4%
  • Aggregation.ProjectSkipWithBigDocuments: +260.3%
Comment by Githook User [ 05/Oct/20 ]

Author:

{'name': 'Nikita Lapkov', 'email': 'nikita.lapkov@mongodb.com', 'username': 'laplab'}

Message: SERVER-40909 push down $skip stage to query when possible
Branch: master
https://github.com/mongodb/mongo/commit/32cea84dcc86009cc09d8e30cd7534fac6fb2242

Comment by Nikita Lapkov (Inactive) [ 01/Oct/20 ]

We need to wait to accumulate enough history for new tests on master branch. The plan is to push changes on Monday.

Comment by Githook User [ 01/Oct/20 ]

Author:

{'name': 'Nikita Lapkov', 'email': 'nikita.lapkov@mongodb.com', 'username': 'laplab'}

Message: PERF-1989 Add benchmarks to measure impact of SERVER-40909
Branch: master
https://github.com/mongodb/mongo-perf/commit/c57034f09d022e4ee2882c44ae65ba0ba2b1150f

Comment by Nikita Lapkov (Inactive) [ 22/Sep/20 ]

Patch is ready to merge, Evergreen is okay. Marking this as blocked until PERF-1989 is resolved.

Comment by Ian Boros [ 15/Sep/20 ]

We should consider doing PERF-1989 to deal with the $skips that get added to the microbenchmarks before merging this.

Comment by Nikita Lapkov (Inactive) [ 11/Sep/20 ]

Turns out I was measuring patch with --opt=off. Here are updated results for builds with --opt=on.

  SORT => SKIP => PROJECT PROJECT => SORT => SKIP Regression
1 thread 1.5706201340263124 1.5374107741112062 -2.16%
8 threads 3.97652939101386 3.8824184365407817 -2.42%

Note: 32 threads consistently gave me 0 ops/sec for both builds for some reason, so I have used 8 threads instead.

Comment by Nikita Lapkov (Inactive) [ 10/Sep/20 ]

After talking with bernard.gorman and david.storch we decided to benchmark PROJECT => SORT => SKIP and SORT => SKIP => PROJECT plans on some case where the latter should perform better.

I have used aggregation pipeline $sort => $project => $skip for all tests, where $skip omits all documents from collection except one. The projection and sample document were taken from SERVER-30633. I have removed all expressions from projection because they were preventing pushing of PROJECT behind SORT. Final projection contained 78849 inclusion rules. The sample document contained 14660 leaf nodes. Collection used for testing contained 500 such documents. The number of documents was chosen to allow SORT stage sort everything in memory. See sample_document.json and huge_projection_pipeline.js for details.

The same query was run on 2 builds:

  1. Original patch from charlie.swanson. Does pushdown of $skip to query stage. Does not push PROJECT behind SORT if there is SKIP between them.
  2. The same patch, but always pushing PROJECT behind SORT, even if there is a SKIP between them

First build was producing SORT => SKIP => PROJECT plan for the query, second build returned PROJECT => SORT => SKIP plan.

Each build was configured as described here and started like this:

numactl --physcpubind=12,13,14,15 -i 0 ./build/install/bin/mongod --replSet rs0 --port 50000 --bind_ip localhost

Benchmark query was executed using benchrun.py for 1 thread and for 32 threads:

numactl --physcpubind=1,2,3 -i 0 python2 benchrun.py --shell ../mongo/build/install/bin/mongo -t THREAD_COUNT --trialCount 5 --port 50000 --replset rs0 -f testcases/huge_projection_pipeline.js --includeFilter aggregation

The results are summarized in the table. Each value is ops/second returned by benchrun.py.

  SORT => SKIP => PROJECT PROJECT => SORT => SKIP Regression
1 thread 0.12494183772023579 0.11548743612955885 -8.18%
32 threads 0.015050143297231916 0.013868785258024836 -8.52%

What do you think?

Comment by Nikita Lapkov (Inactive) [ 08/Sep/20 ]

charlie.swanson  PROJECT => SORT => SKIP was always chosen, without any conditions. Thank you for idea with  $_internalInhibitOptimization!

Comment by Charlie Swanson [ 08/Sep/20 ]

nikita.lapkovin your most recent performance test did you always prefer PROJECT => SORT => SKIP, or only when the projection is simple, as implied in your previous comment?

As a workaround for the tests, you could add an $_internalInhibitOptimization stage in addition to the final $skip.

Comment by Nikita Lapkov (Inactive) [ 08/Sep/20 ]

I have ran performance test for patch where PROJECT => SORT => SKIP is always preferred over SORT => SKIP => PROJECT (Evergreen).

No regressions were detected. Some improvements were found (one, two, three). As mentioned above, most likely the cause is that pipeline did not get any input.

Comment by Nikita Lapkov (Inactive) [ 03/Sep/20 ]

I have re-applied the original patch and sent it for performance testing again. Unfortunately, it crashed on one of the workloads. To fix this, I have made a small fix described below.

To perform $skip pushdown to the query stage we build structure called LimitThenSkip. It represents operation "limit X documents and skip Y after it". This structure is extracted in method DocumentSourceSort::extractSkipAndLimitForPushdown, src/mongo/db/pipeline/document_source_sort.cpp.

On certain queries computed skip value was greater than limit. Basically we were asking to skip more documents than the limit stage returned to us. This cause the invariant in function attemptToGetExecutor, src/mongo/db/pipeline/pipeline_d.cpp to fail.

To fix it, I have capped the skip value with limit value. Basically it means that we can skip the desired amount of documents but no more than the limit stage returned to us.

Now the patch passes the performance test, but the regression described above is still in place (see Aggregation.SortProjectWithBigDocuments in this run).

As david.storch pointed out, it is not obvious if we should always choose PROJECT => SORT => SKIP over SORT => SKIP => PROJECT. Now we always do SORT => SKIP => PROJECT which kills the performance on benchmark because big documents are sent to the SORT.

My suggestion is to prefer PROJECT => SORT => SKIP plan when the PROJECT is simple. This will fix the regression and seems to be sane assumption. What are your thoughts about that?

BTW, if this patch is to be merged into master, mongo-perf workloads will need to be refreshed. As charlie.swanson  said, currently some workloads add $skip to the end of aggregation pipeline to prevent BSON serialisation. With the patch applied this skip is going to be pushed down to the query stage and nothing will be returned to pipeline to process.

Comment by David Storch [ 16/Jan/20 ]

charlie.swanson, to answer your question more directly, pushdown of the $skip inhibits the optimization from SERVER-15200 because we currently add the SKIP PlanStage to the tree prior to the projection. See this planner analysis code:

https://github.com/mongodb/mongo/blob/bb6e5391a9591139804038b3524a1b92ad1c327b/src/mongo/db/query/planner_analysis.cpp#L862-L890

The project-sort swapping optimization from SERVER-15200 only kicks in if the tree has a very particular structure, where the projection is directly above the sort. The presence of the skip node causes us to bail out of the optimization. As Ian said, it's probably a good idea to skip before projecting, to avoid computing the projection over documents that are eventually skipped. It's not obvious a priori whether the plan should be PROJECT => SORT => SKIP or SORT => SKIP => PROJECT. Currently, we choose the latter.

Comment by Ian Boros [ 13/Jan/20 ]

charlie.swanson

I could imagine that for cases where the projection is expensive to compute, pushing it beneath the skip means that it will be applied to documents that ultimately get skipped. We had to think about this for Top-K sort during SERVER-15200 (similar case) and intentionally did not apply the optimization.

That said, I could see a case for pushing the projection beneath the skip -> sort -> sortkey gen as long as the projection does not contain computed fields (which could cause the document to get bigger).

Generated at Thu Feb 08 04:56:17 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.