[SERVER-7568] Aggregation framework favors non-blocking sorts Created: 06/Nov/12  Updated: 08/Feb/23  Resolved: 22/Oct/19

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: 4.3.1

Type: Improvement Priority: Major - P3
Reporter: David Schneider Assignee: David Storch
Resolution: Done Votes: 25
Labels: nh-240, performance, usability
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-26442 Push $sort before $project and $addFi... Open
Duplicate
is duplicated by SERVER-27250 Aggregation Pipeline Favors Indexes f... Closed
is duplicated by SERVER-21471 Bad index selection on aggregate with... Closed
is duplicated by SERVER-26337 performance problem with $limit in ag... Closed
is duplicated by SERVER-27076 Query Planner uses inefficient plan f... Closed
is duplicated by SERVER-27150 Performance degradation with $sort on... Closed
is duplicated by SERVER-30696 $sort in aggregate is super slow Closed
is duplicated by SERVER-32186 Query Planner selects wrong index Closed
is duplicated by SERVER-44309 Not full index use in aggregation pip... Closed
is duplicated by SERVER-43221 Applying index makes aggregation quer... Closed
Problem/Incident
Related
related to SERVER-21783 aggregation framework indexed date(Is... Closed
related to SERVER-43816 Push $text and $sort with $meta in th... Closed
related to SERVER-44309 Not full index use in aggregation pip... Closed
is related to SERVER-31051 Count Is very very slow using aggrega... Closed
is related to SERVER-7944 add index hint support for operations... Closed
is related to SERVER-24860 Optimize away entire pipeline if it c... Closed
Backwards Compatibility: Fully Compatible
Sprint: Query 2017-05-08, Query 2017-05-29, Query 2019-10-07, Query 2019-10-21, Query 2019-11-04
Participants:
Case:
Linked BF Score: 15

 Description   

In some situations, an aggregation pipeline with a sort will attempt to use an index to cover the sort. In such situations, the aggregation pipeline will always favor plans that do not have a blocking sort, even if other plans may be faster.

We should somehow integrate the aggregation framework and the query system more tightly so that we can consider both blocking and non-blocking plans at the same time, or we should try to otherwise determine if the blocking plan would outperform the non-blocking plan.

Original Description

We have a dataset with some hundred thousand documents with aggregation framework we select a small amount of it and sort it by _id. Match works very fast (by index) but sort {_id: 1} seems to load all data instead of sorting only selected documents. If we sort by some other field it works like expected.

Example

function test(num) {
    function time(fun) {
        var start = (new Date).getTime();
        fun.apply();
        return (new Date).getTime() - start;
    }
 
    if (num === undefined) {
        num = 1000;
    }
    db.demo.remove();
    for (var i = 0; i < num; ++i) {
        db.demo.insert({num: i});
    }
    db.demo.ensureIndex({num: 1});
    var query = {num: {$lt: 10}};
    var sortId = {_id: 1};
    var sortNum = {num: 1};
    printjson(db.demo.find(query).explain());
    print("find sort by id: " + time(function ()
    {
        db.demo.find(query).sort(sortId);
    }));
    print("aggregate sort by id: " + time(function ()
    {
        db.demo.aggregate([{$match: query}, {$sort: sortId}]);
    }));
    print("find sort by num: " + time(function ()
    {
        db.demo.find(query).sort(sortNum);
    }));
 
    print("aggregate sort by num: " + time(function ()
    {
        db.demo.aggregate([{$match: query}, {$sort: sortNum}]);
    }));
}

test(100000)
{
	"cursor" : "BtreeCursor num_1",
	"isMultiKey" : false,
	"n" : 10,
	"nscannedObjects" : 10,
	"nscanned" : 10,
	"nscannedObjectsAllPlans" : 10,
	"nscannedAllPlans" : 10,
	"scanAndOrder" : false,
	"indexOnly" : false,
	"nYields" : 0,
	"nChunkSkips" : 0,
	"millis" : 0,
	"indexBounds" : {
		"num" : [
			[
				-1.7976931348623157e+308,
				10
			]
		]
	},
	"server" : "mongodb01:27228",
	"millis" : 0
}
find sort by id: 0
aggregate sort by id: 373
find sort by num: 0
aggregate sort by num: 1



 Comments   
Comment by Graeme Downes [ 12/Nov/19 ]

Can you confirm by implementing the $project workaround we will lose the benefits of indexed sorts?

 As per the docs: 

$sort operator can take advantage of an index when placed at the beginning of the pipeline or placed before the $project, $unwind, and $group aggregation operators. If $project, $unwind, or $group occur prior to the $sort operation, $sort cannot use any indexes.

https://docs.mongodb.com/manual/reference/operator/aggregation/sort/#sort-operator-and-performance

Thanks

Comment by David Storch [ 22/Oct/19 ]

This issue is fixed as of commit 8d048a3bb2. Users should no longer need to work around the problem with hints or index filters, or by rewriting queries. The fix will first become available in development release 4.3.1.

Comment by Githook User [ 22/Oct/19 ]

Author:

{'name': 'David Storch', 'username': 'dstorch', 'email': 'david.storch@mongodb.com'}

Message: SERVER-7568 Push $sort into PlanStage layer even for blocking SORT plans.

This change results in the multi-planning mechanism
evaluating both non-blocking and blocking plans for the
$sort when possible. The system should no longer select a
non-blocking plan when a plan with a SORT stage is superior.
Branch: master
https://github.com/mongodb/mongo/commit/8d048a3bb2f0f2f81cf99ce76ff21112bf3963d6

Comment by Tomasz Fidos [ 03/Jun/19 ]

Thanks

Comment by Asya Kamsky [ 01/Jun/19 ]

tomfid

The workaround is to use a hint to specify the index that will satisfy the $match.

This has not been fixed - when it is, the status and resolution fields will be updated accordingly.

Asya

Comment by Tomasz Fidos [ 27/May/19 ]

Vote up and:

1) Is this workaround: putting $project in-between $match and $sort, now an official solution? Why is it working?

2) Is this fixed?

Comment by Asya Kamsky [ 18/Oct/17 ]

Note that as of 3.6 aggregation can take a "hint" option to specify desired index to use.

Comment by Charlie Swanson [ 20/Nov/15 ]

I've updated the description to reflect the underlying problem a little more succinctly.

Comment by Daniel Pasette (Inactive) [ 22/Jun/15 ]

This is a non-trivial issue to solve. The agg pipeline optimizer asks the query planner for a plan which does not include a blocking sort, when that is in fact what the user desires in this case. The workaround of inserting a $project before the $sort mentioned by andrey.hohutkin@gmail.com does work for now.

Comment by yosi oren [ 27/Jan/15 ]

vote up

Comment by Andrey Hohutkin [ 27/Jan/15 ]

It happens not only in aggregation. It causes slowness on every query operation like find, findAndModify.
I temporary resolved this problem putting $project between $match and $sort.
I succeeded to resolve find with $hint.
But findAndModify I cant resolve.
Anyway it is very important to fix it because it causes extreme slowness.
Voting up!

Comment by Thomas Rueckstiess [ 07/Apr/14 ]

The test ist not quite fair, because the query is always on num, but the sort is either on _id or on num. So we have:

1.   query: {num: {$lt: 10}}   sort: {_id: 1}
2.   query: {num: {$lt: 10}}   sort: {num: 1}

And the output of the test script is something like:

find sort by id: 0ms
aggregate sort by id: 205ms
find sort by num: 0ms
aggregate sort by num: 1ms

If we switch the query to be on _id instead (and make sure that some documents match, by setting the _id value to i in the insert as well):

1.   query: {_id: {$lt: 10}}   sort: {_id: 1}
2.   query: {_id: {$lt: 10}}   sort: {num: 1}

We get the exact opposite result, with the sort on num being slow in the aggregation:

find sort by id: 0ms
aggregate sort by id: 1ms
find sort by num: 1ms
aggregate sort by num: 466ms

This is because of the choice of indexes. For the original test script, the aggregation framework chooses the _id index for the slow case, and has to scan through the entire collection instead of quickly finding the 10 matches and sorting them in memory with the num index (that's what the find() does).

That is the actual question here: Why does the aggregation framework choose the much less efficient index for the combined query/sort? To show this, I ran both find and agg on 2.6 and made use of the explain parameter in the aggregation framework.

Here is the modified test script:

function test(num) {
    function time(fun) {
        var start = (new Date).getTime();
        fun.apply();
        return (new Date).getTime() - start;
    }
  
    if (num === undefined) {
        num = 1000;
    }
    
    db.demo.remove({});
    for (var i = 0; i < num; ++i) {
        db.demo.insert({_id: i, num: i});
    }
    db.demo.ensureIndex({num: 1});
    db.demo.ensureIndex({other: 1});
    
    var query = {num: {$lt: 10}};
    var sortId = {_id: 1};
 
    // find: query on num, sort on _id (fast)
    print("find: query on num, sort by _id: " + time(function ()
    {
        db.demo.find(query).sort(sortId).itcount();
    }) + "ms");
    printjson(db.demo.find(query).sort(sortId).explain());
 
    // aggregate: query on num, sort on _id (slow!!!)
    print("aggregate: query on num, sort by _id: " + time(function ()
    {
        db.demo.aggregate([{$match: query}, {$sort: sortId}]);
    }) + "ms");
    printjson(db.demo.aggregate([{$match: query}, {$sort: sortId}], {explain: true}));
}
 
test(100000);

And here is the explain output from find and aggregate:

find: query on num, sort by _id: 1ms
{
	"cursor" : "BtreeCursor num_1",
	"isMultiKey" : false,
	"n" : 10,
	"nscannedObjects" : 10,
	"nscanned" : 10,
	"nscannedObjectsAllPlans" : 43,
	"nscannedAllPlans" : 44,
	"scanAndOrder" : true,
	"indexOnly" : false,
	"nYields" : 0,
	"nChunkSkips" : 0,
	"millis" : 0,
	"indexBounds" : {
		"num" : [
			[
				-Infinity,
				10
			]
		]
	},
	"server" : "enter.local:27017",
	"filterSet" : false
}
aggregate: query on num, sort by _id: 122ms
{
	"stages" : [
		{
			"$cursor" : {
				"query" : {
					"num" : {
						"$lt" : 10
					}
				},
				"sort" : {
					"_id" : 1
				},
				"plan" : {
					"cursor" : "BtreeCursor _id_",
					"isMultiKey" : false,
					"scanAndOrder" : false,
					"indexBounds" : {
						"_id" : [
							[
								{
									"$minElement" : 1
								},
								{
									"$maxElement" : 1
								}
							]
						]
					},
					"allPlans" : [
						{
							"cursor" : "BtreeCursor _id_",
							"isMultiKey" : false,
							"scanAndOrder" : false,
							"indexBounds" : {
								"_id" : [
									[
										{
											"$minElement" : 1
										},
										{
											"$maxElement" : 1
										}
									]
								]
							}
						}
					]
				}
			}
		}
	],
	"ok" : 1
}

redbeard0531, is this expected behavior or should the aggregation framework also choose the fast index on num here?

Comment by David Schneider [ 07/Nov/12 ]

I added the itcount() call to the cursor. But still get the same result. Anyway the find query was just to compare. For me to ~400ms execution time for aggregation with sort by _id comparing to nearly nothing in the other cases raises concern.

function test(num) {
    function time(fun) {
        var start = (new Date).getTime();
        fun.apply();
        return (new Date).getTime() - start;
    }
 
    if (num === undefined) {
        num = 1000;
    }
    db.demo.remove();
    for (var i = 0; i < num; ++i) {
        db.demo.insert({num: i});
    }
    db.demo.ensureIndex({num: 1});
    var query = {num: {$lt: 10}};
    var sortId = {_id: 1};
    var sortNum = {num: 1};
    printjson(db.demo.find(query).explain());
    print("find sort by id: " + time(function ()
    {
        db.demo.find(query).sort(sortId).itcount();
    }) + "ms");
    print("aggregate sort by id: " + time(function ()
    {
        db.demo.aggregate([{$match: query}, {$sort: sortId}]);
    }) + "ms");
    print("find sort by num: " + time(function ()
    {
        db.demo.find(query).sort(sortNum).itcount();
    }) + "ms");
 
    print("aggregate sort by num: " + time(function ()
    {
        db.demo.aggregate([{$match: query}, {$sort: sortNum}]);
    }) + "ms");
}

test(100000)
{
	"cursor" : "BtreeCursor num_1",
	"isMultiKey" : false,
	"n" : 10,
	"nscannedObjects" : 10,
	"nscanned" : 10,
	"nscannedObjectsAllPlans" : 10,
	"nscannedAllPlans" : 10,
	"scanAndOrder" : false,
	"indexOnly" : false,
	"nYields" : 0,
	"nChunkSkips" : 0,
	"millis" : 0,
	"indexBounds" : {
		"num" : [
			[
				-1.7976931348623157e+308,
				10
			]
		]
	},
	"server" : "mongodb01:27228",
	"millis" : 0
}
find sort by id: 2ms
aggregate sort by id: 397ms
find sort by num: 1ms
aggregate sort by num: 1ms

Comment by Aaron Staple [ 06/Nov/12 ]

When you put something like this in a script it does not actually touch the server, it just builds a cursor:

db.demo.find(query).sort(sortId);

You would need to do something like this to ensure the query runs:

db.demo.find(query).sort(sortId).itcount();

Generated at Thu Feb 08 03:14:54 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.