[SERVER-21471] Bad index selection on aggregate with $match + $sort Created: 16/Nov/15  Updated: 18/Oct/16  Resolved: 20/Nov/15

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Mark Zealey Assignee: Charlie Swanson
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-7568 Aggregation framework favors non-bloc... Closed
Related
is related to SERVER-21502 $group using index Closed
Operating System: ALL
Participants:

 Description   

I have a collection with 2m records. unique index on _id and foo columns

db.col.aggregate([ { "$match" : { "foo" : "abcd" } }, { "$sort" : { "_id" : 1 } } ])

takes about 5 seconds to return;

db.col.aggregate([ { "$match" : { "foo" : "abcd" } } ])

returns instantly. An explain shows that the first one chooses to use the _id index for some reason therefore causing a whole table scan. I seem to recall this also being an issue back in mongo 2.4 but is still there in mongo 3



 Comments   
Comment by Charlie Swanson [ 20/Nov/15 ]

Actually, I've found an issue already filed, which this duplicates, so I will close this as a duplicate. Please watch the other ticket for updates. Thanks!

As for hinting support, see SERVER-7944.

Comment by Mark Zealey [ 20/Nov/15 ]

Perhaps if this is tricky to fix you could add a way of specifying an index hint to a pipeline, or a pipeline operator that would do as you suggest with $project but basically stop the optimizer from looking at future stages after it?

Comment by Mark Zealey [ 20/Nov/15 ]

Yup the equivalent find() pipeline works just fine; it's just on the aggregation pipeline that we have this issue. Thanks, Mark

Comment by Charlie Swanson [ 20/Nov/15 ]

Thanks for the report mark.zealey.

This is a known issue, and is an artifact of the way the aggregation system integrates with the query system. Currently the aggregation system will always prefer plans that do not have a blocking sort stage, if there are any available.

I'm fairly confident this is what's happening, but can you also provide the following additional information, just to be sure:

  • Does this happen only with the aggregation pipeline, or does it also happen with the equivalent .find() operation?
  • If it happens with .find(), does it happen with .explain().find()?

If this is indeed the problem, there are a couple options I can think of to work around it:

  • Remove the sort from your pipeline (if you can)
  • Use the equivalent .find() operation (if you can)
  • If you know what fields you need, you can add a $project stage between the $match and the $sort (e.g. {$project: {_id: 1, id: 1}}). I will not guarantee this will continue to work in future versions, since it relies on the fact that our optimizations aren't smart enough to move the sort before the project, but it should work for now.

That's about all I can think of for now, but there may be other workarounds.

If you confirm this is what's happening, I can convert this ticket into a feature request to better optimize in situations like this.

Comment by Mark Zealey [ 20/Nov/15 ]

This is still an issue in your 3.2 apt repository build at least:

db version v3.2.0-rc3-75-g5b3257d
git version: 5b3257d526f8217e303609418da1769275f81d03
OpenSSL version: OpenSSL 1.0.1e 11 Feb 2013
allocator: tcmalloc
modules: none
build environment:
distmod: debian71
distarch: x86_64
target_arch: x86_64

.aggregate([ { "$match" : { "id" : "abcd" } }, { "$sort" : { "_id" : 1 } } ], {explain:true})

...
                                "queryPlanner" : {
                                        "plannerVersion" : 1,
                                        "namespace" : "test.col",
                                        "indexFilterSet" : false,
                                        "parsedQuery" : {
                                                "id" : {
                                                        "$eq" : "abcd"
                                                }
                                        },
                                        "winningPlan" : {
                                                "stage" : "FETCH",
                                                "filter" : {
                                                        "id" : {
                                                                "$eq" : "abcd"
                                                        }
                                                },
                                                "inputStage" : {
                                                        "stage" : "IXSCAN",
                                                        "keyPattern" : {
                                                                "_id" : 1
                                                        },
                                                        "indexName" : "_id_",
                                                        "isMultiKey" : false,
                                                        "isUnique" : true,
                                                        "isSparse" : false,
                                                        "isPartial" : false,
                                                        "indexVersion" : 1,
                                                        "direction" : "forward",
                                                        "indexBounds" : {
                                                                "_id" : [
                                                                        "[MinKey, MaxKey]"
                                                                ]
                                                        }
                                                }
                                        },
                                        "rejectedPlans" : [ ]
                                }

with the sort being on id or the sort being removed, this changes to a direct btree index search:

                                       "winningPlan" : {
                                                "stage" : "FETCH",
                                                "inputStage" : {
                                                        "stage" : "IXSCAN",
                                                        "keyPattern" : {
                                                                "id" : 1
                                                        },
                                                        "indexName" : "id_1",
                                                        "isMultiKey" : false,
                                                        "isUnique" : true,
                                                        "isSparse" : false,
                                                        "isPartial" : false,
                                                        "indexVersion" : 1,
                                                        "direction" : "forward",
                                                        "indexBounds" : {
                                                                "id" : [
                                                                        "[\"abcd\", \"abcd\"]"
                                                                ]
                                                        }
                                                }
                                        },

Comment by Charlie Swanson [ 16/Nov/15 ]

Hi Mark,

If you wouldn't mind, could you try reproducing this on a 3.2 release candidate? I'm not sure, but I believe SERVER-12015 may help here. If not, can you please include which indices you have so I can investigate further?

Generated at Thu Feb 08 03:57:27 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.