[SERVER-27250] Aggregation Pipeline Favors Indexes for $sort phase over $match phase Created: 01/Dec/16  Updated: 07/Apr/23  Resolved: 01/Dec/16

Status: Closed
Project: Core Server
Component/s: Aggregation Framework
Affects Version/s: 3.0.7, 3.2.11, 3.4.0
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Jason Ford Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-7568 Aggregation framework favors non-bloc... Closed
Related
Operating System: ALL
Steps To Reproduce:

Take a sample collection, called people, made up of documents that look like this:

{
     _id : ObjectId("507f191e810c19729de860ea"),
     name: "Jason",
     status: "Good"
}

Let's also say that we have a single, ascending index on each of the three fields in this collection.

If I were to run the following aggregate query on this collection:

db.people.aggregate(
[
    {
                "$match" : {
                    "name" : "Jason"
                }
            },
            {
                "$sort" : {
                    "status" : 1
                }
            }
]
)

It will use the index on

{status:1}

, instead of the more efficient index on

{name:1}

. Removing the $sort stage will cause it to use the name index, as will adding an additional sort stage inbetween the $match and existing $sort stage. I have confirmed this by adding

{explain:true}

to the above query, getting this result:

 
	"winningPlan" : {
						"stage" : "FETCH",
						"filter" : {
							"name" : {
								"$eq" : "Jason"
							}
						},
						"inputStage" : {
							"stage" : "IXSCAN",
							"keyPattern" : {
								"status" : 1
							},
							"indexName" : "status_1",
							"isMultiKey" : false,
							"direction" : "forward",
							"indexBounds" : {
								"status" : [
									"[MinKey, MaxKey]"
								]
							}
						}
					}

Participants:

 Description   

When running an aggregation query with a $match and $sort phase against a collection where both the field being queried in the $match step and the $sort step have separate indexes, mongo will always choose to use the index that assists with the $sort, and not the $match. This leads to inefficient collection scan queries!

We came across this bug after upgrading from the 1.X to the 2.X C# mongo driver, but can reproduce it easily in the shell. I have tested this in mongoDB version 3.0 on MMAPv1, as well as 3.2 and 3.4 clusters in Atlas running Wiredtiger, with identical results.

It is easy to reproduce, I have detailed the steps below with a very simple example.



 Comments   
Comment by Kelsey Schubert [ 01/Dec/16 ]

Hi fordjp,

Thanks for the detailed report. This issue is tracked in SERVER-7568. Please review this ticket and watch it for updates. If this is an issue for you, I would recommend following Andrey's workaround, which works for now.

Kind regards,
Thomas

Generated at Thu Feb 08 04:14:36 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.