Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-27250

Aggregation Pipeline Favors Indexes for $sort phase over $match phase

    • Type: Icon: Bug Bug
    • Resolution: Duplicate
    • Priority: Icon: Critical - P2 Critical - P2
    • None
    • Affects Version/s: 3.0.7, 3.2.11, 3.4.0
    • Component/s: Aggregation Framework
    • Labels:
      None
    • ALL
    • Hide

      Take a sample collection, called people, made up of documents that look like this:

      {
           _id : ObjectId("507f191e810c19729de860ea"),
           name: "Jason",
           status: "Good"
      }
      

      Let's also say that we have a single, ascending index on each of the three fields in this collection.

      If I were to run the following aggregate query on this collection:

      db.people.aggregate(
      [
          {
                      "$match" : {
                          "name" : "Jason"
                      }
                  },
                  {
                      "$sort" : {
                          "status" : 1
                      }
                  }
      ]
      )
      
      

      It will use the index on

      {status:1}

      , instead of the more efficient index on

      {name:1}

      . Removing the $sort stage will cause it to use the name index, as will adding an additional sort stage inbetween the $match and existing $sort stage. I have confirmed this by adding

      {explain:true}

      to the above query, getting this result:

      
      	"winningPlan" : {
      						"stage" : "FETCH",
      						"filter" : {
      							"name" : {
      								"$eq" : "Jason"
      							}
      						},
      						"inputStage" : {
      							"stage" : "IXSCAN",
      							"keyPattern" : {
      								"status" : 1
      							},
      							"indexName" : "status_1",
      							"isMultiKey" : false,
      							"direction" : "forward",
      							"indexBounds" : {
      								"status" : [
      									"[MinKey, MaxKey]"
      								]
      							}
      						}
      					}
      
      Show
      Take a sample collection, called people, made up of documents that look like this: { _id : ObjectId( "507f191e810c19729de860ea" ), name: "Jason" , status: "Good" } Let's also say that we have a single, ascending index on each of the three fields in this collection. If I were to run the following aggregate query on this collection: db.people.aggregate( [ { "$match" : { "name" : "Jason" } }, { "$sort" : { "status" : 1 } } ] ) It will use the index on {status:1} , instead of the more efficient index on {name:1} . Removing the $sort stage will cause it to use the name index, as will adding an additional sort stage inbetween the $match and existing $sort stage. I have confirmed this by adding {explain:true} to the above query, getting this result: "winningPlan" : { "stage" : "FETCH" , "filter" : { "name" : { "$eq" : "Jason" } }, "inputStage" : { "stage" : "IXSCAN" , "keyPattern" : { "status" : 1 }, "indexName" : "status_1" , "isMultiKey" : false , "direction" : "forward" , "indexBounds" : { "status" : [ "[MinKey, MaxKey]" ] } } }

      When running an aggregation query with a $match and $sort phase against a collection where both the field being queried in the $match step and the $sort step have separate indexes, mongo will always choose to use the index that assists with the $sort, and not the $match. This leads to inefficient collection scan queries!

      We came across this bug after upgrading from the 1.X to the 2.X C# mongo driver, but can reproduce it easily in the shell. I have tested this in mongoDB version 3.0 on MMAPv1, as well as 3.2 and 3.4 clusters in Atlas running Wiredtiger, with identical results.

      It is easy to reproduce, I have detailed the steps below with a very simple example.

            Assignee:
            Unassigned Unassigned
            Reporter:
            fordjp@gmail.com Jason Ford
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

              Created:
              Updated:
              Resolved: