[SERVER-44309] Not full index use in aggregation pipelines Created: 30/Oct/19  Updated: 10/Feb/20  Resolved: 06/Nov/19

Status: Closed
Project: Core Server
Component/s: Aggregation Framework, Index Maintenance
Affects Version/s: 4.2.1
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Staffan Eketorp Assignee: Carl Champain (Inactive)
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-7568 Aggregation framework favors non-bloc... Closed
duplicates SERVER-36723 $limit should push down into the Plan... Closed
Related
is related to SERVER-7568 Aggregation framework favors non-bloc... Closed
is related to SERVER-36723 $limit should push down into the Plan... Closed
Participants:

 Description   

I could be wrong, but it seems to me that when running aggregation pipelines, indexes aren't used as well as for find queries with sort/limit. Example:

I have documents like this:

_id: ObjectId(...),
id: "someid",
status: "published",
version: 123123123,
foo: "bar"

Then there's an index for 

{id: 1, status: 1, version: 1} 

If I run

 myCollection.find({id:'someid', status:'published'}).sort({version: -1}).limit(1).explain()

...I get the following inputStage indicating that this is done without document fetching.

"inputStage" :
{ 
"stage" : "SORT", 
"sortPattern" : { "id" : 1.0, "status" : 1.0, "version" : -1.0 }, 
 "limitAmount" : 1.0, 
 "inputStage" : {
 "stage" : "SORT_KEY_GENERATOR", 
 "inputStage" :
{ 
"stage" : "IXSCAN", 
"keyPattern" : { "id" : 1.0, "status" : 1.0, "version" : 1.0 }, 
 "indexName" : "contentId_1_status_1_version_1", 
 "isMultiKey" : false, 
 "multiKeyPaths" : \{ "id" : [ ], "status" : [ ], "version" : [ ] }, 
 "isUnique" : false, 
 "isSparse" : false, 
 "isPartial" : false, 
 "indexVersion" : 2.0, 
 "direction" : "forward", 
 "indexBounds" : { "id" : [ "[\"someid\", \"someid\"]" ], "status" : [ "[\"published\", \"published\"]" ], "version" : [ "[MinKey, MaxKey]" ] }
 }
 }
 }
 

 However, If I do the same with aggregation pipeline. I.e. something like

 myCollection.explain().aggregate([
 
 {$match: {id: 'content12', status:'published'}},
 
 {$sort: {id: 1, status: 1, version: -1} },
 
 {$limit: 1}])
 

 ....then it seems like it's fetching more. I get 

"inputStage" :
{ 
"stage" : "IXSCAN", 
"keyPattern" : { "contentId" : 1.0, "status" : 1.0, "version" : 1.0 }, 
 "indexName" : "contentId_1_status_1_version_1", 
 "isMultiKey" : false, 
 "multiKeyPaths" : \{ "contentId" : [ ], "status" : [ ], "version" : [ ] }, 
 "isUnique" : false, 
 "isSparse" : false, 
 "isPartial" : false, 
 "indexVersion" : 2.0, 
 "direction" : "forward", 
 "indexBounds" : { "contentId" : [ "[\"content12\", \"content12\"]" ], 
"status" : [ "[\"published\", \"published\"]" ], 
"version" : [ "[MinKey, MaxKey]" ] }
 }
 

 This seems like a miss. Shouldn't an aggregation pipeline like that be able to use the same index handling?



 Comments   
Comment by Kelsey Schubert [ 10/Feb/20 ]

staffan.eketorp@gmail.com, 4.3.3, which is our development release and not intended for production is available for download now: https://www.mongodb.com/download-center

I would expect MongoDB 4.4.0, which will be our production release containing these improvements, to be generally available later this year.

Comment by Staffan Eketorp [ 06/Nov/19 ]

Thanks @carl.champain Any idea on when 4.3 will be released?

Comment by Carl Champain (Inactive) [ 06/Nov/19 ]

Hi staffan.eketorp@gmail.com,

We were able to successfully recreate a similar behavior, and it appears this was indeed a bug which was fixed by the combination of SERVER-36723 and SERVER-7568.
These changes cause $limit and $sort stages respectively to be pushed down from the aggregation layer to the query layer, which should avoid the extra document fetches.

Due to SERVER-36723 and SERVER-7568, this behavior does not appear in our master branch. Unfortunately we will not be able to backport the work to 4.2.

Thanks again for taking the time to submit this report. I will now close this ticket.

Kind regards,
Carl

Comment by Staffan Eketorp [ 01/Nov/19 ]

And for the "normal" .find().sort().limit():

{
   "queryPlanner":{
      "plannerVersion":1.0,
      "namespace":"test.contents",
      "indexFilterSet":false,
      "parsedQuery":{
         "$and":[
            {
               "contentId":{
                  "$eq":"content12"
               }
            },
            {
               "status":{
                  "$eq":"published"
               }
            }
         ]
      },
      "winningPlan":{
         "stage":"FETCH",
         "inputStage":{
            "stage":"SORT",
            "sortPattern":{
               "contentId":1.0,
               "status":1.0,
               "version":-1.0
            },
            "limitAmount":1.0,
            "inputStage":{
               "stage":"SORT_KEY_GENERATOR",
               "inputStage":{
                  "stage":"IXSCAN",
                  "keyPattern":{
                     "contentId":1.0,
                     "status":1.0,
                     "version":1.0
                  },
                  "indexName":"contentId_1_status_1_version_1",
                  "isMultiKey":false,
                  "multiKeyPaths":{
                     "contentId":[
 
                     ],
                     "status":[
 
                     ],
                     "version":[
 
                     ]
                  },
                  "isUnique":false,
                  "isSparse":false,
                  "isPartial":false,
                  "indexVersion":2.0,
                  "direction":"forward",
                  "indexBounds":{
                     "contentId":[
                        "[\"content12\", \"content12\"]"
                     ],
                     "status":[
                        "[\"published\", \"published\"]"
                     ],
                     "version":[
                        "[MinKey, MaxKey]"
                     ]
                  }
               }
            }
         }
      },
      "rejectedPlans":[
 
      ]
   },
   "executionStats":{
      "executionSuccess":true,
      "nReturned":1.0,
      "executionTimeMillis":0.0,
      "totalKeysExamined":3.0,
      "totalDocsExamined":1.0,
      "executionStages":{
         "stage":"FETCH",
         "nReturned":1.0,
         "executionTimeMillisEstimate":0.0,
         "works":7.0,
         "advanced":1.0,
         "needTime":5.0,
         "needYield":0.0,
         "saveState":0.0,
         "restoreState":0.0,
         "isEOF":1.0,
         "docsExamined":1.0,
         "alreadyHasObj":0.0,
         "inputStage":{
            "stage":"SORT",
            "nReturned":1.0,
            "executionTimeMillisEstimate":0.0,
            "works":6.0,
            "advanced":1.0,
            "needTime":5.0,
            "needYield":0.0,
            "saveState":0.0,
            "restoreState":0.0,
            "isEOF":1.0,
            "sortPattern":{
               "contentId":1.0,
               "status":1.0,
               "version":-1.0
            },
            "memUsage":55.0,
            "memLimit":33554432.0,
            "limitAmount":1.0,
            "inputStage":{
               "stage":"SORT_KEY_GENERATOR",
               "nReturned":3.0,
               "executionTimeMillisEstimate":0.0,
               "works":5.0,
               "advanced":3.0,
               "needTime":1.0,
               "needYield":0.0,
               "saveState":0.0,
               "restoreState":0.0,
               "isEOF":1.0,
               "inputStage":{
                  "stage":"IXSCAN",
                  "nReturned":3.0,
                  "executionTimeMillisEstimate":0.0,
                  "works":4.0,
                  "advanced":3.0,
                  "needTime":0.0,
                  "needYield":0.0,
                  "saveState":0.0,
                  "restoreState":0.0,
                  "isEOF":1.0,
                  "keyPattern":{
                     "contentId":1.0,
                     "status":1.0,
                     "version":1.0
                  },
                  "indexName":"contentId_1_status_1_version_1",
                  "isMultiKey":false,
                  "multiKeyPaths":{
                     "contentId":[
 
                     ],
                     "status":[
 
                     ],
                     "version":[
 
                     ]
                  },
                  "isUnique":false,
                  "isSparse":false,
                  "isPartial":false,
                  "indexVersion":2.0,
                  "direction":"forward",
                  "indexBounds":{
                     "contentId":[
                        "[\"content12\", \"content12\"]"
                     ],
                     "status":[
                        "[\"published\", \"published\"]"
                     ],
                     "version":[
                        "[MinKey, MaxKey]"
                     ]
                  },
                  "keysExamined":3.0,
                  "seeks":1.0,
                  "dupsTested":0.0,
                  "dupsDropped":0.0
               }
            }
         }
      }
   },
   "serverInfo":{
      "host":"b282af9a7e3e",
      "port":27017.0,
      "version":"4.2.1",
      "gitVersion":"edf6d45851c0b9ee15548f0f847df141764a317e"
   },
   "ok":1.0
}

Comment by Staffan Eketorp [ 01/Nov/19 ]

For the aggregation pipeline (and mind you the "id" field is really called "contentId"), and the query param is "content12" rather than "someid"

{
   "stages":[
      {
         "$cursor":{
            "query":{
               "contentId":"content12",
               "status":"published"
            },
            "queryPlanner":{
               "plannerVersion":1.0,
               "namespace":"test.contents",
               "indexFilterSet":false,
               "parsedQuery":{
                  "$and":[
                     {
                        "contentId":{
                           "$eq":"content12"
                        }
                     },
                     {
                        "status":{
                           "$eq":"published"
                        }
                     }
                  ]
               },
               "queryHash":"72C191F2",
               "planCacheKey":"4983A7BF",
               "winningPlan":{
                  "stage":"FETCH",
                  "inputStage":{
                     "stage":"IXSCAN",
                     "keyPattern":{
                        "contentId":1.0,
                        "status":1.0,
                        "version":1.0
                     },
                     "indexName":"contentId_1_status_1_version_1",
                     "isMultiKey":false,
                     "multiKeyPaths":{
                        "contentId":[
 
                        ],
                        "status":[
 
                        ],
                        "version":[
 
                        ]
                     },
                     "isUnique":false,
                     "isSparse":false,
                     "isPartial":false,
                     "indexVersion":2.0,
                     "direction":"forward",
                     "indexBounds":{
                        "contentId":[
                           "[\"content12\", \"content12\"]"
                        ],
                        "status":[
                           "[\"published\", \"published\"]"
                        ],
                        "version":[
                           "[MinKey, MaxKey]"
                        ]
                     }
                  }
               },
               "rejectedPlans":[
 
               ]
            },
            "executionStats":{
               "executionSuccess":true,
               "nReturned":3.0,
               "executionTimeMillis":1.0,
               "totalKeysExamined":3.0,
               "totalDocsExamined":3.0,
               "executionStages":{
                  "stage":"FETCH",
                  "nReturned":3.0,
                  "executionTimeMillisEstimate":0.0,
                  "works":4.0,
                  "advanced":3.0,
                  "needTime":0.0,
                  "needYield":0.0,
                  "saveState":1.0,
                  "restoreState":1.0,
                  "isEOF":1.0,
                  "docsExamined":3.0,
                  "alreadyHasObj":0.0,
                  "inputStage":{
                     "stage":"IXSCAN",
                     "nReturned":3.0,
                     "executionTimeMillisEstimate":0.0,
                     "works":4.0,
                     "advanced":3.0,
                     "needTime":0.0,
                     "needYield":0.0,
                     "saveState":1.0,
                     "restoreState":1.0,
                     "isEOF":1.0,
                     "keyPattern":{
                        "contentId":1.0,
                        "status":1.0,
                        "version":1.0
                     },
                     "indexName":"contentId_1_status_1_version_1",
                     "isMultiKey":false,
                     "multiKeyPaths":{
                        "contentId":[
 
                        ],
                        "status":[
 
                        ],
                        "version":[
 
                        ]
                     },
                     "isUnique":false,
                     "isSparse":false,
                     "isPartial":false,
                     "indexVersion":2.0,
                     "direction":"forward",
                     "indexBounds":{
                        "contentId":[
                           "[\"content12\", \"content12\"]"
                        ],
                        "status":[
                           "[\"published\", \"published\"]"
                        ],
                        "version":[
                           "[MinKey, MaxKey]"
                        ]
                     },
                     "keysExamined":3.0,
                     "seeks":1.0,
                     "dupsTested":0.0,
                     "dupsDropped":0.0
                  }
               }
            }
         }
      },
      {
         "$sort":{
            "sortKey":{
               "contentId":1.0,
               "status":1.0,
               "version":-1.0
            },
            "limit":NumberLong(1)
         }
      }
   ],
   "ok":1.0
}

Comment by Carl Champain (Inactive) [ 01/Nov/19 ]

Hi staffan.eketorp@gmail.com,

Thanks for the report.
To help us look more into this behavior, could you please provide the full outputs of explain('executionStats') for both queries?

Kind regards,
Carl

Comment by Staffan Eketorp [ 31/Oct/19 ]

Clarification: The last example was from using documents with "contentId" property rather than "id". Also "content12" was used instead of "someid". 

Generated at Thu Feb 08 05:05:38 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.