[SERVER-43221] Applying index makes aggregation query go significantly slower Created: 08/Sep/19  Updated: 09/Sep/19  Resolved: 09/Sep/19

Status: Closed
Project: Core Server
Component/s: Aggregation Framework, Index Maintenance, Performance
Affects Version/s: 3.4.22
Fix Version/s: None

Type: Question Priority: Minor - P4
Reporter: Benjamin Deany Assignee: Carl Champain (Inactive)
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-7568 Aggregation framework favors non-bloc... Closed
Participants:

 Description   

I am not sure if this user error, a design choice (or flaw), or a bug. We encountered an issue where, after an index was applied, an aggregation query (which was previously running fine; slow-ish, but fine), spiked CPU to 100% and became very, very slow. We estimate an approximate 10x to 20x slow-down.

Note that we were able to modify the query to cause a new query plan to be generated which resolved the issue.

The query in question is:

 

Query

db.GPSLocations.aggregate([ { $match: { organisation:

Unknown macro: { $in}

, date: { $gte: new Date(1560181081336) } } }, { $sort:

Unknown macro: { owner}

}, { $group: { _id: "$ownerId", last:

Unknown macro: { $last}

} } ]);

 

This query gets the last known GPS location for a user.

 

Pre-Index Query Plan

{
   "stages":[
      {
         "$cursor":{
            "query":{
               "organisation":{
                  "$in":[
                     "MFgsJF5SmL6hxaYcE",
                     "MFgsJF5SmL6hxaYcE"
                  ]
               },
               "date":{
                  "$gte":ISODate("2019-06-10T15:38:01.336Z")
               }
            },
            "fields":{
               "date":1,
               "ownerId":1,
               "_id":1
            },
            "queryPlanner":{
               "plannerVersion":1,
               "namespace":"bustle-supply-uat.GPSLocations",
               "indexFilterSet":false,
               "parsedQuery":{
                  "$and":[
                     {
                        "organisation":{
                           "$eq":"MFgsJF5SmL6hxaYcE"
                        }
                     },
                     {
                        "date":{
                           "$gte":ISODate("2019-06-10T15:38:01.336Z")
                        }
                     }
                  ]
               },
               "winningPlan":{
                  "stage":"FETCH",
                  "filter":{
                     "date":{
                        "$gte": ISODate("2019-06-10T15:38:01.336Z")
                     }
                  },
                  "inputStage":{
                     "stage":"IXSCAN",
                     "keyPattern":{
                        "organisation":1
                     },
                     "indexName":"organisation_1",
                     "isMultiKey":false,
                     "multiKeyPaths":{
                        "organisation":[ ]
                     },
                     "isUnique":false,
                     "isSparse":false,
                     "isPartial":false,
                     "indexVersion":2,
                     "direction":"forward",
                     "indexBounds":{
                        "organisation":[
                           "[\"MFgsJF5SmL6hxaYcE\", \"MFgsJF5SmL6hxaYcE\"]"
                        ]
                     }
                  }
               },
               "rejectedPlans":[
                  {
                     "stage":"FETCH",
                     "filter":{
                        "organisation":{
                           "$eq":"MFgsJF5SmL6hxaYcE"
                        }
                     },
                     "inputStage":{
                        "stage":"IXSCAN",
                        "keyPattern":{
                           "date":1
                        },
                        "indexName":"date_1",
                        "isMultiKey":false,
                        "multiKeyPaths":{
                           "date":[                           ]
                        },
                        "isUnique":false,
                        "isSparse":false,
                        "isPartial":false,
                        "indexVersion":2,
                        "direction":"forward",
                        "indexBounds":{
                           "date":[
                              "[new Date(1560181081336), new Date(9223372036854775807)]"
                           ]
                        }
                     }
                  }
               ]
            }
         }
      },
      {
         "$sort":{
            "sortKey":{
               "ownerId":1,
               "date":1
            }
         }
      },
      {
         "$group":{
            "_id":"$ownerId",
            "last":{
               "$last":"$_id"
            }
         }
      }
   ],
   "ok":1
}

And the post-index query plan:

Post-index query plan

{
   "stages":[
      {
         "$cursor":{
            "query":{
               "organisation":{
                  "$in":[
                     "MFgsJF5SmL6hxaYcE",
                     "MFgsJF5SmL6hxaYcE"
                  ]
               },
               "date":{
                  "$gte": ISODate("2019-06-10T15:38:01.336Z")
               }
            },
            "sort":{
               "ownerId":1,
               "date":1
            },
            "fields":{
               "ownerId":1,
               "_id":1
            },
            "queryPlanner":{
               "plannerVersion":1,
               "namespace":"bustle-supply-uat.GPSLocations",
               "indexFilterSet":false,
               "parsedQuery":{
                  "$and":[
                     {
                        "organisation":{
                           "$eq":"MFgsJF5SmL6hxaYcE"
                        }
                     },
                     {
                        "date":{
                           "$gte": ISODate("2019-06-10T15:38:01.336Z")
                        }
                     }
                  ]
               },
               "winningPlan":{
                  "stage":"FETCH",
                  "filter":{
                     "$and":[
                        {
                           "organisation":{
                              "$eq":"MFgsJF5SmL6hxaYcE"
                           }
                        },
                        {
                           "date":{
                              "$gte": ISODate("2019-06-10T15:38:01.336Z")
                           }
                        }
                     ]
                  },
                  "inputStage":{
                     "stage":"IXSCAN",
                     "keyPattern":{
                        "ownerId":1,
                        "date":1
                     },
                     "indexName":"gpsLocationsByOwnerAndDate",
                     "isMultiKey":false,
                     "multiKeyPaths":{
                        "ownerId":[                        ],
                        "date":[                        ]
                     },
                     "isUnique":false,
                     "isSparse":false,
                     "isPartial":false,
                     "indexVersion":2,
                     "direction":"forward",
                     "indexBounds":{
                        "ownerId":[
                           "[MinKey, MaxKey]"
                        ],
                        "date":[
                           "[MinKey, MaxKey]"
                        ]
                     }
                  }
               },
               "rejectedPlans":[ ]
            }
         }
      },
      {
         "$group":{
            "_id":"$ownerId",
            "last":{
               "$last":"$_id"
            }
         }
      }
   ],
   "ok":1
}

The "offending" operation appears to be the sort in the post-index case. I noted that it included an ownerId property, which is not previous filtered by in the query plan.

Modifying the query to remove the ownerId from the sort resolves the issue;

Modified Query

db.GPSLocations.aggregate([ { $match: { organisation: { $in: [ "MFgsJF5SmL6hxaYcE", "MFgsJF5SmL6hxaYcE" ] }, date: { $gte: new Date(1560181081336) } } }, { $sort: { date: 1 } }, { $group: { _id: "$ownerId", last: { $last: "$_id" } } } ]);

Which renders this query plan:

Modified Query Plan (with index still applied)

{
   "stages":[
      {
         "$cursor":{
            "query":{
               "organisation":{
                  "$in":[
                     "MFgsJF5SmL6hxaYcE",
                     "MFgsJF5SmL6hxaYcE"
                  ]
               },
               "date":{
                  "$gte": ISODate("2019-06-10T15:38:01.336Z")
               }
            },
            "sort":{
               "date":1
            },
            "fields":{
               "ownerId":1,
               "_id":1
            },
            "queryPlanner":{
               "plannerVersion":1,
               "namespace":"bustle-supply-uat.GPSLocations",
               "indexFilterSet":false,
               "parsedQuery":{
                  "$and":[
                     {
                        "organisation":{
                           "$eq":"MFgsJF5SmL6hxaYcE"
                        }
                     },
                     {
                        "date":{
                           "$gte": ISODate("2019-06-10T15:38:01.336Z")
                        }
                     }
                  ]
               },
               "winningPlan":{
                  "stage":"FETCH",
                  "inputStage":{
                     "stage":"IXSCAN",
                     "keyPattern":{
                        "organisation":1,
                        "date":1
                     },
                     "indexName":"organisation_1_date_1",
                     "isMultiKey":false,
                     "multiKeyPaths":{
                        "organisation":[                        ],
                        "date":[                        ]
                     },
                     "isUnique":false,
                     "isSparse":false,
                     "isPartial":false,
                     "indexVersion":2,
                     "direction":"forward",
                     "indexBounds":{
                        "organisation":[
                           "[\"MFgsJF5SmL6hxaYcE\", \"MFgsJF5SmL6hxaYcE\"]"
                        ],
                        "date":[
                           "[new Date(1560181081336), new Date(9223372036854775807)]"
                        ]
                     }
                  }
               },
               "rejectedPlans":[
                  {
                     "stage":"FETCH",
                     "filter":{
                        "organisation":{
                           "$eq":"MFgsJF5SmL6hxaYcE"
                        }
                     },
                     "inputStage":{
                        "stage":"IXSCAN",
                        "keyPattern":{
                           "date":1
                        },
                        "indexName":"date_1",
                        "isMultiKey":false,
                        "multiKeyPaths":{
                           "date":[                           ]
                        },
                        "isUnique":false,
                        "isSparse":false,
                        "isPartial":false,
                        "indexVersion":2,
                        "direction":"forward",
                        "indexBounds":{
                           "date":[
                              "[new Date(1560181081336), new Date(9223372036854775807)]"
                           ]
                        }
                     }
                  }
               ]
            }
         }
      },
      {
         "$group":{
            "_id":"$ownerId",
            "last":{
               "$last":"$_id"
            }
         }
      }
   ],
   "ok":1
}

So why questions are:

  1. Given that the "pre-index" case work well, why does the "post-index" case not? I understand that MongoDB runs candidate query plans; should it not be able to determine that the post-index query plan is inferior to the first?
  2. Alternatively, is there something that we don't understand about the MongoDB aggregation pipeline and how it comes up with query plans? Do we need to be (more) careful in applying indexes to certain query shapes?
  3. If this is something that we did, is there documentation that can tell us why MongoDB acted this way in this case?

 

Many thanks!

Ben

 

 



 Comments   
Comment by Carl Champain (Inactive) [ 09/Sep/19 ]

Hi ben.deany@gmail.com,

Thanks for submitting this report! 
The query planner prefers a plan that can use an index to provide the sort order, which can result in a slower plan.
This issue is tracked in SERVER-7568, please review it and watch it for updates.

Kind regards,

Carl

Comment by Benjamin Deany [ 08/Sep/19 ]

And I forgot to post the index that was applied:

db.GPSLocations.createIndex({'ownerId': 1, 'date': 1,}, {name: 'gpsLocationsByOwnerAndDate'});

Comment by Benjamin Deany [ 08/Sep/19 ]

Oops, I'm not sure what happened to the formatting of that first query. It should be:

db.GPSLocations.aggregate([ { $match: { organisation: { $in: [ "MFgsJF5SmL6hxaYcE", "MFgsJF5SmL6hxaYcE" ] }, date: { $gte: new Date(1560181081336) } } }, { $sort: { owner: 1, date: 1 } }, { $group: { _id: "$ownerId", last: { $last: "$_id" } } } ]);

The second (modified) query has formatted as intended.

Generated at Thu Feb 08 05:02:35 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.