Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-7916

Multikey index performance regression

    • Type: Icon: Bug Bug
    • Resolution: Done
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: 2.2.2
    • Component/s: Index Maintenance
    • Environment:
      Ubuntu 11.04
    • ALL

      After upgrading to 2.2.2 (and php driver 1.3.1) we've started noticing some odd timeouts. We use a multikey index on a set of "digested" words as a pseudo-fulltext search, and then query with $all : [ ... ], which has worked very well.

      After the upgrade, the query optimizer doesn't seem to want to use those indexes in many cases when combined with sort, which results in awful nscanned:

      data:PRIMARY> db.a_ftx.find( { ftx: { $all: [ "the", "xx" ] } } ).sort( { last_seen: -1 } ).limit(500).explain();
      {
              "cursor" : "BtreeCursor last_seen_-1",
              "isMultiKey" : false,
              "n" : 382,
              "nscannedObjects" : 596759,
              "nscanned" : 596759,
              "nscannedObjectsAllPlans" : 716532,
              "nscannedAllPlans" : 716532,
              "scanAndOrder" : false,
              "indexOnly" : false, 
              "nYields" : 131,
              "nChunkSkips" : 0,
              "millis" : 3710,
              "indexBounds" : {
                      "last_seen" : [
                              [
                                      {
                                              "$maxElement" : 1
                                      },
                                      {
                                              "$minElement" : 1
                                      }
                              ] 
                      ] 
              }, 
              "server" : "xx:27017" 
      }
      

      If we provide a hint, performance is better, but still not great because only the first term inside $all appears to be used for the index. I'm not sure if the previous behavior would have been to use both (i.e. find both sets then compute intersection), but performance used to be much better.

      data:PRIMARY> db.a_ftx.find( { ftx: { $all: [ "the", "xx" ] } } ).hint({ ftx : 1 }).sort( { last_seen: -1 } ).limit(500).explain();
      {
              "cursor" : "BtreeCursor ftx_1",
              "isMultiKey" : true,
              "n" : 382,
              "nscannedObjects" : 131956,
              "nscanned" : 131956,
              "nscannedObjectsAllPlans" : 131956,
              "nscannedAllPlans" : 131956,
              "scanAndOrder" : true,
              "indexOnly" : false,
              "nYields" : 174,
              "nChunkSkips" : 0,
              "millis" : 5901,
              "indexBounds" : {
                      "ftx" : [
                              [
                                      "the",
                                      "the"
                              ]
                      ]
              },
              "server" : "xx:27017"
      }
      

      If we manually reorder the terms inside $all, the result is pretty good, similar to past performance:

      data:PRIMARY> db.a_ftx.find( { ftx: { $all: [ "xx", "the" ] } } ).hint({ ftx : 1 }).sort( { last_seen: -1 } ).limit(500).explain();
      {
              "cursor" : "BtreeCursor ftx_1",
              "isMultiKey" : true,
              "n" : 382,
              "nscannedObjects" : 511,
              "nscanned" : 511,
              "nscannedObjectsAllPlans" : 511,
              "nscannedAllPlans" : 511,
              "scanAndOrder" : true,
              "indexOnly" : false,
              "nYields" : 3,
              "nChunkSkips" : 0,
              "millis" : 20,
              "indexBounds" : {
                      "ftx" : [
                              [ 
                                      "xx",
                                      "xx"
                              ]
                      ]
              },
              "server" : "xx:27017"
      }
      
      data:PRIMARY> db.a_ftx.find( { ftx: { $all: [ "the", "xx" ] } } ).hint({ ftx : 1 }).limit(500).explain();
      {
              "cursor" : "BtreeCursor ftx_1",
              "isMultiKey" : true,
              "n" : 382,
      

      Finally, the behavior of the optimizer seems unpredictable, for example some (absolutely catastrophic) slow queries logged like this:

      Wed Dec 12 14:30:21 [conn50070] query ftx.a_ftx query: { $query: { ftx: { $all: [ "hamilton", "jo" ] } }, $orderby: { last_seen: -1 } } ntoreturn:500 ntoskip:0 nscanned:596752 keyUpdates:0 numYields: 724 locks(micros) r:39541636 nreturned:3 reslen:143 20917ms
      

      behave reasonably if reproduced exactly via the mongo shell:

      data:PRIMARY> db.a_ftx.find( { ftx: { $all: [ "hamilton", "jo" ] } } ).sort( { last_seen: -1 } ).limit(500).explain(); 
      { 
              "cursor" : "BtreeCursor ftx_1", 
              "isMultiKey" : true, 
              "n" : 3, 
              "nscannedObjects" : 148, 
              "nscanned" : 148, 
              "nscannedObjectsAllPlans" : 444, 
              "nscannedAllPlans" : 444, 
              "scanAndOrder" : true, 
              "indexOnly" : false, 
              "nYields" : 0, 
              "nChunkSkips" : 0, 
              "millis" : 1, 
              "indexBounds" : { 
                      "ftx" : [ 
                              [ 
                                      "hamilton", 
                                      "hamilton" 
                              ] 
                      ] 
              }, 
              "server" : "xx:27017" 
      } 
      

            Assignee:
            aaron Aaron Staple
            Reporter:
            parkan Arkadiy Kukarkin
            Votes:
            1 Vote for this issue
            Watchers:
            4 Start watching this issue

              Created:
              Updated:
              Resolved: