Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-3122

Range queries on compound indexes involve unnecessary scanning when sorting and limit are involved

    • Type: Icon: Improvement Improvement
    • Resolution: Duplicate
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: 1.8.1, 1.9.0
    • Component/s: Performance, Querying
    • Labels:
      None

      If there's a compound index, you query and sort on the last key, then MongoDB doesn't take limit into account and does a lot of unnecessary scanning.

      For example, if I have an index on {i:1, j:1}, I query with a range on i and sort on j with limit 10, then explain and nscanned shows that a whole range from [i1, jmin] to [i2, jmax] was scanned, even for for each i only the first 10 could be scanned and the rest could be skipped. Attached is an example script that generates the following output:

      MongoDB shell version: 1.9.0
      connecting to: test
      {
      	"cursor" : "BtreeCursor i_1_j_1",
      	"nscanned" : 3000,
      	"nscannedObjects" : 3000,
      	"n" : 10,
      	"scanAndOrder" : true,
      	"millis" : 6,
      	"nYields" : 0,
      	"nChunkSkips" : 0,
      	"isMultiKey" : false,
      	"indexOnly" : false,
      	"indexBounds" : {
      		"i" : [
      			[
      				2,
      				4
      			]
      		],
      		"j" : [
      			[
      				{
      					"$minElement" : 1
      				},
      				{
      					"$maxElement" : 1
      				}
      			]
      		]
      	}
      }
      

      If BtreeCursor could know that it is a sorting query on a field with a particular limit, then it is maybe possible to optimize this case by advancing to the next group of keys if limit number of objects have been scanned with the same "prefix" of fields that are before that field (I hope I say it correctly).

      I real life scenario I have a collection of "events" (which stores that an event of particular "type" happened, where "type" is a string) and use an index on {type:1, _id:1}. On the web-interface I need to show e.g. last 10 events of some types. When querying with a specific type, it works perfectly. But when trying to show last e.g. 10 events of several types it basically starts using BasicCursor, because it turns out to be faster.

      In some places I currently workaround it by doing several queries on each type in parallel and merging them at client side. It works when exact types are known in advance, but doesn't when they are not, e.g. when trying to query using a regex like /^prefix/ (and it's hard to know which types there are, because distinct has a similar problem with excessive scanning).

      Could you please improve this, or at least point me how exactly MongoDB constructs cursors for sorting and where sorting is happening, so I could tinker with it myself?

            Assignee:
            aaron Aaron Staple
            Reporter:
            snaury Alexey Borzenkov
            Votes:
            1 Vote for this issue
            Watchers:
            2 Start watching this issue

              Created:
              Updated:
              Resolved: