Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-11116

cursor limit has inconsistent effect on compound index scan count

    • Type: Icon: Bug Bug
    • Resolution: Done
    • Priority: Icon: Major - P3 Major - P3
    • 2.5.5
    • Affects Version/s: 2.4.5, 2.5.3
    • Component/s: Querying
    • Labels:
    • ALL

      (Filed on behalf of, originally discovered by sasher@nds.com.)

      Revised description (without explain)

      When we query a collection with a compound index:

      {a: 1, b: 1}
      

      and provide a limit on the result set, the number of index keys scanned in the compound index seems to vary widely depending on the range of values requested for the first key.

      (assuming that limit is 200, index contains 2000 documents in range [1000, 3000) for every a)
      find({a: {$in: [1], $b}: {$gte:1000, $lt: 3000}}) -> documents scanned will be equal to limit value + 1 (201)
      find({a: {$in: [1, 2], $b}: {$gte:1000, $lt: 3000}}) -> documents scanned will be all documents in index matching filter + some constant (4002)
      

      The inconsistency is in the way the limit count affects the first query but not the second where we are requesting 2 values for the first key instead of 1.
      --------------

      Collection contains large number of documents with fields a and b (in addition to _id). There is an index on

      {a:1, b:1}
      

      A query with $in expression on a and sort of b with limit should use the index on a and b. It does use the index (and optimization from SERVER-5063) when explain() is appended, however without explain even with hint it scans and sorts the whole collection.

      Here is output from the logs for query:

        db.logs2.find({"a" : {$in : [10000, 2000]}, "b" : { $gt : 10000, $lt : 30000}}).sort({b:1}).limit(100).hint({a:1,b:1}) 

      with and without explain.

      [conn244] query index.logs2 query: { query: { a: { $in: [ 10000.0, 2000.0 ] }, b: { $gt: 10000.0, $lt: 30000.0 } }, orderby: { b: 1.0 }, $hint: { a: 1.0, b: 1.0 }, $explain: true } ntoreturn:100 ntoskip:0 nscanned:101 scanAndOrder:1 keyUpdates:0 locks(micros) r:1314 nreturned:1 reslen:634 1ms
      [conn244] query index.logs2 query: { query: { a: { $in: [ 10000.0, 2000.0 ] }, b: { $gt: 10000.0, $lt: 30000.0 } }, orderby: { b: 1.0 }, $hint: { a: 1.0, b: 1.0 } } ntoreturn:100 ntoskip:0 nscanned:15000 scanAndOrder:1 keyUpdates:0 locks(micros) r:91487 nreturned:100 reslen:4620 91ms
      

      The only way to keep it from sorting the entire dataset without explain() is by using negative limit.

            Assignee:
            hari.khalsa@10gen.com hari.khalsa@10gen.com
            Reporter:
            sasher@nds.com Shay Asher
            Votes:
            7 Vote for this issue
            Watchers:
            9 Start watching this issue

              Created:
              Updated:
              Resolved: