Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-12871

Seemingly unreasonable overhead to range scans when using indexes

    XMLWordPrintable

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Duplicate
    • Affects Version/s: 2.4.9
    • Fix Version/s: None
    • Component/s: Indexing
    • Labels:
      None

      Description

      I created a sample collection with the following code snippet

      filler = '';
      for(c=0;c < 100; c++)
              filler += 'a';
       
      for( i =0; i < 51179; i++)
          db.test.insert( { "date" : new ISODate(), "fill" : filler} );
       
      db.test.ensureIndex( { "date" : 1 })

      That created the below collection which fits entirely in memory on my machine.

      {
          "ns" : "test.test",
          "count" : 51179,
          "size" : 8188656,
          "avgObjSize" : 160.00031262822642,
          "storageSize" : 16773120,
          "numExtents" : 6,
          "nindexes" : 2,
          "lastExtentSize" : 12582912,
          "paddingFactor" : 1,
          "systemFlags" : 1,
          "userFlags" : 0,
          "totalIndexSize" : 2959712,
          "indexSizes" : {
              "_id_" : 1667904,
              "date_1" : 1291808
          },
          "ok" : 1
      }

      Now I try a full tableScan takes 13ms

      db.test.find().explain()
      {
      	"cursor" : "BasicCursor",
      	"isMultiKey" : false,
      	"n" : 51179,
      	"nscannedObjects" : 51179,
      	"nscanned" : 51179,
      	"nscannedObjectsAllPlans" : 51179,
      	"nscannedAllPlans" : 51179,
      	"scanAndOrder" : false,
      	"indexOnly" : false,
      	"nYields" : 0,
      	"nChunkSkips" : 0,
      	"millis" : 13,
      	"indexBounds" : {
       
      	},
      	"server" : "Oz-Olivo-MacBook-Pro.local:27017"
      }

      I perform an indexed direct hit, and it takes 0ms as expected

      db.test.find( { date : ISODate("2014-02-04T16:01:10.544Z") }).explain()
      {
      	"cursor" : "BtreeCursor date_1",
      	"isMultiKey" : false,
      	"n" : 1,
      	"nscannedObjects" : 1,
      	"nscanned" : 1,
      	"nscannedObjectsAllPlans" : 1,
      	"nscannedAllPlans" : 1,
      	"scanAndOrder" : false,
      	"indexOnly" : false,
      	"nYields" : 0,
      	"nChunkSkips" : 0,
      	"millis" : 0,
      	"indexBounds" : {
      		"date" : [
      			[
      				ISODate("2014-02-04T16:01:10.544Z"),
      				ISODate("2014-02-04T16:01:10.544Z")
      			]
      		]
      	},
      	"server" : "Oz-Olivo-MacBook-Pro.local:27017"
      }

      I perform a ranged index query starting from the smallest date and performance falls off a cliff when compared with a tablescan

       
      db.test.find( { date : { $gte : ISODate("2014-02-04T16:01:10.544Z") }}).explain()
      {
      	"cursor" : "BtreeCursor date_1",
      	"isMultiKey" : false,
      	"n" : 51179,
      	"nscannedObjects" : 51179,
      	"nscanned" : 51179,
      	"nscannedObjectsAllPlans" : 51179,
      	"nscannedAllPlans" : 51179,
      	"scanAndOrder" : false,
      	"indexOnly" : false,
      	"nYields" : 0,
      	"nChunkSkips" : 0,
      	"millis" : 65,
      	"indexBounds" : {
      		"date" : [
      			[
      				ISODate("2014-02-04T16:01:10.544Z"),
      				ISODate("0NaN-NaN-NaNTNaN:NaN:NaNZ")
      			]
      		]
      	},
      	"server" : "Oz-Olivo-MacBook-Pro.local:27017"
      }

      Tablescanning on a range with no indexes is faster than indexing on a range but twice as slow as just the regular tablescan with no filters.

      db.test.find( { date : { $gte : ISODate("2014-02-04T16:01:10.544Z"), $lte : ISODate("2014-02-04T16:01:12.368Z")  }}).explain()
      {
      	"cursor" : "BasicCursor",
      	"isMultiKey" : false,
      	"n" : 51179,
      	"nscannedObjects" : 51179,
      	"nscanned" : 51179,
      	"nscannedObjectsAllPlans" : 51179,
      	"nscannedAllPlans" : 51179,
      	"scanAndOrder" : false,
      	"indexOnly" : false,
      	"nYields" : 0,
      	"nChunkSkips" : 0,
      	"millis" : 28,
      	"indexBounds" : {
       
      	},
      	"server" : "Oz-Olivo-MacBook-Pro.local:27017"
      }

      And finally, covered index queries are... slower?

      db.test.find( { date : { $gte : ISODate("2014-02-04T16:01:10.544Z"), $lte : ISODate("2014-02-04T16:01:12.368Z")  }}, { date : 1, _id : 0 }).explain()
      {
      	"cursor" : "BtreeCursor date_1",
      	"isMultiKey" : false,
      	"n" : 51179,
      	"nscannedObjects" : 0,
      	"nscanned" : 51179,
      	"nscannedObjectsAllPlans" : 0,
      	"nscannedAllPlans" : 51179,
      	"scanAndOrder" : false,
      	"indexOnly" : true,
      	"nYields" : 0,
      	"nChunkSkips" : 0,
      	"millis" : 89,
      	"indexBounds" : {
      		"date" : [
      			[
      				ISODate("2014-02-04T16:01:10.544Z"),
      				ISODate("2014-02-04T16:01:12.368Z")
      			]
      		]
      	},
      	"server" : "Oz-Olivo-MacBook-Pro.local:27017"
      }

      There seems to be some overhead when traversing ranges, which is to be expected. But to turn a 13ms tablescan into a 28ms tablescan seems like a high price for this.

      Moreover, I would expect an additional cost to traversing an index first and then fetching documents, so it is expected that a pure tablescan would be faster, but a 28ms tablescan on a range becoming a 65ms indexed query means the overhead of using an index is more than double. Is this really the best we can get?

      And finally, the covered index is the worst of all. This is the most surprising result. How is traversing the index without fetching any documents more expensive than when we do fetch them? 89ms for a covered index vs 28ms for a tablescan. Is this to be expected?

        Attachments

          Issue Links

            Activity

              People

              • Votes:
                0 Vote for this issue
                Watchers:
                7 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: