[SERVER-12871] Seemingly unreasonable overhead to range scans when using indexes Created: 24/Feb/14  Updated: 10/Dec/14  Resolved: 23/Jul/14

Status: Closed
Project: Core Server
Component/s: Index Maintenance
Affects Version/s: 2.4.9
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Osmar Olivo Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-13065 Consider a collection scan even if in... Backlog
duplicates SERVER-13842 Optimize non-simple projections Backlog
Related
is related to SERVER-14661 Overhead of range scans Closed
Participants:

 Description   

I created a sample collection with the following shell snippet:

// Build a 100-character filler string so each document is ~160 bytes.
filler = '';
for (var c = 0; c < 100; c++)
        filler += 'a';
 
// Insert 51,179 documents, each with the current date and the filler.
for (var i = 0; i < 51179; i++)
    db.test.insert({ "date" : new ISODate(), "fill" : filler });
 
// Secondary index on the date field.
db.test.ensureIndex({ "date" : 1 })

That created the collection below, which fits entirely in memory on my machine.
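Stats gathered with the standard collection-stats helper (showing the command for reproducibility):

db.test.stats()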

{
    "ns" : "test.test",
    "count" : 51179,
    "size" : 8188656,
    "avgObjSize" : 160.00031262822642,
    "storageSize" : 16773120,
    "numExtents" : 6,
    "nindexes" : 2,
    "lastExtentSize" : 12582912,
    "paddingFactor" : 1,
    "systemFlags" : 1,
    "userFlags" : 0,
    "totalIndexSize" : 2959712,
    "indexSizes" : {
        "_id_" : 1667904,
        "date_1" : 1291808
    },
    "ok" : 1
}
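For scale: 51,179 documents × ~160 bytes ≈ 8.2 MB of data, plus ~2.9 MB of indexes, so everything comfortably fits in RAM.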

Now I try a full table scan, which takes 13ms:

db.test.find().explain()
{
	"cursor" : "BasicCursor",
	"isMultiKey" : false,
	"n" : 51179,
	"nscannedObjects" : 51179,
	"nscanned" : 51179,
	"nscannedObjectsAllPlans" : 51179,
	"nscannedAllPlans" : 51179,
	"scanAndOrder" : false,
	"indexOnly" : false,
	"nYields" : 0,
	"nChunkSkips" : 0,
	"millis" : 13,
	"indexBounds" : {
 
	},
	"server" : "Oz-Olivo-MacBook-Pro.local:27017"
}

I perform an indexed direct hit, and it takes 0ms as expected:

db.test.find( { date : ISODate("2014-02-04T16:01:10.544Z") }).explain()
{
	"cursor" : "BtreeCursor date_1",
	"isMultiKey" : false,
	"n" : 1,
	"nscannedObjects" : 1,
	"nscanned" : 1,
	"nscannedObjectsAllPlans" : 1,
	"nscannedAllPlans" : 1,
	"scanAndOrder" : false,
	"indexOnly" : false,
	"nYields" : 0,
	"nChunkSkips" : 0,
	"millis" : 0,
	"indexBounds" : {
		"date" : [
			[
				ISODate("2014-02-04T16:01:10.544Z"),
				ISODate("2014-02-04T16:01:10.544Z")
			]
		]
	},
	"server" : "Oz-Olivo-MacBook-Pro.local:27017"
}
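To reproduce the direct hit you need a date value that actually exists in the collection; one way to grab the earliest one from the shell (a convenience sketch, not shown above):

// Pull the smallest indexed date to use as a probe value.
var probe = db.test.find({}, { date : 1, _id : 0 }).sort({ date : 1 }).limit(1).next().date;
db.test.find({ date : probe }).explain();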

I perform a ranged index query starting from the smallest date, and performance falls off a cliff compared with the table scan:

db.test.find( { date : { $gte : ISODate("2014-02-04T16:01:10.544Z") }}).explain()
{
	"cursor" : "BtreeCursor date_1",
	"isMultiKey" : false,
	"n" : 51179,
	"nscannedObjects" : 51179,
	"nscanned" : 51179,
	"nscannedObjectsAllPlans" : 51179,
	"nscannedAllPlans" : 51179,
	"scanAndOrder" : false,
	"indexOnly" : false,
	"nYields" : 0,
	"nChunkSkips" : 0,
	"millis" : 65,
	"indexBounds" : {
		"date" : [
			[
				ISODate("2014-02-04T16:01:10.544Z"),
				ISODate("0NaN-NaN-NaNTNaN:NaN:NaNZ")
			]
		]
	},
	"server" : "Oz-Olivo-MacBook-Pro.local:27017"
}

(The NaN upper bound in the explain above appears to be just the shell's rendering of the open end of the range.)

Table-scanning the same range predicate, with no index used, is faster than the indexed range scan, but twice as slow (28ms vs 13ms) as the plain table scan with no filter:

db.test.find( { date : { $gte : ISODate("2014-02-04T16:01:10.544Z"), $lte : ISODate("2014-02-04T16:01:12.368Z")  }}).explain()
{
	"cursor" : "BasicCursor",
	"isMultiKey" : false,
	"n" : 51179,
	"nscannedObjects" : 51179,
	"nscanned" : 51179,
	"nscannedObjectsAllPlans" : 51179,
	"nscannedAllPlans" : 51179,
	"scanAndOrder" : false,
	"indexOnly" : false,
	"nYields" : 0,
	"nChunkSkips" : 0,
	"millis" : 28,
	"indexBounds" : {
 
	},
	"server" : "Oz-Olivo-MacBook-Pro.local:27017"
}
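For reference, a $natural hint is one way to force the collection-scan plan shown above (a sketch; the output above doesn't say how that plan was selected):

// Force a collection scan even though the date_1 index exists.
db.test.find({ date : { $gte : ISODate("2014-02-04T16:01:10.544Z"),
                        $lte : ISODate("2014-02-04T16:01:12.368Z") } })
       .hint({ $natural : 1 })
       .explain();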

And finally, covered index queries are... slower?

db.test.find( { date : { $gte : ISODate("2014-02-04T16:01:10.544Z"), $lte : ISODate("2014-02-04T16:01:12.368Z")  }}, { date : 1, _id : 0 }).explain()
{
	"cursor" : "BtreeCursor date_1",
	"isMultiKey" : false,
	"n" : 51179,
	"nscannedObjects" : 0,
	"nscanned" : 51179,
	"nscannedObjectsAllPlans" : 0,
	"nscannedAllPlans" : 51179,
	"scanAndOrder" : false,
	"indexOnly" : true,
	"nYields" : 0,
	"nChunkSkips" : 0,
	"millis" : 89,
	"indexBounds" : {
		"date" : [
			[
				ISODate("2014-02-04T16:01:10.544Z"),
				ISODate("2014-02-04T16:01:12.368Z")
			]
		]
	},
	"server" : "Oz-Olivo-MacBook-Pro.local:27017"
}
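What makes this query covered is the projection: every returned field is in the index and _id is excluded, hence indexOnly: true. Including _id (the default) would un-cover it; a sketch for comparison, not a query I timed:

// Same range, but _id comes back by default, so the index no longer
// covers the query and indexOnly would be false.
db.test.find({ date : { $gte : ISODate("2014-02-04T16:01:10.544Z"),
                        $lte : ISODate("2014-02-04T16:01:12.368Z") } },
             { date : 1 }).explain();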

There is clearly some overhead to evaluating a range predicate, which is to be expected, but turning a 13ms table scan into a 28ms table scan (roughly 2x) seems like a high price for it.

Moreover, I would expect some additional cost to traversing an index and then fetching the matching documents, so a pure table scan being faster is unsurprising; but going from a 28ms table scan over the range to a 65ms indexed query means using the index more than doubles the cost. Is this really the best we can get?

And finally, the covered index query is the worst of all, which is the most surprising result: how can traversing the index without fetching any documents (89ms) be more expensive than a table scan that does fetch them (28ms)? Is this to be expected?
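Since explain()'s millis reflects a single run, here is a rough repeated-timing harness for the same queries to average out noise (a sketch of my own; absolute numbers will vary by machine):

// Average wall-clock time over several runs; itcount() drains the cursor,
// forcing full query execution.
function timeQuery(label, makeCursor) {
    var runs = 10, start = Date.now();
    for (var i = 0; i < runs; i++)
        makeCursor().itcount();
    print(label + ": " + (Date.now() - start) / runs + " ms avg");
}

var lo = ISODate("2014-02-04T16:01:10.544Z");
var hi = ISODate("2014-02-04T16:01:12.368Z");

timeQuery("full table scan", function() { return db.test.find(); });
timeQuery("indexed range  ", function() { return db.test.find({ date : { $gte : lo } }); });
timeQuery("forced scan    ", function() { return db.test.find({ date : { $gte : lo, $lte : hi } }).hint({ $natural : 1 }); });
timeQuery("covered range  ", function() { return db.test.find({ date : { $gte : lo, $lte : hi } }, { date : 1, _id : 0 }); });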



 Comments   
Comment by Ramon Fernandez Marina [ 23/Jul/14 ]

This ticket is now replaced by two separate tickets:

SERVER-13065 Consider a collection scan even if in...
SERVER-13842 Optimize non-simple projections
