Loading...

XML

Word

Printable

JSON

Type: Improvement
Resolution: Duplicate
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 2.4.9
Component/s: Index Maintenance
Labels:
None

Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

I created a sample collection with the following code snippet

filler = '';
for(c=0;c < 100; c++)
        filler += 'a';
 
for( i =0; i < 51179; i++)
    db.test.insert( { "date" : new ISODate(), "fill" : filler} );
 
db.test.ensureIndex( { "date" : 1 })

That created the below collection which fits entirely in memory on my machine.

{
    "ns" : "test.test",
    "count" : 51179,
    "size" : 8188656,
    "avgObjSize" : 160.00031262822642,
    "storageSize" : 16773120,
    "numExtents" : 6,
    "nindexes" : 2,
    "lastExtentSize" : 12582912,
    "paddingFactor" : 1,
    "systemFlags" : 1,
    "userFlags" : 0,
    "totalIndexSize" : 2959712,
    "indexSizes" : {
        "_id_" : 1667904,
        "date_1" : 1291808
    },
    "ok" : 1
}

Now I try a full tableScan takes 13ms

db.test.find().explain()
{
	"cursor" : "BasicCursor",
	"isMultiKey" : false,
	"n" : 51179,
	"nscannedObjects" : 51179,
	"nscanned" : 51179,
	"nscannedObjectsAllPlans" : 51179,
	"nscannedAllPlans" : 51179,
	"scanAndOrder" : false,
	"indexOnly" : false,
	"nYields" : 0,
	"nChunkSkips" : 0,
	"millis" : 13,
	"indexBounds" : {

	},
	"server" : "Oz-Olivo-MacBook-Pro.local:27017"
}

I perform an indexed direct hit, and it takes 0ms as expected

db.test.find( { date : ISODate("2014-02-04T16:01:10.544Z") }).explain()
{
	"cursor" : "BtreeCursor date_1",
	"isMultiKey" : false,
	"n" : 1,
	"nscannedObjects" : 1,
	"nscanned" : 1,
	"nscannedObjectsAllPlans" : 1,
	"nscannedAllPlans" : 1,
	"scanAndOrder" : false,
	"indexOnly" : false,
	"nYields" : 0,
	"nChunkSkips" : 0,
	"millis" : 0,
	"indexBounds" : {
		"date" : [
			[
				ISODate("2014-02-04T16:01:10.544Z"),
				ISODate("2014-02-04T16:01:10.544Z")
			]
		]
	},
	"server" : "Oz-Olivo-MacBook-Pro.local:27017"
}

I perform a ranged index query starting from the smallest date and performance falls off a cliff when compared with a tablescan

db.test.find( { date : { $gte : ISODate("2014-02-04T16:01:10.544Z") }}).explain()
{
	"cursor" : "BtreeCursor date_1",
	"isMultiKey" : false,
	"n" : 51179,
	"nscannedObjects" : 51179,
	"nscanned" : 51179,
	"nscannedObjectsAllPlans" : 51179,
	"nscannedAllPlans" : 51179,
	"scanAndOrder" : false,
	"indexOnly" : false,
	"nYields" : 0,
	"nChunkSkips" : 0,
	"millis" : 65,
	"indexBounds" : {
		"date" : [
			[
				ISODate("2014-02-04T16:01:10.544Z"),
				ISODate("0NaN-NaN-NaNTNaN:NaN:NaNZ")
			]
		]
	},
	"server" : "Oz-Olivo-MacBook-Pro.local:27017"
}

Tablescanning on a range with no indexes is faster than indexing on a range but twice as slow as just the regular tablescan with no filters.

db.test.find( { date : { $gte : ISODate("2014-02-04T16:01:10.544Z"), $lte : ISODate("2014-02-04T16:01:12.368Z")  }}).explain()
{
	"cursor" : "BasicCursor",
	"isMultiKey" : false,
	"n" : 51179,
	"nscannedObjects" : 51179,
	"nscanned" : 51179,
	"nscannedObjectsAllPlans" : 51179,
	"nscannedAllPlans" : 51179,
	"scanAndOrder" : false,
	"indexOnly" : false,
	"nYields" : 0,
	"nChunkSkips" : 0,
	"millis" : 28,
	"indexBounds" : {

	},
	"server" : "Oz-Olivo-MacBook-Pro.local:27017"
}

And finally, covered index queries are... slower?

db.test.find( { date : { $gte : ISODate("2014-02-04T16:01:10.544Z"), $lte : ISODate("2014-02-04T16:01:12.368Z")  }}, { date : 1, _id : 0 }).explain()
{
	"cursor" : "BtreeCursor date_1",
	"isMultiKey" : false,
	"n" : 51179,
	"nscannedObjects" : 0,
	"nscanned" : 51179,
	"nscannedObjectsAllPlans" : 0,
	"nscannedAllPlans" : 51179,
	"scanAndOrder" : false,
	"indexOnly" : true,
	"nYields" : 0,
	"nChunkSkips" : 0,
	"millis" : 89,
	"indexBounds" : {
		"date" : [
			[
				ISODate("2014-02-04T16:01:10.544Z"),
				ISODate("2014-02-04T16:01:12.368Z")
			]
		]
	},
	"server" : "Oz-Olivo-MacBook-Pro.local:27017"
}

There seems to be some overhead when traversing ranges, which is to be expected. But to turn a 13ms tablescan into a 28ms tablescan seems like a high price for this.

Moreover, I would expect an additional cost to traversing an index first and then fetching documents, so it is expected that a pure tablescan would be faster, but a 28ms tablescan on a range becoming a 65ms indexed query means the overhead of using an index is more than double. Is this really the best we can get?

And finally, the covered index is the worst of all. This is the most surprising result. How is traversing the index without fetching any documents more expensive than when we do fetch them? 89ms for a covered index vs 28ms for a tablescan. Is this to be expected?

duplicates

SERVER-13065 Consider a collection scan even if indexed plans are available

Backlog

SERVER-13842 Optimize non-simple projections

Backlog

is related to

SERVER-14661 Overhead of range scans

Closed

Assignee:: Unassigned
Reporter:: Osmar Olivo (Inactive)
Participants:: Osmar Olivo, Ramon Fernandez Marina
Votes:: 0 Vote for this issue
Watchers:: 6 Start watching this issue

Created:: Feb 24 2014 09:42:08 PM UTC
Updated:: Dec 10 2014 11:05:16 PM UTC
Resolved:: Jul 23 2014 03:45:31 PM UTC

Details

Description

Attachments

Issue Links

Activity

People

Dates