[SERVER-6176] Having compound indexes scans less objects during a query but takes more time than a single index Created: 22/Jun/12  Updated: 11/Jul/16  Resolved: 24/Jul/12

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 2.0.2
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: Justin Assignee: Aaron Staple
Resolution: Done Votes: 0
Labels: performance, query
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

MacBook Pro 2010 running Mac OS X Snow Leopard


Participants:

 Description   

I am storing documents with 2 fields: "date" and "b". For "date" I want to check that it is greater than or less than a certain date. For "b" I want to know whether or not it exists.

I want to query on both fields simultaneously.

I created some test data as follows:

date = new ISODate();

for (var i=0; i < 5000000; i++) {
m = Math.floor(Math.random()*12);
d = Math.floor(1+Math.random()*28);
date.setMonth(m);
date.setDate(d);
db.bar.insert(

{"date":date}

);
}
for (var i=0; i < 5000000; i++) {
m = Math.floor(Math.random()*12);
d = Math.floor(1+Math.random()*28);
date.setMonth(m);
date.setDate(d);
b = (Math.random() > 0.5);
db.bar.insert(

{"date":date,"b":b}

);
}

Then I run 2 tests.

Test 1) I put an index on date only, and do a query explain.

db.bar.ensureIndex(

{"date":1}

);
db.bar.find({"date":{$lte:date},"b":{$exists:false}}).explain();

Test 2) I put a compound index on date and b, and do a query explain.

db.bar.ensureIndex(

{"date":1,"b":1}

);
db.bar.find({"date":{$lte:date},"b":{$exists:false}}).explain();

Results of the tests: Having a compound index on "date" and "b" runs a bit slower, even though it scans less documents. Here is the output of the explain()s:

Test 1)

{
"cursor" : "BtreeCursor date_1",
"nscanned" : 6128089,
"nscannedObjects" : 6128089,
"n" : 3775151,
"millis" : 17420,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" :

{ "date" : [ [ true, ISODate("2012-06-18T22:06:03.419Z") ] ] }

}

Test 2)

{
"cursor" : "BtreeCursor date_1_b_1",
"nscanned" : 3775309,
"nscannedObjects" : 3775151,
"n" : 3775151,
"millis" : 18100,
"nYields" : 0,
"nChunkSkips" : 0,
"isMultiKey" : false,
"indexOnly" : false,
"indexBounds" :

{ "date" : [ [ true, ISODate("2012-06-18T22:06:03.419Z") ] ], "b" : [ [ null, null ] ] }

}



 Comments   
Comment by Aaron Staple [ 24/Jul/12 ]

No prob.

Comment by Justin [ 24/Jul/12 ]

Using the index

{b:1,date:1}

worked much better. No further questions. Thanks!

Comment by Aaron Staple [ 23/Jul/12 ]

Hi Justin,

Do you have any further questions?

Comment by Aaron Staple [ 23/Jun/12 ]

Hi Justin,

A couple of points:

  • With the compound index you tried, the indexing code will have to identify every date less than the date in your query, and for each date skip the non null values of 'b'. There may not be significant performance gains in doing this vs using a single key index for your data distribution. You may get better performance with the index {b:1,date:1}

    .

  • I just want to check - did you remove the single key index before you ran test #2? If not that could affect the outcome. Also, the tests would probably be a bit more representative if you used hint() to hint the desired index.
Generated at Thu Feb 08 03:10:57 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.