[SERVER-9640] Distinct command does not use covering multi-key indexes. Created: 09/May/13  Updated: 21/Jul/15  Resolved: 29/Jan/15

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 2.4.2
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Will Shaver Assignee: Unassigned
Resolution: Duplicate Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-13298 distinct doesn't use index to distinc... Closed
Operating System: ALL
Participants:

 Description   

> use banana
switched to db banana
> db.dropDatabase();
{ "dropped" : "banana", "ok" : 1 }
> use banana
switched to db banana
> db.c.ensureIndex({'name' : 1} )
> db.c.ensureIndex({'addresses.state' : 1} )
> db.c.insert({name: 'abcd', addresses: [{state:'OR'}, {state:'WA'}] });
> db.c.insert({name: 'qwer', addresses: [{state:'AZ'}, {state:'CA'}] });
> db.runCommand({ distinct: 'c', key: 'name'} )
{
        "values" : [
                "abcd",
                "qwer"
        ],
        "stats" : {
                "n" : 2,
                "nscanned" : 2,
                "nscannedObjects" : 0,
                "timems" : 0,
                "cursor" : "BtreeCursor name_1"
        },
        "ok" : 1
}

As expected with a non-nested distinct query the nscannedObjects is 0.

> db.runCommand({ distinct: 'c', key: 'addresses.state'} )
{
        "values" : [
                "OR",
                "WA",
                "AZ",
                "CA"
        ],
        "stats" : {
                "n" : 2,
                "nscanned" : 2,
                "nscannedObjects" : 2,
                "timems" : 0,
                "cursor" : "BasicCursor"
        },
        "ok" : 1
}

As you can see, the nscannedObjects is 2 for the second query, where it would be expected to be 1. Doing a find on this data uses the index as expected:

> db.c.find({'addresses.state':'CA'}).explain()
{
        "cursor" : "BtreeCursor addresses.state_1",
        "isMultiKey" : true,
        "n" : 1,
        "nscannedObjects" : 1,
        "nscanned" : 1,
        "nscannedObjectsAllPlans" : 1,
        "nscannedAllPlans" : 1,
        "scanAndOrder" : false,
        "indexOnly" : false,
        "nYields" : 0,
        "nChunkSkips" : 0,
        "millis" : 0,
        "indexBounds" : {
                "addresses.state" : [
                        [
                                "CA",
                                "CA"
                        ]
                ]
        },
        "server" : "dellm4500ws:27017"
}



 Comments   
Comment by David Storch [ 29/Jan/15 ]

Closing as a duplicate of SERVER-13298. Although this ticket is older, the newer one has additional context and has been triaged more recently.

Comment by Matthieu Rigal [ 12/Aug/14 ]

And even better, it is consistently taking the worst index ! Even with 2.6.4 :

> db.runCommand({distinct: "journals", key: "_cls", query: {cust_id: 5001, _cls: {$gt: 'a'}}})
{
"values" : [
"j.ap",
....
"j.rn"
],
"stats" :

{ "n" : 1179670, "nscanned" : 1179670, "nscannedObjects" : 1179670, "timems" : 46173, "cursor" : "BtreeCursor cust_id_1_shift_uuid_1__cls_1_synced_at_1" }

,
"ok" : 1
}

> db.runCommand({distinct: "journals", key: "shift_uuid", query: {cust_id: 5001, shift_uuid: {$exists: true}}})
{
"values" : [
BinData(3,"nM+Eo2fHT/Ou1e/lzlI9jA=="),
.....
BinData(3,"8kchs9qaTDujtYjx+gb5IA==")
],
"stats" :

{ "n" : 1166689, "nscanned" : 1179670, "nscannedObjects" : 1179670, "timems" : 139735, "cursor" : "BtreeCursor cust_id_1__cls_1_metadata.created.iso_utc_1" }

,
"ok" : 1
}

Comment by Matthieu Rigal [ 11/Aug/14 ]

Here another failed example with 2.6.3:

Indexes available : [ ...,
"cust_id_1_shift_uuid_1__cls_1_synced_at_1" : 1968134896,
"cust_id_1__cls_1_metadata.created.iso_utc_1" : 1359194592]

> db.runCommand({distinct: "journals", key:"shift_uuid", query: {"cust_id": 5001}})
{
"values" : [...],
"stats" :

{ "n" : 1179670, "nscanned" : 1179670, "nscannedObjects" : 1179670, "timems" : 71026, "cursor" : "BtreeCursor cust_id_1__cls_1_metadata.created.iso_utc_1" }

,
"ok" : 1
}

Why doesn't the hint use the "cust_id_1_shift_uuid_1__cls_1_synced_at_1" index ???

Comment by Will Shaver [ 13/May/13 ]

Also fails to use the index for arrays of strings, such as the classic blog post/tags schema:

{ name: 'Blog Post', tags: ['Music', 'Arts'] }

Generated at Thu Feb 08 03:21:02 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.