[SERVER-1078] $elemMatch index scanning large numbers of rows Created: 03/May/10  Updated: 07/Mar/14  Resolved: 07/Mar/13

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 1.4.2
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Nathaniel Catchpole Assignee: Aaron Staple
Resolution: Incomplete Votes: 4
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Participants:

 Description   

I have an index on multiple columns - the last of which is a sort. The penultimate column in the index has very low cardinality - say 3,000 out of 5m records, is unlikely to go above 5,000 or so. When running an explain, the index is used, but mongo is doing a scan on 240k records to find out that only 343 of them have @fields.current.value : 1, despite that being in the index. If I remove that column from the index, I get exactly the same results as if it's there. However removing the column from the query it runs about three times faster. To me it seems like that part of the index ought to be used.

Explain:

> db.fields_current.node.find({ 'status': 1, '@fields': { '$elemMatch':

{ 'ex_node_group.value': 1, 'edition.tid': 369, 'channel.tid': 1, 'current.value': 1 }

} }).sort(

{ '@fields.pageviews.value': -1 }

).hint('content_list_homepage_popular_index').explain();
{
"cursor" : "BtreeCursor content_list_homepage_popular_index",
"indexBounds" : [
[
{
"@fields.edition.tid" : 369,
"@fields.ex_node_group.value" : 1,
"status" : 1,
"@fields.current.value" : 1,
"@fields.pageviews.value" :

{ "$maxElement" : 1 }

},
{
"@fields.edition.tid" : 369,
"@fields.ex_node_group.value" : 1,
"status" : 1,
"@fields.current.value" : 1,
"@fields.pageviews.value" :

{ "$minElement" : 1 }

}
]
],
"nscanned" : 240011,
"nscannedObjects" : 240011,
"n" : 343,
"millis" : 1474
}
> db.fields_current.node.find({ 'status': 1, '@fields': { '$elemMatch':

{ 'ex_node_group.value': 1, 'edition.tid': 369, 'channel.tid': 1, 'current.value': 1 }

} }).count();
343
> db.fields_current.node.find({ 'status': 1, '@fields': { '$elemMatch':

{ 'ex_node_group.value': 1, 'edition.tid': 369, 'channel.tid': 1}

} }).count();
64545
>



 Comments   
Comment by Aaron Staple [ 07/Mar/13 ]

It is hard to diagnose an example from such an old version of the software, and without more information on the documents stored in the test collection.

We have recently added some performance improvements for $elemMatch
SERVER-4180
SERVER-3104

More work optimizing $elemMatch remains though, for example
SERVER-7509
SERVER-6050

Generated at Thu Feb 08 02:55:59 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.