[SERVER-17266] count command performance optimization should support more complex queries Created: 12/Feb/15  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Querying
Affects Version/s: 2.6.5, 3.0.0-rc8
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Andrew Ryder (Inactive) Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 6
Labels: graphquery
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-27822 Count plans sometimes don't need a FETCH Closed
is duplicated by SERVER-26831 count() examines docs on covered quer... Closed
is duplicated by SERVER-22574 Allow COUNT stage to sum COUNT_SCANS ... Closed
is duplicated by SERVER-44032 Mongodb Count is slow Closed
Related
is related to SERVER-17148 Remove plans do not need a FETCH stage Backlog
Assigned Teams:
Query Optimization
Backwards Compatibility: Fully Compatible
Participants:

 Description   

With a valid index to cover the queries, the following in the shell:

db.coll.count({foo:{$in:[1,2]}});

Is consistently upwards of an order of magnitude slower than:

db.coll.count({foo:1}) + db.coll.count({foo:2})

In my test, where 700k documents match each predicate, the first variant takes ~1500ms to run, the second variant (without $in) takes ~120ms.

To reproduce requires no special documents, surrounding indexes don't matter, the index in use can be compound (include other fields), other documents present in the collection don't matter (overall cardinality is irrelevant, though the effect requires low cardinality for the items in test purely to measure it temporally).

The effect described above does not apply to a regular query, it applies only to the count command. Thus, "explain" does not reveal the reason why in 2.6.5, however, it is possible to observe a difference in the plans by running count a newfangled 3.0 explainable collection.



 Comments   
Comment by 如意 林 [ 25/Feb/21 ]

Hello, I am also using mongodb. Encountered a similar problem, why does my table count of one million data take 3 seconds.

Comment by Asya Kamsky [ 21/Oct/19 ]

Reopening this ticket specifically to track the optimization for $in when the index is NOT multikey.

Comment by Asya Kamsky [ 18/Sep/17 ]

All the examples in this ticket are fixed, probably via work in SERVER-27822.

Closing as dup, please re-open if I missed any cases.

Comment by Andrew Ryder (Inactive) [ 18/Feb/15 ]

To apply the optimization to $in would need to check the isMultiKey flag on the indexstats to ensure there is no possibility of duplicate matches.

Generated at Thu Feb 08 03:43:51 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.