[SERVER-22574] Allow COUNT stage to sum COUNT_SCANS for queries with $in operator Created: 11/Feb/16  Updated: 06/Dec/22  Resolved: 17/Feb/16

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Vitaly Puzrin Assignee: Backlog - Query Team (Inactive)
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-17266 count command performance optimizatio... Backlog
Assigned Teams:
Query
Participants:

 Description   

Count over covered index happens too slow if $in condition exists. Example:

db.forum.topics.count({ "section": ObjectId("56b25ca88d4b465a5dda716e"), "st": { "$in": [ 1, 2] } })

takes ~50ms with 20000 keys examined, while separate queries takes 1 & 0 ms:

db.forum.topics.count({ "section": ObjectId("56b25ca88d4b465a5dda716e"), "st": 1 })

db.forum.topics.count({ "section": ObjectId("56b25ca88d4b465a5dda716e"), "st": 2 })

Not we split such $in queries to mutiple, but:

1. That's not convenient
2. That's impossible for "pagination" requests, when you have $in and skip+limit.



 Comments   
Comment by Kelsey Schubert [ 17/Feb/16 ]

Hi vitaly,

After review, the work described in this ticket is already encapsulated in SERVER-17266, so I will be closing this a duplicate. Please free to vote for it and watch it for updates.

Regarding your second note, I would recommend posting to the slow query to mongodb-users group with more details about your schema, indexes, and use case. The user group provides MongoDB-related support and may suggest modifications to improve the performance you are seeing.

Kind regards,
Thomas

Comment by Vitaly Puzrin [ 12/Feb/16 ]

Tomas, i have 2 notes to your comment:

Note 1

db.foo.count( { x : { $gt: 500 }, y: { $in : [1] } })

Is a bit faster than with $in [1, 2], but still much slower than strict condition:

db.foo.count({ x : { $gt: 500 }, y: 1 } })

Use case: count number of visible topics in forum section before specific date/_id.

So, to make optimization of { y: $in [1,2] } useful, it's speed should be comparable with strict condition { y: 1 }, not with current single { y: $in [1] }. I mean, such $in should not be considered as interval.

Note 2

Consider another case with $in, that needs improvement - "pagination" query (also with index for IXSCAN):

db.forum.topics.find({ "section": ObjectId("56b25ca88d4b465a5dda716e"), "st": { "$in": [ 1, 2] } }, { _id: 1 }).sort({ "last_post": -1 }).skip(20000).limit(30)

It suffers with the same performance problem, but can not be splitted manually as count. Replace $in with $or does not help much. I understand that skip+limit is slow by nature, but mongo's performance is significantly slower than expected (compared with mysql).

Is it possible to generalize request to cover this case too (or create a new one)?

Comment by Kelsey Schubert [ 12/Feb/16 ]

Hi vitaly,

Thank you for opening this ticket. The behavior you are observing is expected: the separate queries are able to take advantage of optimizations in the COUNT_SCAN stage to complete more rapidly.

One of the current requirements of COUNT_SCAN is that index bounds must be a single interval. The following query is able to take advantage of the COUNT_SCAN stage which utilizes a number of optimizations over INDEX_SCAN since it has only one index bound interval.

db.foo.explain("executionStats").count( {x : {$in : [1] }} )

Whereas this query:

db.foo.explain("executionStats").count( {x : {$in : [1,2] }} )

is composed of two discrete index bounds: [1.0, 1.0] and [2.0, 2.0], and consequently the COUNT_SCAN stage is not selected.

You may want to consider whether the following query would provide the same results as the query with the $in operator.

db.foo.explain("executionStats").count({ x : { $gte :1, $lte : 2} })

Since the index bound is a single interval it is able to take advantage of the COUNT_SCAN optimizations. This workaround would be dependent on your particular use case.

I am modifying to the ticket summary to highlight the specific improvement request and sending it to be scheduled during the next round of planning. Please continue to watch this ticket for updates.

Thank you,
Thomas

Generated at Thu Feb 08 04:00:47 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.