[SERVER-44032] Mongodb Count is slow Created: 15/Oct/19  Updated: 06/Dec/19  Resolved: 06/Dec/19

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.2.0
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: virinchi samineni Assignee: Eric Sedor
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File scr1.png     PNG File scrnsht.png    
Issue Links:
Duplicate
duplicates SERVER-17266 count command performance optimizatio... Backlog
Participants:

 Description   

Hi,
I am trying to improve count using covered queries but its taking ~5sec to return the response.

Is this behavior expected or there is something I can do to improve this.

Please find details below-

There is collection with 6 million records

consider each document to be something like this

{ "_id": "523453252", "type": "test" }

i have a single field index on "type"

 when I execute following command
db.test.explain("executionStats").find({type:{$in:["test","test1","test2"]}},{type:1,_id:0}).count()

the number of keys examined are ~5.9 million
number of docs examined are 0
even though there are no docs examined i.e nothing is loaded from disk
and the query is using only index the count is taking ~5sec
Please let me know if there is anything that can be done to improve count response time.



 Comments   
Comment by Eric Sedor [ 06/Dec/19 ]

Hi,

We haven’t heard back from you for some time, so I’m going to close this issue as a duplicate of SERVER-17266. If this is still an issue for you, please provide additional information and we will reopen the ticket.

Sincerely,
Eric

Comment by Eric Sedor [ 07/Nov/19 ]

Hi virinchisamineni@gmail.com,

Are you are able to provide an exact command and explain() results via the MongoDB shell? We would still need that information to diagnose this issue as something other than SERVER-17266

Gratefully,
Eric

Comment by Eric Sedor [ 25/Oct/19 ]

virinchisamineni@gmail.com,

Just to keep in mind, we are still continuing with the assumption that this is a duplicate of SERVER-17266, and our workaround suggestion continues to be to issue separate queries for each $in clause.

If you would like to continue trying to confirm this fully, we would need to see the explain plan generated in the mongodb shell. Please C&P both the command and the output. Screenshots make it very difficult for us to communicate about specific details.

Gratefully,
Eric

Comment by virinchi samineni [ 24/Oct/19 ]

hi @eric.sedor
please find screen shot below and no problem with going back and forth 
I want understand how the system works.

I don't see COUNT_STAGE now either

Comment by Eric Sedor [ 24/Oct/19 ]

Thanks virinchisamineni@gmail.com and sorry for the back and forth here. I do see .count() is included in what you ran but it really seems like the count portion of this operation is not reaching explain().

Can you try this form please, and paste the explain results?

db.test.explain("executionStats").count({applicationType:{$in:["test","test1","test2"]}})

Comment by virinchi samineni [ 24/Oct/19 ]

Hi @eric.sedor

Yes "applicationType" is equivalent to "type" provided in description. I just wanted to convey the problem i was facing. here is screen shot for command I ran.

Comment by Eric Sedor [ 23/Oct/19 ]

virinchisamineni@gmail.com: this query looks to be on applicationType instead of type as initially provided in the description, and I'm a bit surprised there is no COUNT stage at all. Can you please provide the exact command you run to produce that explain output? It will help us dispel confusion while interpreting the explain plan. Thanks!

Comment by virinchi samineni [ 23/Oct/19 ]

Hi @eric.sedor 
Please find below the output above command
{
"queryPlanner": {
"plannerVersion": 1,
"namespace": "demoapp.test",
"indexFilterSet": false,
"parsedQuery": {
"applicationType":

{ "$in": [ "test", "test1", "test2" ] }

},
"queryHash": "612B6DDB",
"planCacheKey": "75C11F3B",
"winningPlan": {
"stage": "IXSCAN",
"keyPattern":

{ "applicationType": -1 }

,
"indexName": "applicationType_-1",
"isMultiKey": false,
"multiKeyPaths":

{ "applicationType": [ ] }

,
"isUnique": false,
"isSparse": false,
"isPartial": false,
"indexVersion": 2,
"direction": "forward",
"indexBounds":

{ "applicationType": [ "[\"test2\", \"test2\"]", "[\"test1\", \"test1\"]", "[\"test\", \"test\"]" ] }

},
"rejectedPlans": [
{
"stage": "IXSCAN",
"keyPattern":

{ "applicationType": 1, "classification": 1 }

,
"indexName": "applicationType_1_classification_1",
"isMultiKey": false,
"multiKeyPaths":

{ "applicationType": [ ], "classification": [ ] }

,
"isUnique": false,
"isSparse": false,
"isPartial": false,
"indexVersion": 2,
"direction": "forward",
"indexBounds":

{ "applicationType": [ "[\"test\", \"test\"]", "[\"test1\", \"test1\"]", "[\"test2\", \"test2\"]" ], "classification": [ "[MinKey, MaxKey]" ] }

},
{
"stage": "IXSCAN",
"keyPattern":

{ "applicationType": 1, "classification": 1, "isBundleListed": 1 }

,
"indexName": "applicationType_1_classification_1_isBundleListed_1",
"isMultiKey": false,
"multiKeyPaths":

{ "applicationType": [ ], "classification": [ ], "isBundleListed": [ ] }

,
"isUnique": false,
"isSparse": false,
"isPartial": false,
"indexVersion": 2,
"direction": "forward",
"indexBounds":

{ "applicationType": [ "[\"test\", \"test\"]", "[\"test1\", \"test1\"]", "[\"test2\", \"test2\"]" ], "classification": [ "[MinKey, MaxKey]" ], "isBundleListed": [ "[MinKey, MaxKey]" ] }

},
{
"stage": "IXSCAN",
"keyPattern":

{ "applicationType": 1, "classification": 1, "publicationDate": -1, "isBundleListed": 1 }

,
"indexName": "applicationType_1_classification_1_publicationDate_-1_isBundleListed_1",
"isMultiKey": false,
"multiKeyPaths":

{ "applicationType": [ ], "classification": [ ], "publicationDate": [ ], "isBundleListed": [ ] }

,
"isUnique": false,
"isSparse": false,
"isPartial": false,
"indexVersion": 2,
"direction": "forward",
"indexBounds":

{ "applicationType": [ "[\"test\", \"test\"]", "[\"test1\", \"test1\"]", "[\"test2\", \"test2\"]" ], "classification": [ "[MinKey, MaxKey]" ], "publicationDate": [ "[MaxKey, MinKey]" ], "isBundleListed": [ "[MinKey, MaxKey]" ] }

}
]
},
"executionStats": {
"executionSuccess": true,
"nReturned": 6122209,
"executionTimeMillis": 4144,
"totalKeysExamined": 6122211,
"totalDocsExamined": 0,
"executionStages": {
"stage": "IXSCAN",
"nReturned": 6122209,
"executionTimeMillisEstimate": 102,
"works": 6122212,
"advanced": 6122209,
"needTime": 2,
"needYield": 0,
"saveState": 47833,
"restoreState": 47833,
"isEOF": 1,
"keyPattern":

{ "applicationType": -1 }

,
"indexName": "applicationType_-1",
"isMultiKey": false,
"multiKeyPaths":

{ "applicationType": [ ] }

,
"isUnique": false,
"isSparse": false,
"isPartial": false,
"indexVersion": 2,
"direction": "forward",
"indexBounds":

{ "applicationType": [ "[\"test2\", \"test2\"]", "[\"test1\", \"test1\"]", "[\"test\", \"test\"]" ] }

,
"keysExamined": 6122211,
"seeks": 3,
"dupsTested": 0,
"dupsDropped": 0
}
},
"ok": 1,
"$clusterTime": {
"clusterTime": Timestamp(1571795638,
1),
"signature":

{ "hash": BinData(0, "AAAAAAAAAAAAAAAAAAAAAAAAAAA="), "keyId": 0 }

},
"operationTime": Timestamp(1571795638,
1)
}

Comment by Eric Sedor [ 22/Oct/19 ]

Hi virinchisamineni@gmail.com,

To be clear: Yes, the delay is due to the current implementation of count.

And yes, the current workaround for this is to issue separate queries. This does impose the network cost you as you say, but depending on the size of the collection, that could still be faster than issuing one query of $ins, and less disruptive of the working set. That means other operations won't be as impacted by the count operations either, so there could be indirect benefits for other ops.

I believe Asya is correct that this is a duplicate of SERVER-17266. But for completeness, can you please provide the output of:

db.test.explain("executionStats").find({type:{$in:["test","test1","test2"]}},{type:1,_id:0}).count()

Gratefully,
Eric

Comment by Asya Kamsky [ 21/Oct/19 ]

This appears to be duplicate of SERVER-17266 (and a few related tickets). The issue is that faster COUNT_SCAN can only be used if the index is not multiKey. I've reopened the other ticket and scheduled it for triage, this one should be marked as a duplicate of it.

Comment by virinchi samineni [ 18/Oct/19 ]

Hi Eric,

Thanks for the update.

Are you suggesting that rather than using $in operator 
 I should be querying the database 3 times(one for each type )
and then combine counts for each of them.

If that's the case won't I be facing network latency as i am issuing 3 separate queries as opposed to one.

I want to understand if there is a possibility of reducing 5 sec to 1~2 sec
and also I would like to know whether 5 sec is  happening  due to how count is implemented in mongodb or any other issue.

Comment by Eric Sedor [ 17/Oct/19 ]

Hi virinchisamineni@gmail.com,

It looks like you are experiencing known behavior with how count is currently able to use indexes. For this specific operation, you should be able to issue queries for each type separately (rather than using $in. Each individual count request should be able to use only the index because each operation would access a "contiguous" range of index keys.

That said, I am looking into whether or not a jira ticket exists to change this behavior, and will update you. Thanks in advance for your patience.

Sincerely,
Eric

Comment by virinchi samineni [ 17/Oct/19 ]

Hi ,
Any update on this.

Generated at Thu Feb 08 05:04:47 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.