[SERVER-3435] Count is even slower than find(...)->sort(...)->limit(...) Created: 17/Jul/11  Updated: 29/Aug/11  Resolved: 18/Jul/11

Status: Closed
Project: Core Server
Component/s: Performance, Querying
Affects Version/s: 1.8.2
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Tim van Elsloo Assignee: Unassigned
Resolution: Done Votes: 0
Labels: performance, query
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Mac OS X 10.6.8 2.16 GHz
Intel Core 2 Duo
2 GB 667 MHz DDR2 SDRAM


Operating System: ALL
Participants:

 Description   

I'm using a dummy collection of 18M rows in the following format:

{ _id: __unique_of_course__, name: __also_unique__, dummyID: __not_unique__ }

There's an additional index on dummyID:
db.dummy.ensureIndex(

{dummyID: 1}

);

Now, when I query it ...
db.dummy.count(

{dummyID: __not_unique__}

);

This will take several seconds ....

Now, when I query the following ...
db.dummy.find(

{dummyID: __not_unique__}

).sort(

{dummyID: -1}

).limit(10);

This takes just one ms or so ...

Is this some bug, or did I just forgot something?

It'ld make sense if sorting a 18M result-set took more time than counting it ...

(note: it may be faster when testing it on a real server, I'm testing it on my MacBook right now. Look at the relative difference between the benchmarks).



 Comments   
Comment by Eliot Horowitz (Inactive) [ 18/Jul/11 ]

For count you are correct.

For sorting, you are only correct if there isn't an index.

If there is in an index, it can scan the collection in the correct order, so only has to look at 10 records, not N

Comment by Tim van Elsloo [ 18/Jul/11 ]

@Eliot, thank you for your response !

Correct me if I'm wrong, but this is what `count` does, right (in basic, it's probably much harder):

1. Loop trough every item.
2. Check if it matches.
3. If so, increment the `count`-variable.
4. When loop is dead, return the `count`-variable.

So, indeed, it does need to scan all the matches.

Now, (what I think) what's going on when `sort`ing.

1. Create a first temporary item-list.
2. Loop trough every item.
3. Check if it matches.
4. If so, add it to the first temporary item-list.
5. When loop is dead, start sorting.
6. For every item, `strcmp` it to the previous and next item to determine it's position in the final item-list.
7. Return the final item-list.

If I'm correct, wouldn't it make more sense that `sort`-ing took more time than `count`-ing?

As a temporary solution I've created a new collection, which contains the counters so I don't need to use the `count`-command anymore (just findOne(

{someKey: someValue}

).count).

Comment by Eliot Horowitz (Inactive) [ 18/Jul/11 ]

The query itself seems perfectly indexed, so should be very fast to return the top 10.
The count needs to scan all the matches, so could be much slower.
Note SERVER-1752 to improve the performance of count.

Comment by Tim van Elsloo [ 17/Jul/11 ]

Oh, I forgot to say that I'm using `mongos`, I haven't tested it using `mongod` yet. However, I don't think that would make a big difference, since the difference between those benchmarks is larger than just 2 queries in row. As I said, it's a dummy-environment, so I'm running 2 mongod-procs, 1 mongod-config-proc and 1 mongos-proc add the same address (localhost).

Generated at Thu Feb 08 03:03:03 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.