[SERVER-1752] improve the performance of simple counts

Created: 08/Sep/10  Updated: 28/Oct/15  Resolved: 21/Dec/12

| Status: | Closed |
| Project: | Core Server |
| Component/s: | Performance |
| Affects Version/s: | 1.7.0 |
| Fix Version/s: | 2.3.2 |
| Type: | Improvement |
| Priority: | Major - P3 |
| Reporter: | phoenix |
| Assignee: | Aaron Staple |
| Resolution: | Done |
| Votes: | 133 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: | Linux |
| Issue Links: | |
| Participants: | Aaron Staple, Adrien Mogenet, auto, Chad Kreimendahl, Christoph Blank, David Fooks, Doug Hudson, Doug Mayer, Eliot Horowitz, James Harrison, Jason Bellows, Jon Hyman, Kevin Kwast, Klaus Lehner, Lennart Koopmann, Michael Parrish, phoenix, Remon van Vliet, Sékine Coulibaly, Siddharth Sharma, Travis Reeder, Trey Shugart |
| Description |

Summary: Two optimizations have been implemented for the count operation:

1) Normally, when documents are retrieved from an indexed Cursor, the index key and/or document at each position of the cursor are checked by a Matcher to see if they match the query. As an optimization, this Matcher check is now bypassed in simple cases where the documents iterated by the Cursor must always match the query. Specifically, if an 'Optimal' btree index is used to perform a count, and the index bounds for the count's query spec on that index are determined to exactly describe the documents that match the query, then a Matcher is not used. An explain( true ) on a find() with the same query spec as the count will generally indicate whether an 'Optimal' index is used (generally the case if there is only one btree plan in allPlans) and will show the indexBounds on the index.

2) Normally, when a btree cursor iterates over keys in a btree, the cursor checks every key traversed to see if it falls within the calculated indexBounds. As an optimization, this key check is now bypassed in simple cases where the iteration endpoint can be precomputed. Specifically, if an 'Optimal' btree index is used to perform a count, and the indexBounds describe a single interval within the btree, then the endpoint of that interval is located in advance so that the traversed keys do not need to be individually checked for inclusion in the interval. An explain( true ) of an optimal index will generally indicate usage of a single interval when the cursor explain field does not have a "multi" suffix.

Aaron

---------------------------------

The count performance is so poor that we cannot use it; if the client must wait 5000 ms for every count request, they will be unhappy! Counts with large result sets need much more time: a simple count takes 4677 ms, but if the result is smaller, e.g. 51140 rather than 2101980, the count query completes in less than 200 ms. count() cannot be combined with explain(), so I used db.Comment.find( {appId:1,topicId:1} ).explain(), but the result should be the same. |
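A quick way to check whether a given count() qualifies for these fast paths is to run explain( true ) on the equivalent find(). A minimal sketch in the mongo shell, reusing the Comment collection and query from the original report (the compound index is an assumption for illustration):

```javascript
// Assumed compound index matching the count's query spec.
db.Comment.ensureIndex( { appId: 1, topicId: 1 } );

// explain( true ) on a find() with the same query spec as the count.
// A single btree plan in allPlans generally means an 'Optimal' index is used;
// a cursor name without a "multi" suffix (e.g. "BtreeCursor appId_1_topicId_1")
// indicates a single interval, so both optimizations can apply.
db.Comment.find( { appId: 1, topicId: 1 } ).explain( true );

// The count itself, which can then skip the per-document Matcher check
// and the per-key indexBounds check.
db.Comment.count( { appId: 1, topicId: 1 } );
```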
| Comments |
| Comment by Eliot Horowitz (Inactive) [ 21/Dec/12 ] |
|
Jon - thanks, forgot about that message, updated. |
| Comment by Jon Hyman [ 21/Dec/12 ] |
|
Okay, thanks for the update. In general, you should update the "We aim to do a stable release every 3 months" tag on jira. 2.2 came out almost a year after 2.0, so I was worried that 2.4 was going to be far into the future. |
| Comment by Eliot Horowitz (Inactive) [ 21/Dec/12 ] |
|
Jon - no, unfortunately a change like this is not backportable as it's relatively large, and stability on 2.2 is paramount. |
| Comment by Jon Hyman [ 21/Dec/12 ] |
|
Is there any way this could be backported into a 2.2.x build? Without a clear understanding of the 2.4 release timeframe (since it's not on the roadmap AFAICT), we would really benefit from this right now. While we've precomputed a great deal of queries we otherwise would have used count() for, some of our count queries are hard to precompute and take a long time. |
| Comment by auto [ 20/Nov/12 ] |
|
Author: Aaron <aaron@10gen.com>, 2012-11-20T10:36:10Z. Message: |
| Comment by auto [ 20/Nov/12 ] |
|
Author: Aaron <aaron@10gen.com>, 2012-11-20T04:45:09Z. Message: |
| Comment by auto [ 20/Nov/12 ] |
|
Author: Aaron <aaron@10gen.com>, 2012-11-20T04:14:31Z. Message: |
| Comment by auto [ 20/Nov/12 ] |
|
Author: Aaron <aaron@10gen.com>, 2012-11-20T03:03:59Z. Message: |
| Comment by auto [ 12/Nov/12 ] |
|
Author: Aaron <aaron@10gen.com>, 2012-11-04T21:55:36Z. Message: |
| Comment by Trey Shugart [ 29/Oct/12 ] |
|
@Jon is right. There are a few glaring issues, such as |
| Comment by auto [ 24/Oct/12 ] |
|
Author: Aaron <aaron@10gen.com>, 2012-10-23T20:53:57-07:00. Message: |
| Comment by auto [ 24/Oct/12 ] |
|
Author: Aaron <aaron@10gen.com>, 2012-10-14T20:48:04-07:00. Message: |
| Comment by auto [ 24/Oct/12 ] |
|
Author: Aaron <aaron@10gen.com>, 2012-10-11T11:44:38-07:00. Message: |
| Comment by auto [ 24/Oct/12 ] |
|
Author: Aaron <aaron@10gen.com>, 2012-10-10T13:33:07-07:00. Message: |
| Comment by auto [ 10/Oct/12 ] |
|
Author: Aaron <aaron@10gen.com>, 2012-10-09T09:38:18-07:00. Message: |
| Comment by Jon Hyman [ 12/Sep/12 ] |
|
The 10gen team already knows what to do; it should just be fixed. Documenting it creates an excuse for accepting bad performance. |
| Comment by Travis Reeder [ 12/Sep/12 ] |
|
@Chad: We started using counters everywhere. Although they create more write locks, since you have to update the counters all the time, the queries on the counters are super fast, which is much better than indeterminately long count() queries. Another thing we've done specifically for pagination is to not show the total number of items/pages when paging; instead we do it more Google-like, where we show a few page numbers the user can click and a Next link. That's the easiest way to avoid doing a count() for paging without needing counters. |
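A minimal sketch of the counter approach in mongo shell syntax; the counters collection name and the per-(appId, topicId) key are assumptions for illustration:

```javascript
// Hypothetical counters collection: one document per (appId, topicId) pair.
// Increment on every insert into Comment (this adds a write, as noted above).
db.counters.update(
    { _id: { appId: 1, topicId: 1 } },
    { $inc: { count: 1 } },
    true  // upsert: create the counter document if it does not exist yet
);

// Reading the total is a single point lookup instead of a count() scan.
var doc = db.counters.findOne( { _id: { appId: 1, topicId: 1 } } );
var total = doc ? doc.count : 0;
```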
| Comment by Chad Kreimendahl [ 12/Sep/12 ] |
|
Travis... What about using it to determine paging requirements for a dynamic grid display? I think that's very likely the case most people are seeing. Paging of large chunks of data is fairly important, and oftentimes that means you need to know how much data you'll be paging. |
| Comment by Travis Reeder [ 12/Sep/12 ] |
|
+1 on documenting this. Nobody should really be using count() except for maybe one-off ad-hoc queries, but not in an application where it will be used regularly. |
| Comment by Michael Parrish [ 12/Sep/12 ] |
|
Just got bit by this one as well. Naive pagination in our application was growing linearly with the collection size. If optimizing this in a reasonable amount of time is not an option, then I would strongly encourage 10gen to make this very clear in the documentation. |
| Comment by Doug Hudson [ 29/Aug/12 ] |
|
This is a biggie. We have to do a very large number of counts for some analyses, and even with an index on the attribute (I run with --notablescan, so it has to be there), setting limit to 2, and using count(true), it can take a very long time just to determine whether more than one document matches a find() (>1 second for ~2 million records). This is with Ruby. I think I am better off calling .next() twice on the cursor to determine whether count == 1, and doing away with count altogether for the time being. |
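A sketch of that workaround in mongo shell syntax (Doug's version is in Ruby); matchCount is a hypothetical helper name:

```javascript
// Returns 0, 1, or 2 (meaning "two or more") without counting the full
// result set. limit(2) stops the server after at most two matches, so
// the cost is bounded regardless of how many documents match.
function matchCount( query ) {
    var cursor = db.Comment.find( query ).limit( 2 );
    var n = 0;
    while ( cursor.hasNext() ) {
        cursor.next();
        n++;
    }
    return n;
}

// Usage: distinguish "exactly one match" from "more than one".
matchCount( { appId: 1, topicId: 1 } ) == 1;
```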
| Comment by Jason Bellows [ 14/Aug/12 ] |
|
We are also having major problems with count() being too slow. Would love to see this optimized ASAP. |
| Comment by Travis Reeder [ 02/Aug/12 ] |
|
We've run into this countless (no pun intended) times and, like the issue says, count() is pretty much unusable. Here is an example of how bad and inconsistent it is: under no load it's generally very fast, a couple of ms, but when we apply a bit of load, the time it takes to execute can be anywhere from a few ms to 28 minutes!!

Aug 02 00:29:40 23.23.16.13 kernel: staging: 2012/08/02 07:29:40 reserved count time 1 ms |
| Comment by Jon Hyman [ 26/Jul/12 ] |
|
Please commit to doing this in 2.3.0. We're having to start looking at rewriting parts of our code to not use MongoDB due to slow count performance. |
| Comment by David Fooks [ 26/Jul/12 ] |
|
Ah, ok sounds like it should make a big difference! Thanks for the update. |
| Comment by Eliot Horowitz (Inactive) [ 26/Jul/12 ] |
|
Main issue is that we do actually do the key comparison for every doc. |
| Comment by David Fooks [ 26/Jul/12 ] |
|
@Eliot Out of interest, how are you optimizing this? What sort of speed-up do you expect to gain from it? To be honest, I'm more interested in how this might affect https://jira.mongodb.org/browse/SERVER-2274, as I've been holding off writing a workaround until our collection gets too big and these count operations become too slow. |
| Comment by Eliot Horowitz (Inactive) [ 26/Jul/12 ] |
|
@david - no, counted b-trees are generally not something we think makes sense; this is more of a straight optimization, but it could be significant. |
| Comment by Christoph Blank [ 26/Jul/12 ] |
|
Apologies, I meant basic from the user perspective. Our pagination does the count, which seems trivial from that perspective but slows down the whole application. Of course, I cannot judge whether it is basic from an implementation point of view. |
| Comment by David Fooks [ 26/Jul/12 ] |
|
@Christoph I'm not sure this is a basic problem. I think they will need to implement counted b-trees as a whole new type of index. |
| Comment by Christoph Blank [ 26/Jul/12 ] |
|
We also have this very basic problem - quite annoying; I hope there is a fix soon. |
| Comment by Eliot Horowitz (Inactive) [ 22/Jul/12 ] |
|
20 seconds for 50k still doesn't make sense, so if you could send the explain output, that would be interesting. |
| Comment by James Harrison [ 21/Jul/12 ] |
|
@Eliot - The 'fix version' has changed on this ticket at least once. If this /is/ getting fixed for 2.3, that's great. If it's just 'that's the one we're not doing now' as it kinda looks, then it'll be time to look for another solution. The server is doing 100 reqs/s on average, with about 50 reqs/s being counts. This server is thrashing itself resolving these queries. Since the queries are user-specific, caching in appspace yields limited results, though we're looking at doing some more of that. As far as explain goes - the same query being used for a count run as a |
| Comment by Eliot Horowitz (Inactive) [ 21/Jul/12 ] |
|
@James - if 50k entries takes 20 seconds, I don't think that has anything to do with this issue. Sounds like something else. This is scheduled for 2.3.0, which is the next dev cycle; see "fix version" above. |
| Comment by Sékine Coulibaly [ 21/Jul/12 ] |
|
@James Harrison: This issue is just frustrating and really worrying, as you said. I've seen absolutely no progress on this since I started with MongoDB at 1.8 (we never committed to Mongo, since this bug was never corrected)... An update from 10gen on why this isn't fixed and why a fix isn't planned would be great. |
| Comment by James Harrison [ 21/Jul/12 ] |
|
Given the large number of use cases that depend on quick count() performance (pagination being one of the more obvious generally applicable cases), is this ever going to get looked at? The fact that it has sat at high priority for so long raises some worrying questions. For collections of 50,000 entries on a very fast machine (8 cores, 16GB RAM, Intel SSDs) we've seen counts take in excess of 20 seconds per call. Given that 50,000 entries is not by any stretch a particularly large collection, and that we're growing that collection by about 1.5k entries a day, we're now having to consider rewriting large chunks of code and removing functionality to avoid this bug, or moving away from MongoDB. |
| Comment by Adrien Mogenet [ 15/Apr/12 ] |
|
Browsing and counting 500 elements is pretty fast; you shouldn't be affected. If you're using your whole set, you may also retrieve it entirely and count the elements in your client code. |
| Comment by Siddharth Sharma [ 15/Apr/12 ] |
|
This will affect us in production, since for paging we need the count of total records. Will the same affect us if the total count of results for a query is < 500 records? -Sid |
| Comment by Kevin Kwast [ 22/Feb/12 ] |
|
Is there a way to quickly check how many records are in an index (like the quick count of the number of objects in a collection)? Actually running the count({}) is brutally slow and creates a lot of disk I/O if the index you want to check isn't already cached. |
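As far as I know, no per-index record count was exposed at the time. For contrast, an unqualified count is answered from collection metadata, while a count with any predicate forces a scan; a sketch using the Comment collection from the report:

```javascript
// Fast: the unqualified count is read from collection metadata,
// with no btree or document scan.
db.Comment.count();

// Slow before this fix: a predicate forces the server to walk the
// matching index entries (or documents) one by one.
db.Comment.count( { appId: 1 } );
```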
| Comment by Adrien Mogenet [ 05/Jun/11 ] |
|
Getting an estimated result in a very short time could be acceptable in some cases. Getting the depth of the (indexed criteria) subtree, along with its average width, could be a first optimization effort? It's the kind of trick MySQL uses in InnoDB indexes, if I remember correctly... |
| Comment by Remon van Vliet [ 31/May/11 ] |
|
Duplicate of 2274. An actual scan should not be necessary for "attr" values of primitive types that are covered by an index, assuming it is a counted btree. If not, adding subtree node counts should speed this up significantly. Given the common use of range-based count() operations, as well as its direct relation to result paging and such, I do agree that it would be very nice to get this into one of the early 2.x versions if at all possible. |
| Comment by Lennart Koopmann [ 12/May/11 ] |
|
+1 What is the status on this? We need to work around this stuff in Graylog2 because it is too slow. |
| Comment by Sékine Coulibaly [ 02/May/11 ] |
|
Running 1.8.1 here, same issue. Roughly 10 seconds to count around 10 million documents based on attr:val (99% of the documents have the same integer value for attr) on an 8-processor box with 16GB RAM and a 4GB mongo database. A little disappointing for such a basic need; it would be nice if this were scheduled, better if before December 2012! |
| Comment by Eliot Horowitz (Inactive) [ 22/Apr/11 ] |
|
Scanning 3 million records is going to take some time, even with an index. |
| Comment by Doug Mayer [ 20/Apr/11 ] |
|
Can this really go on as unscheduled for this long? We're experiencing very long wait times on this type of query with only 3 million records. It seems like it would be a very common operation, which makes me feel like we must be doing something wrong? Can someone on the core server team at least confirm this isn't a misuse or misconfiguration on our part to try to get the ball rolling on it? |
| Comment by Adrien Mogenet [ 29/Jan/11 ] |
|
You can vote for http://jira.mongodb.org/browse/SERVER-2435 |
| Comment by Adrien Mogenet [ 12/Jan/11 ] |
|
I experience a similar issue in my 3-shard configuration. When I attempt to perform a count() on a field (not the shard key, but indexed anyway), it can take more than 30s (1 million matching results to count within 200 000 million rows). |
| Comment by Klaus Lehner [ 12/Jan/11 ] |
|
You might also vote for http://jira.mongodb.org/browse/SERVER-2274 |
| Comment by Klaus Lehner [ 12/Jan/11 ] |
|
We experience similar issues, and as that came to be quite a show-stopper for our app, we introduced a cache in our Java application server that caches the count result for queries and then returns that number for a certain period of time. Depending on what your queries look like, that might be quite hard to implement. |
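A minimal sketch of that caching idea in mongo-shell-style JavaScript (Klaus's version is in Java); the cache key, TTL handling, and the cachedCount name are assumptions:

```javascript
// Hypothetical time-bounded cache for count() results. Counts may be
// stale by up to ttlMs, which is often acceptable for pagination displays.
var countCache = {};  // maps a query key to { value, expiresAt }

function cachedCount( query, ttlMs ) {
    var key = JSON.stringify( query );
    var now = new Date().getTime();
    var entry = countCache[ key ];
    if ( entry && entry.expiresAt > now ) {
        return entry.value;  // serve the cached count
    }
    var value = db.Comment.count( query );  // the expensive call
    countCache[ key ] = { value: value, expiresAt: now + ttlMs };
    return value;
}

// Usage: tolerate counts up to 60 seconds old.
cachedCount( { appId: 1, topicId: 1 }, 60 * 1000 );
```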