[SERVER-7745] Add option to indexes to create counted b-tree for improved count() performance Created: 22/Nov/12  Updated: 19/May/14  Resolved: 20/Dec/12

Status: Closed
Project: Core Server
Component/s: Index Maintenance
Affects Version/s: 2.3.0
Fix Version/s: None

Type: New Feature Priority: Major - P3
Reporter: Remon van Vliet Assignee: Unassigned
Resolution: Won't Fix Votes: 0
Labels: count, performance, query
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-1752 improve the performance of simple counts Closed
Backwards Compatibility: Fully Compatible
Participants:

 Description   

It would be very useful to allow us to mark an index as counted (uses a counted b-tree rather than the current approach). This would result in larger indexes but would dramatically improve count() performance in most cases.



 Comments   
Comment by Remon van Vliet [ 21/Dec/12 ]

Yep, agree. Thanks!

Comment by Eliot Horowitz (Inactive) [ 20/Dec/12 ]

Remon, those numbers look a lot better.
I think the direction we're going on optimization is the right approach, so going to close this.

Comment by Remon van Vliet [ 27/Nov/12 ]

Did a quick test, details here : http://imgur.com/9nxyH

Numbers are operations per second. All tests were done after the database was warmed up. There are some weird inconsistencies in the data (e.g. counts on a larger dataset or sometimes faster than on a smaller set) but those inconsistencies are there regardless of how many tests I run. This test was done single threaded. Will look at multithreaded diferences when I get some time for that.

Comment by Eliot Horowitz (Inactive) [ 27/Nov/12 ]

2.3.1 is up, so would be great to see the comparison

Comment by Remon van Vliet [ 26/Nov/12 ]

Makes sense. I'll run some benchmarks on this build and see if performance improvements are significant enough so that we can remove some of the workarounds in our code.

Comment by Eliot Horowitz (Inactive) [ 25/Nov/12 ]

Ah, single interval meaning one section of the btree.
SO x = 5 or x between 4 and 6

$in, $or is handled elsewhere

Comment by Remon van Vliet [ 22/Nov/12 ]

It's in the query optimizer (src/mongo/db/queryoptimizer.cpp) line 314 in the commit https://github.com/mongodb/mongo/commit/428bb865bfd24848ffba0fb9b9514264e5b3816e.

Comment by Eliot Horowitz (Inactive) [ 22/Nov/12 ]

Where do you see that constraint listed?
Hard without context, but sounds like it might not support $in yet, but don't see the message, so hard to tell.

Comment by Remon van Vliet [ 22/Nov/12 ]

@Eliot Awesome! I'm certainly not an expert on how to implement database indexes so my suggestion to use counted b-trees wasn't based on an real life expertise. Just checked out the commits for 1752 and the specialized cursor for count queries that uses index value ranges looks like a much better solution. What is meant by the constraint that the field range vector should exactly represent a single interval in real world count queries?

Comment by Eliot Horowitz (Inactive) [ 22/Nov/12 ]

When 2.3.1 one comes out in a few days, would be great if you could try the counts and measure the difference.
Just remember 2.3.1 is unstable, so shouldn't be used for production.

Comment by Eliot Horowitz (Inactive) [ 22/Nov/12 ]

I don't think counted b-trees make sense for a variety of reasons (concurrency, space, performance) and do not think any database uses them (could be wrong, but research hasn't found a single one).

In 2.4, we've improved the algorithm for counting which has dramatically improved the performance of many counts where this would help.

See SERVER-1752

Generated at Thu Feb 08 03:15:29 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.