[SERVER-24757] Report number of keys in an index Created: 23/Jun/16 Updated: 23/Jan/24 Resolved: 10/Jun/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Diagnostics, Index Maintenance |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Chris Harris | Assignee: | Backlog - Storage Execution Team |
| Resolution: | Won't Do | Votes: | 1 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: |
Storage Execution
|
| Participants: | |
| Case: | (copied to CRM) |
| Description |
|
There are times when it would be helpful to know how many keys are in an index. Currently there does not appear to be an easy way to access this information. We should expose this metadata, perhaps via the $indexStats command. |
| Comments |
| Comment by Eric Milkie [ 10/Jun/19 ] | |||||||||||||||||||
|
This information is not currently tracked and would have a cost to do so. | |||||||||||||||||||
| Comment by Ian Whalen (Inactive) [ 11/Jul/16 ] | |||||||||||||||||||
|
kevin.pulo this is indeed a similar problem to returning the exact number of documents in a database in constant time, but we do still intend to do it eventually. | |||||||||||||||||||
| Comment by Kevin Pulo [ 30/Jun/16 ] | |||||||||||||||||||
|
To do this as efficiently as you'd like would require counted B-trees (in general). Implementing such an index type would be a non-trivial change, and has previously been determined not to be worth the effort, see The problem is that every time an entry is added or removed from the tree, the updated count needs to be propagated up to the root. This causes contention when updating the count of the root node, which impacts the possible concurrency of index insertions/removals (since updates to the root counter must be serialised/single threaded). (You could possibly use something like RCU for efficient lockless updates, but that would mean that the code is even more complex/advanced.) Thus, although you could get the desired count in O(1), you'd be paying for that by decreasing concurrency and adding O(log n) every time the count is updated. This would be fine for read-heavy applications, but a disaster for write-heavy loads (which already suffer from the penalty of maintaining the index). It's relatively rare to want to know how many keys are in an index, compared to to how many documents are in the collection (which is maintained in metadata). Since it makes sense to optimise for the common case, MongoDB has used regular B-trees rather than counted ones (which are also simpler and easier to implement). Thus, in those situations where this number is really needed, the best option would be to use explain to return the stats from a covered scan of the index, as you've suggested. Since indexes should always fit in RAM anyway (because otherwise you'll have bigger problems), the index scan ought to have a "tolerable" impact. The only other option that I can think of would be a schema adjustment of some sort, where the app manually maintains the number somewhere else (which should be okay for read-mostly applications). I don't think $indexStats can help because that's about empirical index usage, rather than anything about its shape. | |||||||||||||||||||
| Comment by Chris Harris [ 24/Jun/16 ] | |||||||||||||||||||
|
Thanks kyle.suarez It is good that validate provides an option to skip scanning the data. However it still requires a full scan of the index to gather the information. I suspect we can similarly get the information by performing an unfiltered find() that projects out the unindexed fields (to make it covered) and examine the executionStats.totalKeysExamined from the explain("executionStats") output. What I am proposing is the ability to access this metadata information in O(1) time. I am hopeful that it is already tracked somewhere, as opposed to dynamically calculated which can have performance impact, and can simply be reported out. | |||||||||||||||||||
| Comment by Kyle Suarez [ 23/Jun/16 ] | |||||||||||||||||||
|
christopher.harris, I believe running a validation on a collection will report the number of keys in each index.
|