[SERVER-42013] Provide table counts (collections + indexes) like db.stats() but for the instance level Created: 28/Jun/19 Updated: 06/Dec/22 Resolved: 08/Jun/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Diagnostics |
| Affects Version/s: | 4.0.10 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Chad Kreimendahl | Assignee: | Backlog - Storage Execution Team |
| Resolution: | Won't Do | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: |
Storage Execution
|
| Participants: |
| Description |
|
At the moment the only known efficient way to get a system-wide table count for a mongod instance is to walk each of the databases and each of the collections getting a count of the total items.
Why this matters: Our scalability issues, because of our structure, are entirely based on number of files. As we approach ~350,000 collections + indexes, we begin to experience numerous issues. So we run multiple instances per system of mongo, split on various ports, in order to keep that number down, yet properly utilize our Optane drives and memory. Our solution to tracking this information systematically has been to have a tool that connects to each database, walks each db and counts the tables. We then report this back into a metrics database. We prioritize the creation of new dbs and new instances of mongo based on how many files. We've found the best mid-line to be the ~100k file range.
Running the above query is somewhat expensive and causes unnecessary utilization, and may load unneeded data into memory. I believe wiredTiger, internally, tracks its open tables and would be able to report this number (even if only an estimate) in its statistics. |
| Comments |
| Comment by Danny Hatcher (Inactive) [ 15/Aug/19 ] |
|
I will pass along the request to get the equivalent of the dbStats command but for the whole mongod instance. |
| Comment by Chad Kreimendahl [ 08/Aug/19 ] |
|
So really instead of having to iterate every single database and count the collections and indexes, would it be possible to get this stat aggregated in a more efficient way to the mongod level. such that one query would get it vs say 250+ (in our case) |
| Comment by Chad Kreimendahl [ 28/Jul/19 ] |
|
I think dbstats covers it. I'm not sure why we spaced on seeing this. We're pulling dbstats on every database and creating trend information for them daily, but guess we were serializing and deserializing without including some fields that're in there.
So that just means it's a bit inefficient to get the whole mongod count, in that you have to query every database for its stats. It would be wonderful if the system level stats would return this aggregated information in a more rapid form. |
| Comment by Danny Hatcher (Inactive) [ 05/Jul/19 ] |
|
I apologize, I'm still a little unclear as to your use-case. You can access both dbStats (which provides a count of all collections and indexes within a given database) and serverStatus (which provides a count of all open tables on a given mongod) through any of our drivers. It shouldn't be difficult to incorporate either method programatically. Are you trying to capture a statistic other than a count of all collections / indexes or a count of all currently open tables? |
| Comment by Chad Kreimendahl [ 03/Jul/19 ] |
|
I guess for us we want to auto deploy the next customer onto the cluster with the least files automatically. A quick table count method would be the most efficient |
| Comment by Chad Kreimendahl [ 03/Jul/19 ] |
|
Not really. Since we need a programatic way to get the info, we're stuck using the C# driver. So open handles helps a bit, but doesn't get us there. Counting files on the system is something we do, but wouldn't work, long term as we don't want to manage this with yet another process running on another set of hardware (mongo servers, Linux) |
| Comment by Danny Hatcher (Inactive) [ 01/Jul/19 ] |
|
Hello Chad, As WiredTiger is currently (as of 4.0.10) a file-per-collection and a file-per-index storage engine, performing a count of files in the data directory should give you an idea of the maximum number of tables your instance will open. If you're looking for tables that are currently held open, we keep track of "open data-handles" in our serverStatus output under "wiredTiger"."data-handle"."connection data handles currently active". Are these statistics insufficient for your use case? |