Type: Improvement
Resolution: Won't Do
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 4.0.10
Component/s: Diagnostics
Labels: None
Assigned Teams: Storage Execution
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

At the moment, the only known efficient way to get a system-wide table count for a mongod instance is to walk every database and every collection on it, summing the counts into a total.

Why this matters:

Because of how our data is structured, our scalability issues depend entirely on the number of files. As we approach ~350,000 collections + indexes, we begin to experience numerous issues. So we run multiple mongod instances per system, split across various ports, to keep that number down while still making proper use of our Optane drives and memory.

Our solution for tracking this information systematically has been a tool that connects to each instance, walks each db, and counts the tables. We then report this back into a metrics database. We prioritize the creation of new dbs and new instances of mongo based on how many files each existing instance holds. We've found the best mid-line to be the ~100k file range.
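The walk described above might look roughly like the following. This is only a sketch, not our actual tool: the function name `count_wiredtiger_tables` is made up for illustration, it assumes a client object that behaves like pymongo's `MongoClient`, and it assumes each collection and each index maps to one WiredTiger table (file).

```python
from typing import Dict


def count_wiredtiger_tables(client) -> Dict[str, int]:
    """Walk every database and collection on one mongod and count tables.

    `client` is assumed to behave like a pymongo.MongoClient. The returned
    total assumes one WiredTiger table per collection and one per index.
    """
    collections = 0
    indexes = 0
    for db_name in client.list_database_names():
        db = client[db_name]
        # Skip views: they have no storage of their own and no indexes.
        for coll_name in db.list_collection_names(filter={"type": "collection"}):
            collections += 1
            # index_information() returns one entry per index, including _id_.
            indexes += len(db[coll_name].index_information())
    return {
        "collections": collections,
        "indexes": indexes,
        "total": collections + indexes,
    }
```

Against a real instance this would be called once per port, for example with `MongoClient("mongodb://localhost:27017")` (a placeholder address), and the `total` value reported to the metrics database. Every collection costs at least one extra round trip for its index list, which is where the expense comes from.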

Running this walk is somewhat expensive: it causes unnecessary utilization and may load unneeded data into memory. I believe WiredTiger tracks its open tables internally and would be able to report this number (even if only an estimate) in its statistics.

Assignee: [DO NOT USE] Backlog - Storage Execution Team
Reporter: Chad Kreimendahl
Participants: [DO NOT USE] Backlog - Storage Execution Team, Chad Kreimendahl, Danny Hatcher
Votes: 0
Watchers: 8

Created: Jun 28 2019 08:44:14 PM UTC
Updated: Dec 06 2022 02:55:29 AM UTC
Resolved: Jun 08 2020 06:39:30 PM UTC
