[SERVER-34819] Optimize the sharding balancer's cluster statistics gathering Created: 03/May/18 Updated: 06/Feb/24 |
|
| Status: | Open |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Kaloian Manassiev | Assignee: | Tommaso Tocci |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | QWB | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: | |
| Assigned Teams: | Sharding EMEA |
| Sprint: | Sharding EMEA 2023-10-16, Sharding EMEA 2023-10-30, CAR Team 2023-11-13, CAR Team 2023-11-27, CAR Team 2023-12-11, CAR Team 2023-12-25, CAR Team 2024-01-08, CAR Team 2024-01-22, CAR Team 2024-02-05, CAR Team 2024-02-19 |
| Participants: | |
| Case: | (copied to CRM) |
| Description |
|
The sharding balancer currently issues listDatabases against every single shard in order to read the totalSize value. This value is used to ensure that a shard's storage maxSize is not exceeded for users who have that value set. The listDatabases call is quite heavy, especially for nodes with a large number of databases/collections, since it fstats every single file under the instance. There are a number of optimizations we can make in order to render this statistics gathering less expensive in the presence of maxSize (listed in order of preference):
|
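To make the check described above concrete, here is a minimal Python sketch (not the actual server code, which is C++) of how the balancer consumes totalSize: the helper name and reply shape are illustrative, though the field names match the real listDatabases output, which reports totalSize in bytes while maxSize is configured in megabytes.

```python
def shard_exceeds_max_size(list_databases_reply, max_size_mb):
    """Return True if the shard's reported data size exceeds its maxSize.

    `list_databases_reply` stands in for the document returned by the
    listDatabases command; its `totalSize` field is in bytes, while the
    maxSize shard setting is expressed in megabytes.
    """
    total_size_bytes = list_databases_reply["totalSize"]
    return total_size_bytes > max_size_mb * 1024 * 1024

# Illustrative reply: a shard reporting 3 GiB of data against a 2 GiB cap.
reply = {"databases": [], "totalSize": 3 * 1024 * 1024 * 1024, "ok": 1}
shard_exceeds_max_size(reply, 2048)  # -> True
```

The expensive part is not this comparison but producing the totalSize number itself, which is what the optimizations below target.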
| Comments |
| Comment by Kaloian Manassiev [ 03/May/18 ] |
|
I can think of multiple ways to do this, but I am not sure which one is realistic and/or independent of the storage engine. For example, can the sizes be lazily updated as WT writes to its files, and only for the files that it writes to/garbage collects? I am not sure whether this is possible in real time, but perhaps it could be combined with an asynchronous thread which runs periodically and only {{fstat}}s the files that were written to. Alternatively, can we use the same mechanism that is used for the fast count, combined with some multiplier of the average document size? This is what we use for chunk size estimation. |
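The second idea above (fast count times average object size, as used for chunk size estimation) can be sketched as follows; the dict shape is illustrative, not a real server structure, though the `count`/`avgObjSize` names mirror collStats fields.

```python
def estimate_collection_size(fast_count, avg_obj_size_bytes):
    """Cheap size estimate: document count times average document size.

    Uses only metadata (the "fast count") rather than fstat-ing any
    data files on disk.
    """
    return fast_count * avg_obj_size_bytes

def estimate_shard_size(collections):
    """Sum the per-collection estimates for every collection on a shard."""
    return sum(
        estimate_collection_size(c["count"], c["avgObjSize"])
        for c in collections
    )

colls = [
    {"count": 1_000_000, "avgObjSize": 512},   # ~512 MB of documents
    {"count": 10_000, "avgObjSize": 2_048},    # ~20 MB of documents
]
estimate_shard_size(colls)  # -> 532_480_000 (bytes)
```

The trade-off is accuracy: this ignores storage-level effects such as compression and index sizes, so any multiplier would have to be tuned, but it avoids touching the filesystem entirely.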
| Comment by Eric Milkie [ 03/May/18 ] |
|
I'm not sure how a cache would work. The calculation today locks each database and then iterates through every collection, adding the collection's size plus all of its index sizes to the total. The expense here probably comes from iterating through all collections and indexes on the entire system. If this value were cached, when would it get updated? |
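The calculation described in the comment above can be sketched like this; the nested-dict catalog is a stand-in for the real storage-engine iteration (which also takes a lock per database), and the field names are illustrative.

```python
def total_size(catalog):
    """Sum data size plus index sizes across every database and collection.

    `catalog` maps database name -> {collection name -> stats dict}.
    """
    total = 0
    for db in catalog.values():       # one pass (and, in the server, one lock) per database
        for coll in db.values():      # every collection in the database
            total += coll["dataSize"]
            total += sum(coll["indexSizes"].values())
    return total

catalog = {
    "app": {
        "users": {"dataSize": 1_000, "indexSizes": {"_id_": 100, "email_1": 50}},
        "events": {"dataSize": 5_000, "indexSizes": {"_id_": 300}},
    },
}
total_size(catalog)  # -> 6_450
```

This makes the cost structure visible: the work is linear in the total number of collections and indexes on the node, regardless of how much data any of them holds.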
| Comment by Kaloian Manassiev [ 03/May/18 ] |
|
milkie, the sharding balancer uses the totalSize value from listDatabases in order to estimate the data usage on a shard, and running this command in the presence of a large number of collections/indexes is very heavy. What is the possibility of the storage team exposing a cachedTotalSize value which doesn't fstat every single file on the instance? |
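One possible shape for the cachedTotalSize idea, combining it with the dirty-file tracking suggested earlier in the thread, is sketched below in Python. Everything here is hypothetical: the class, its method names, and the write-notification hook are assumptions, not an existing server or storage-engine API. The cache re-stats only files marked as written to since the last query, instead of fstat-ing every file on the instance.

```python
class CachedTotalSize:
    """Hypothetical cache of per-file sizes, refreshed only for dirty files."""

    def __init__(self, stat_fn):
        # stat_fn returns a file's current size in bytes,
        # e.g. lambda p: os.stat(p).st_size on a real filesystem.
        self._stat = stat_fn
        self._sizes = {}    # file path -> last known size in bytes
        self._dirty = set() # files written to since the last total() call

    def mark_dirty(self, path):
        # Would be invoked (conceptually) whenever the storage engine
        # writes to or garbage-collects a file.
        self._dirty.add(path)

    def total(self):
        # Re-stat only the dirty files; everything else is served
        # from the cache with no filesystem access.
        for path in self._dirty:
            self._sizes[path] = self._stat(path)
        self._dirty.clear()
        return sum(self._sizes.values())

# Usage with a dict standing in for the filesystem:
sizes_on_disk = {"collection-0.wt": 4_096, "index-1.wt": 1_024}
cache = CachedTotalSize(sizes_on_disk.__getitem__)
cache.mark_dirty("collection-0.wt")
cache.mark_dirty("index-1.wt")
cache.total()  # -> 5_120

sizes_on_disk["collection-0.wt"] = 8_192  # one file grows
cache.mark_dirty("collection-0.wt")
cache.total()  # -> 9_216 (only the grown file was re-statted)
```

The open question from the thread still applies: whether the storage engine can cheaply produce the dirty-file notifications this design depends on.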