[SERVER-29489] Balancer stats generation calls 'listDatabases' on shards without maxTimeMS or timeout Created: 07/Jun/17 Updated: 30/Oct/23 Resolved: 13/Jun/17
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 3.4.4, 3.5.8 |
| Fix Version/s: | 3.4.6, 3.5.9 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Kaloian Manassiev | Assignee: | Kaloian Manassiev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v3.4 |
| Sprint: | Sharding 2017-06-19 |
| Participants: | |
| Case: | (copied to CRM) |
| Description |
The sharding balancer's statistics generation uses shardutil::retrieveTotalShardSize on each balancing round to retrieve each shard's storage utilization. This in turn runs listDatabases against the shard, but without maxTimeMS or a network timeout, so a slow or unresponsive shard can cause the balancer to get stuck indefinitely.
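The fix bounds this call on the server side; the snippet below is only a minimal driver-level sketch of the idea, not the actual patch. The shard address and the 30-second limit are illustrative assumptions.

```python
# Sketch only: bounding listDatabases with maxTimeMS (pymongo). The shard
# host/port and the 30-second limit are assumptions, not balancer values.
from pymongo import MongoClient
from pymongo.errors import ExecutionTimeout

shard = MongoClient("mongodb://shard0.example.net:27018")  # hypothetical shard address

try:
    # maxTimeMS makes the server abort the command instead of letting the
    # caller block indefinitely on a slow or hung shard.
    result = shard.admin.command({"listDatabases": 1, "maxTimeMS": 30000})
    print("shard storage size (bytes):", result["totalSize"])
except ExecutionTimeout:
    # A timed-out shard can be skipped for this round rather than
    # stalling the whole balancer round.
    print("listDatabases exceeded maxTimeMS; skipping this shard for now")
```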
| Comments |
| Comment by Githook User [ 19/Jun/17 ] |
Author: Kaloian Manassiev <kaloian.manassiev@mongodb.com> (kaloianm)
Message: (cherry picked from commit 133f0fd284abdf8a313d47b412f58e2cba60cc80)
| Comment by Githook User [ 13/Jun/17 ] |
Author: Kaloian Manassiev <kaloian.manassiev@mongodb.com> (kaloianm)
Message:
| Comment by Eric Milkie [ 09/Jun/17 ] |
Sure, we could add that as a secret parameter just for sharding. I'm not sure we would call it "useCachedStats", though, since you are simply looking for the data size of all the collections and indexes, rather than what listDatabases gives you today: the actual on-disk size, which accounts for on-disk compression as well as deleted records.
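For context on the two numbers being distinguished here, dbStats reports both: dataSize (the logical, uncompressed size of the documents) and storageSize (bytes allocated on disk, which reflects compression and unreclaimed space). A small illustrative example, with an arbitrary database name:

```python
# Illustration of logical size vs. on-disk size via dbStats (not the
# balancer's code path). "test" is an arbitrary example database.
from pymongo import MongoClient

client = MongoClient("mongodb://shard0.example.net:27018")  # hypothetical shard
stats = client["test"].command("dbStats")

# dataSize:    logical size of the documents (uncompressed)
# storageSize: bytes allocated on disk for collections (compression, free blocks)
# indexSize:   on-disk size of the indexes
print("dataSize:   ", stats["dataSize"])
print("storageSize:", stats["storageSize"])
print("indexSize:  ", stats["indexSize"])
```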
| Comment by Kaloian Manassiev [ 09/Jun/17 ] |
I was looking at the namesOnly option and was thinking that perhaps we could add a useCachedStats option, or something like it, to listDatabases if it is indeed slow, but I would first like to see how slow it is in practice. milkie, is adding a useCachedStats parameter to listDatabases something that's generally workable?
| Comment by Daniel Pasette (Inactive) [ 09/Jun/17 ] |
Yes, of course the round trips make no sense. Retracting! However, I'm pretty sure storage size retrieval doesn't use cached data; it calls stat(2) on the backing file. That's why we added the "namesOnly" version of listDatabases in the first place. On MMAPv1 the number of calls is bounded, because all collection and index data is stored in one set of files.
| Comment by Kaloian Manassiev [ 09/Jun/17 ] |
This call is used to get the total shard storage utilization, not statistics on an individual collection basis. With this approach, won't we have to make individual calls to get the statistics for each collection in every database? It seems to me that with a large number of collections this would require a lot of roundtrips and might be worse than listDatabases. Now that I've looked at the implementation of storage size retrieval for WiredTiger, it appears to come from cached stats, so it should already be pretty efficient. It's only MMAPv1's implementation that is proportional to the data size and might cause some disk I/O while iterating the extents.
| Comment by Daniel Pasette (Inactive) [ 09/Jun/17 ] |
Would it be possible to use the collection and index stats directly to calculate this? It might be faster than getting the "real" storage size per shard, because those numbers are already in memory.
| Comment by Kaloian Manassiev [ 07/Jun/17 ] |
Yes, this can happen, and the balancer round will be slowed down. We discussed the possibility of caching the storage utilization information and refreshing it only periodically and asynchronously, but since we haven't yet seen cases where this has caused problems, we haven't bothered to optimize it. Another possible optimization may come from the project to move the autoSplit logic to the shards, in which case we could use the real-time information used by autoSplit to also do utilization estimates.
| Comment by Daniel Pasette (Inactive) [ 07/Jun/17 ] |
Won't this call also take a potentially very long time on WiredTiger instances that have thousands of collections, causing a long pause between rounds if it runs on every shard serially?