[SERVER-29489] Balancer stats generation calls 'listDatabases' on shards without maxTimeMS or timeout Created: 07/Jun/17  Updated: 30/Oct/23  Resolved: 13/Jun/17

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.4.4, 3.5.8
Fix Version/s: 3.4.6, 3.5.9

Type: Bug Priority: Major - P3
Reporter: Kaloian Manassiev Assignee: Kaloian Manassiev
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v3.4
Sprint: Sharding 2017-06-19

 Description   

The sharding balancer's statistics generation calls shardutil::retrieveTotalShardSize on each balancer round in order to retrieve each shard's storage utilization.

This in turn runs listDatabases against the shard, but without specifying maxTimeMS or any other timeout, so a slow or unresponsive shard can cause the balancer to get stuck indefinitely.
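
For illustration, the fix referenced in the commits below attaches maxTimeMS to the listDatabases command the balancer sends to each shard. The following standalone sketch shows the same command shape using the mongocxx driver rather than the server's internal Shard interface; the shard address and the 30-second budget are placeholders, not values taken from the patch.

{code:cpp}
#include <iostream>

#include <bsoncxx/builder/stream/document.hpp>
#include <bsoncxx/json.hpp>
#include <mongocxx/client.hpp>
#include <mongocxx/instance.hpp>
#include <mongocxx/uri.hpp>

int main() {
    using bsoncxx::builder::stream::document;
    using bsoncxx::builder::stream::finalize;

    mongocxx::instance inst{};  // driver bootstrap, required once per process

    // Connect directly to a shard's mongod (the address is a placeholder).
    mongocxx::client shard{mongocxx::uri{"mongodb://localhost:27018"}};

    // listDatabases with a server-side time budget, so a stalled or overloaded
    // shard cannot block the caller indefinitely. The 30s value is illustrative.
    auto cmd = document{} << "listDatabases" << 1
                          << "maxTimeMS" << 30000
                          << finalize;

    auto reply = shard["admin"].run_command(cmd.view());

    // The balancer only consumes the aggregate on-disk size ("totalSize") from the reply.
    std::cout << bsoncxx::to_json(reply.view()) << std::endl;
    return 0;
}
{code}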



 Comments   
Comment by Githook User [ 19/Jun/17 ]

Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com>

Message: SERVER-29489 Use 'maxTimeMs' when calling listDatabases against shards

(cherry picked from commit 133f0fd284abdf8a313d47b412f58e2cba60cc80)
Branch: v3.4
https://github.com/mongodb/mongo/commit/ba0eb93a1ae6d80b99542166919ff9e7074cbdf6

Comment by Githook User [ 13/Jun/17 ]

Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com>

Message: SERVER-29489 Use 'maxTimeMs' when calling listDatabases against shards
Branch: master
https://github.com/mongodb/mongo/commit/133f0fd284abdf8a313d47b412f58e2cba60cc80

Comment by Eric Milkie [ 09/Jun/17 ]

Sure, we could add that as a secret parameter just for sharding. I'm not sure we would call it "useCachedStats", as you are simply looking for the data size of all the collections and indexes, rather than what listDatabases gives you today: the actual on-disk size, which takes into account on-disk compression as well as deleted records.
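
For context, dbStats already reports both figures side by side: dataSize (the logical, uncompressed size of the documents) and storageSize (the on-disk allocation, after block compression and including space still held by deleted records), while listDatabases aggregates the on-disk figure across all databases as totalSize. A minimal sketch via the mongocxx driver; the connection string and database name are placeholders.

{code:cpp}
#include <iostream>

#include <bsoncxx/builder/stream/document.hpp>
#include <bsoncxx/json.hpp>
#include <mongocxx/client.hpp>
#include <mongocxx/instance.hpp>
#include <mongocxx/uri.hpp>

int main() {
    using bsoncxx::builder::stream::document;
    using bsoncxx::builder::stream::finalize;

    mongocxx::instance inst{};
    mongocxx::client conn{mongocxx::uri{"mongodb://localhost:27017"}};

    // dbStats on one database: "dataSize" is the logical size of the documents,
    // while "storageSize" reflects what is actually allocated on disk.
    auto stats = conn["test"].run_command(document{} << "dbStats" << 1 << finalize);
    std::cout << bsoncxx::to_json(stats.view()) << std::endl;

    // listDatabases aggregates the on-disk figure across all databases ("totalSize"),
    // which is the number the balancer consumes today.
    auto dbs = conn["admin"].run_command(document{} << "listDatabases" << 1 << finalize);
    std::cout << bsoncxx::to_json(dbs.view()) << std::endl;
    return 0;
}
{code}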

Comment by Kaloian Manassiev [ 09/Jun/17 ]

I was looking at the namesOnly option and was thinking that perhaps we can add a useCachedStats option or something like that to listDatabases if it is indeed slow, but I would like to see how slow it is in practice.

milkie, is adding a useCachedStats parameter to listDatabases something that's generally workable?

Comment by Daniel Pasette (Inactive) [ 09/Jun/17 ]

Yes, of course the round trips make no sense. Retracting! However, I'm pretty sure storage size retrieval doesn't use cached data; it calls stat(2) on the backing file. That's why we added the "namesOnly" version of listDatabases in the first place. On MMAPv1, the number of calls is bounded because all collection/index data is stored in one set of files.

Comment by Kaloian Manassiev [ 09/Jun/17 ]

This call is used to get the total shard storage utilization, not statistics on an individual collection basis. With this approach, won't we have to make individual calls to get the statistics for each collection in every database? It seems to me that with a large number of collections this would require a lot of round trips and might be worse than listDatabases.

Now that I have looked at the implementation of the storage size retrieval for WiredTiger, it seems to come from cached stats, so it should be pretty efficient already. It's only MMAPv1's implementation that is proportional to the data size and might cause some disk I/O while iterating the extents.
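
To make the round-trip concern concrete: computing the total from per-collection stats means one collStats command per collection in every database, versus a single listDatabases call per shard. A rough driver-level sketch of the per-collection approach (the shard address is a placeholder, and the snippet only counts the commands rather than summing sizes).

{code:cpp}
#include <iostream>

#include <bsoncxx/builder/stream/document.hpp>
#include <mongocxx/client.hpp>
#include <mongocxx/instance.hpp>
#include <mongocxx/uri.hpp>

int main() {
    using bsoncxx::builder::stream::document;
    using bsoncxx::builder::stream::finalize;

    mongocxx::instance inst{};
    mongocxx::client shard{mongocxx::uri{"mongodb://localhost:27018"}};  // placeholder address

    // One collStats command per collection in every database: the number of
    // round trips grows with the collection count, whereas listDatabases is a
    // single command regardless of how many collections the shard holds.
    int roundTrips = 0;
    for (auto&& dbName : shard.list_database_names()) {
        for (auto&& collName : shard[dbName].list_collection_names()) {
            shard[dbName].run_command(document{} << "collStats" << collName << finalize);
            ++roundTrips;
        }
    }
    std::cout << "collStats round trips: " << roundTrips << " vs 1 listDatabases" << std::endl;
    return 0;
}
{code}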

Comment by Daniel Pasette (Inactive) [ 09/Jun/17 ]

Would it be possible to use the collection and index stats directly to calculate this? It might be faster than getting the "real" storage size per shard, because that number is in memory.

Comment by Kaloian Manassiev [ 07/Jun/17 ]

Yes, this can happen, and the balancer round will be slowed down. We discussed the possibility of caching the storage utilization information and only refreshing it periodically and asynchronously, but since we haven't seen cases so far where this has caused problems, we haven't bothered to optimize it.

Another possible optimization may come from the project to move the autoSplit logic to the shards, in which case we could use the real-time information gathered by autoSplit to also produce utilization estimates.
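
The caching approach discussed above was not implemented as part of this ticket (the actual fix only adds maxTimeMS). Purely as an illustration of the idea, a per-shard size cache with a refresh interval could look roughly like the sketch below; every name here is hypothetical, and for brevity it refreshes lazily on access rather than asynchronously in the background as the comment proposes.

{code:cpp}
#include <chrono>
#include <functional>
#include <iostream>
#include <map>
#include <mutex>
#include <string>

// Hypothetical cache of per-shard storage sizes, refreshed at most once per
// interval. Not part of the actual fix.
class ShardSizeCache {
public:
    using Clock = std::chrono::steady_clock;

    ShardSizeCache(std::function<long long(const std::string&)> fetchFn,
                   Clock::duration refreshInterval)
        : _fetch(std::move(fetchFn)), _refreshInterval(refreshInterval) {}

    // Returns the cached size for 'shardId', refreshing it only when stale.
    long long get(const std::string& shardId) {
        std::lock_guard<std::mutex> lk(_mutex);
        auto& entry = _entries[shardId];
        const auto now = Clock::now();
        if (entry.lastRefresh == Clock::time_point{} ||
            now - entry.lastRefresh >= _refreshInterval) {
            entry.sizeBytes = _fetch(shardId);  // e.g. listDatabases with maxTimeMS
            entry.lastRefresh = now;
        }
        return entry.sizeBytes;
    }

private:
    struct Entry {
        long long sizeBytes = 0;
        Clock::time_point lastRefresh{};
    };

    std::function<long long(const std::string&)> _fetch;
    Clock::duration _refreshInterval;
    std::mutex _mutex;
    std::map<std::string, Entry> _entries;
};

int main() {
    // Dummy fetch function standing in for the listDatabases round trip.
    ShardSizeCache cache([](const std::string&) { return 42LL; },
                         std::chrono::minutes(10));
    std::cout << cache.get("shard0000") << std::endl;
    return 0;
}
{code}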

Comment by Daniel Pasette (Inactive) [ 07/Jun/17 ]

Won't this call also take a potentially very long time on WiredTiger instances that have thousands of collections, causing a long pause between rounds if the balancer does this on every shard serially?
