[SERVER-30060] Make the balancer gather storage statistics only for shards which have `maxSize` set Created: 07/Jul/17  Updated: 30/Oct/23  Resolved: 17/Jul/17

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.4.6, 3.5.9
Fix Version/s: 3.4.7, 3.5.11

Type: Improvement Priority: Major - P3
Reporter: Kaloian Manassiev Assignee: Kaloian Manassiev
Resolution: Fixed Votes: 3
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-34819 Optimize the sharding balancer's clus... Open
Backwards Compatibility: Fully Compatible
Backport Requested:
v3.4
Sprint: Sharding 2017-07-31
Participants:
Case:

 Description   

The sharding balancer currently issues listDatabases against every single shard in order to access the totalSize value. This value is used to ensure that a shard's storage maxSize is not exceeded for customers who have that value set.
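The size check described above can be sketched roughly as follows. This is a hedged Python illustration, not the server's C++ code. It assumes that maxSize is expressed in megabytes (as in the shard's config entry) while listDatabases reports totalSize in bytes, and that a maxSize of 0 means "not set". The names `ShardStats`, `stats_from_total_size` and `is_size_maxed` are hypothetical.

```python
from dataclasses import dataclass

BYTES_PER_MB = 1024 * 1024


@dataclass
class ShardStats:
    """Hypothetical per-shard statistics used by this balancer sketch."""
    shard_id: str
    max_size_mb: int   # maxSize from the shard's config entry; 0 = not set (assumption)
    curr_size_mb: int  # derived from listDatabases' totalSize


def stats_from_total_size(shard_id: str, max_size_mb: int, total_size_bytes: int) -> ShardStats:
    """Build stats from a listDatabases totalSize, which is reported in bytes."""
    return ShardStats(shard_id, max_size_mb, total_size_bytes // BYTES_PER_MB)


def is_size_maxed(stats: ShardStats) -> bool:
    """True if the shard has reached its maxSize (in this sketch: should not receive chunks)."""
    if stats.max_size_mb == 0:
        # No limit configured: the current size is irrelevant to the decision.
        return False
    return stats.curr_size_mb >= stats.max_size_mb


full = stats_from_total_size("shard0", max_size_mb=100, total_size_bytes=150 * BYTES_PER_MB)
unlimited = stats_from_total_size("shard1", max_size_mb=0, total_size_bytes=10**12)
print(is_size_maxed(full), is_size_maxed(unlimited))  # True False
```

Note that the current size is only ever consulted when max_size_mb is non-zero; for every other shard, the expensive listDatabases result is collected and then ignored.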

The listDatabases call is quite heavy, especially on nodes with a large number of databases/collections, since it will fstat every single file under the instance.

There are a number of optimizations we can make in order to make this statistics gathering less expensive (listed in order of preference):

  • Only gather storage statistics for shards which have maxSize set (implemented by this ticket)
  • Issue the listDatabases call in parallel against all shards so that the executions on different shards overlap instead of running one after another
  • Cache the per-shard statistics so that they are not collected on every single round/moveChunk invocation
  • Collect the per-shard statistics asynchronously so that multiple concurrent moveChunk requests can benefit
  • Add a parameter to listDatabases to allow it to return a cached data size instead of calling fstat on all the files every time
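The first optimization in the list (the one implemented by this ticket) can be sketched in Python as below. This is an illustration, not the actual C++ change in the commits referenced in the comments: `ShardInfo`, `gather_shard_sizes` and the injected `list_databases_total_size` callable are hypothetical names, and treating a maxSize of 0 as "unset" is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ShardInfo:
    """Hypothetical view of a shard's config entry."""
    shard_id: str
    max_size_mb: int = 0  # 0 = maxSize not set (assumption)


def gather_shard_sizes(
    shards: List[ShardInfo],
    list_databases_total_size: Callable[[str], int],
) -> Dict[str, Optional[int]]:
    """Return each shard's size in MB, or None when it was not collected.

    `list_databases_total_size` stands in for running listDatabases against
    a shard and returning its totalSize in bytes (hypothetical helper).
    """
    sizes: Dict[str, Optional[int]] = {}
    for shard in shards:
        if shard.max_size_mb == 0:
            # No maxSize: the size would never be compared against anything,
            # so skip the expensive listDatabases round trip entirely.
            sizes[shard.shard_id] = None
            continue
        total_bytes = list_databases_total_size(shard.shard_id)
        sizes[shard.shard_id] = total_bytes // (1024 * 1024)
    return sizes


calls: List[str] = []


def fake_list_databases(shard_id: str) -> int:
    calls.append(shard_id)  # record which shards were actually queried
    return 512 * 1024 * 1024


shards = [ShardInfo("shard0"), ShardInfo("shard1", max_size_mb=1024), ShardInfo("shard2")]
print(gather_shard_sizes(shards, fake_list_databases))  # {'shard0': None, 'shard1': 512, 'shard2': None}
print(calls)  # ['shard1']
```

In the common case where no shard has maxSize configured, the loop issues no listDatabases calls at all, which is why this option needs none of the caching or asynchronous machinery the later items in the list would require.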


 Comments   
Comment by Kaloian Manassiev [ 25/Jul/17 ]

Hi arie@netskope.com,

The optimization which we implemented is the first one in the list from the description: do not collect storage utilization statistics for shards which do not have the maxSize setting. We chose it because it is the least risky to backport, and because we estimated that it would cover the majority of our customers' use cases, since heterogeneous hardware is the exception rather than the rule.

Hope this helps and apologies for the confusion.

Best regards,
-Kal.

Comment by Arie Grapa [ 25/Jul/17 ]

After looking at the code change, it was not clear to me which one of the optimizations was chosen. Can you please share?

Comment by Githook User [ 17/Jul/17 ]

Author:

Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com>

Message: SERVER-30060 Do not gather shard disk usage statistics unless 'maxSize' is set

(cherry picked from commit e0136739285c097a7da59ba54d6bcd109bb184b5)
Branch: v3.4
https://github.com/mongodb/mongo/commit/060473c2794e1d86dc4e4988c031b9c7ebda8297

Comment by Githook User [ 17/Jul/17 ]

Author:

Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com>

Message: SERVER-30060 Do not gather shard disk usage statistics unless 'maxSize' is set
Branch: master
https://github.com/mongodb/mongo/commit/e0136739285c097a7da59ba54d6bcd109bb184b5

Generated at Thu Feb 08 04:22:33 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.