[SERVER-64340] Warn if balancer is disabled while draining shard Created: 09/Mar/22  Updated: 29/Oct/23  Resolved: 04/Jul/22

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 6.0.1, 5.0.11, 6.1.0-rc0

Type: Improvement Priority: Major - P3
Reporter: Tommaso Tocci Assignee: Antonio Fuschetto
Resolution: Fixed Votes: 0
Labels: shardingemea-qw
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Backwards Compatibility: Fully Compatible
Backport Requested:
v6.0, v5.0
Sprint: Sharding EMEA 2022-05-02, Sharding EMEA 2022-05-16, Sharding EMEA 2022-05-30, Sharding EMEA 2022-06-13, Sharding EMEA 2022-06-27, Sharding EMEA 2022-07-11
Participants:
Story Points: 4.5

 Description   

We are constantly getting bitten in Atlas by shard draining that is "stuck" because the customer has disabled the balancer. Inevitably, we spend time looking through the logs for a "problem" and finally realize the balancer is off. Draining a shard with the balancer off is not a normal configuration. Could we please log a periodic message about this?

One possible solution would be to make the balancer check whether any shard needs to be drained even if balancing has been disabled, and periodically log a warning.
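
As a rough illustration of the check proposed above, here is a minimal Python sketch that operates on plain dictionaries shaped like documents from the config database. It is not the server implementation. The field names are assumptions about the config metadata layout (a `draining` flag on `config.shards` documents, and `stopped` / `mode` on the `config.settings` balancer document), and both function names are hypothetical.

```python
from typing import Iterable, List, Mapping, Optional


def balancer_disabled(settings_doc: Optional[Mapping]) -> bool:
    """Return True if the balancer settings document says balancing is off.

    Assumption: the document carries either stopped=True or mode="off".
    A missing document is treated as "balancer enabled".
    """
    if settings_doc is None:
        return False
    return settings_doc.get("stopped") is True or settings_doc.get("mode") == "off"


def blocked_draining_shards(
    shard_docs: Iterable[Mapping], settings_doc: Optional[Mapping]
) -> List[str]:
    """Return the ids of shards whose draining cannot progress.

    Assumption: a shard being drained is marked with draining=True.
    The result is empty whenever the balancer is enabled.
    """
    if not balancer_disabled(settings_doc):
        return []
    return [doc["_id"] for doc in shard_docs if doc.get("draining") is True]


if __name__ == "__main__":
    shards = [{"_id": "shard0"}, {"_id": "shard1", "draining": True}]
    settings = {"_id": "balancer", "stopped": True}
    print(blocked_draining_shards(shards, settings))
```

A non-empty result corresponds to the "stuck" situation described in this ticket: at least one shard is draining while the balancer is disabled.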



 Comments   
Comment by Githook User [ 26/Jul/22 ]

Author:

{'name': 'Antonio Fuschetto', 'email': 'antonio.fuschetto@mongodb.com', 'username': 'afuschetto'}

Message: SERVER-64340 Warn if balancer is disabled while draining shard
Branch: v6.0
https://github.com/mongodb/mongo/commit/943744bef3fb7de71790f95d2fdc8201ba52ba28

Comment by Tommaso Tocci [ 05/Jul/22 ]

so we went with a log message saying "Draining of removed shards cannot be completed because the balancer is disabled" correct?

Yes correct.

and it sends that how often? Couldn't calculate that from the PR.

The log is emitted every 10 minutes by the primary node of the config server replica set.
We decided to emit this log message only if the balancer is disabled. Thus, if the balancer is enabled but currently outside the balancing time window, the log won't be emitted.
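
The emission rule described above can be summarized with a small Python sketch. This is only an illustration of the stated behavior, not the code from the commit. The function name, its parameters, and the `WARN_INTERVAL` constant are hypothetical, and only the 10-minute interval comes from this comment.

```python
from datetime import datetime, timedelta
from typing import Optional

# Interval taken from the comment above; the constant name is hypothetical.
WARN_INTERVAL = timedelta(minutes=10)


def should_emit_warning(
    now: datetime,
    last_emitted: Optional[datetime],
    is_config_primary: bool,
    balancer_disabled: bool,
    has_draining_shards: bool,
) -> bool:
    """Decide whether the draining warning should be logged at this moment.

    balancer_disabled must be True only when the balancer is explicitly
    turned off. An enabled balancer that is merely outside its balancing
    time window counts as not disabled, so no warning is emitted.
    """
    if not (is_config_primary and balancer_disabled and has_draining_shards):
        return False
    # Rate-limit: emit at most once per WARN_INTERVAL.
    return last_emitted is None or now - last_emitted >= WARN_INTERVAL


if __name__ == "__main__":
    t0 = datetime(2022, 7, 5, 12, 0)
    print(should_emit_warning(t0, None, True, True, True))
    print(should_emit_warning(t0 + timedelta(minutes=5), t0, True, True, True))
```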

Comment by Garaudy Etienne [ 05/Jul/22 ]

so we went with a log message saying "Draining of removed shards cannot be completed because the balancer is disabled" correct?

and it sends that how often? Couldn't calculate that from the PR.

Comment by Githook User [ 04/Jul/22 ]

Author:

{'name': 'Antonio Fuschetto', 'email': 'antonio.fuschetto@mongodb.com', 'username': 'afuschetto'}

Message: SERVER-64340 Warn if balancer is disabled while draining shard
Branch: master
https://github.com/mongodb/mongo/commit/11e760fff8b3daa929a4e25b1f750641907b9da6

Comment by Antonio Fuschetto [ 27/Jun/22 ]

The removeShard operation returns with a message reporting the current status of the drainage, i.e., started, ongoing, or completed. Ideally, when the balancer is disabled, this operation should return a status to inform the user that the operation is currently blocked due to the inability to migrate chunks to other shards. Contextually, a warning message should be triggered to inform the cluster operations team.

An alternative is to simply trigger a periodic warning message when the balancer is disabled and one or more shards need to be drained, as described in the ticket's description.

tommaso.tocci@mongodb.com, please let me know which approach is better based on your view of the problem.

Generated at Thu Feb 08 06:00:06 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.