[SERVER-23354] Implement sh.balancerReset() feature to recover unresponsive balancer control
Created: 25/Mar/16  Updated: 06/Dec/22  Resolved: 31/Mar/16

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Major - P3
Reporter: Scott Kurowski Assignee: [DO NOT USE] Backlog - Sharding Team
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Documented
Related
related to SERVER-22669 CSRS balancer supports parallel migra... Closed
related to SERVER-22673 Update shell balancer control helpers Closed
Assigned Teams:
Sharding
Participants:

 Description   

New feature request: add sh.balancerReset() (or something similar) to avoid the manual work of regaining control when the shard balancer becomes unresponsive.

The proposed function would inspect the balancer state in the config database. If the mongos process listed as holding the balancer lock is unresponsive, the function should presume that mongos is down and attempt to take the balancer lock itself. If it succeeds, it proceeds and informs the other processes that it now holds the lock.
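The decision flow proposed above can be sketched in Python as a small in-memory model. This is a hypothetical illustration, not MongoDB code: BalancerLock, try_take_over, balancer_reset, is_responsive, and announce are invented names, and the version counter stands in for whatever atomic compare-and-swap a real implementation would have to perform against the config database. The responsiveness check is passed in as a callback because the proposal does not say how an unresponsive mongos would be detected.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class BalancerLock:
    """Hypothetical stand-in for the balancer lock record in the config database."""
    holder: Optional[str]  # process id of the mongos holding the lock, or None
    version: int = 0       # bumped on every ownership change (compare-and-swap token)


def try_take_over(lock: BalancerLock, expected_version: int, me: str) -> bool:
    """Claim the lock only if its version is unchanged since we read it.

    In this single-process model the check-then-write is trivially safe; a real
    implementation would need it to be one atomic update on the server.
    """
    if lock.version != expected_version:
        return False  # someone else changed the lock first; back off
    lock.holder = me
    lock.version += 1
    return True


def balancer_reset(lock: BalancerLock, me: str,
                   is_responsive: Callable[[str], bool],
                   announce: Callable[[str], None]) -> str:
    """Sketch of the proposed sh.balancerReset() flow; returns a status string."""
    seen_version = lock.version  # snapshot of the current lock state
    holder = lock.holder
    if holder is None:
        return "unlocked"        # nobody holds the lock; nothing to recover
    if holder == me:
        return "already-held"
    if is_responsive(holder):
        return "holder-alive"    # current holder is fine; do not steal the lock
    # Presume the unresponsive holder is down and try to take the lock.
    if not try_take_over(lock, seen_version, me):
        return "lost-race"
    announce(me)                 # tell the other processes who holds the lock now
    return "taken"
```

For example, `balancer_reset(BalancerLock(holder="mongos-a"), "mongos-b", lambda p: False, print)` would print `mongos-b` and return `"taken"`. The compare-and-swap step is the important design choice: if two mongos processes both see the same dead holder, only the one whose update still matches the version it read should end up believing it owns the lock.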

The goal is to fix situations where a command such as sh.stopBalancer() never times out or returns an error. The recovery should be fast, deterministic, easy to use, and invisible to most users.



 Comments   
Comment by Andy Schwerin [ 31/Mar/16 ]

Since moving the balancer to the CSRS primary will obviate this, I'm closing this as "Won't Fix".

Comment by Andy Schwerin [ 31/Mar/16 ]

With SERVER-22669 and SERVER-22673, the CSRS primary will take control of the balancer, and so it will no longer be necessary or meaningful to take the balancer lock away from the current balancer. This is because the CSRS primary is also the distributed lock manager, so if you can write to it, the balancer will also be up and running.

Generated at Thu Feb 08 04:03:08 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.