[SERVER-8863] balancer won't start until all shards are reachable once Created: 05/Mar/13  Updated: 06/Dec/22  Resolved: 15/Dec/17

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.4.0-rc2
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Greg Studer Assignee: [DO NOT USE] Backlog - Sharding Team
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Sharding
Operating System: ALL
Participants:

 Description   

The balancer performs an OID check to determine whether all shards are unique processes, but this check runs before the balancer starts and requires all shards to be reachable at the same time. Once the check has passed, the balancer behaves as usual. A cluster should function well even when it is partially unavailable at mongos startup, so we should at least remember which hosts we have already contacted successfully and not retry them, or better, allow migrations between hosts that we know are fine.
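The proposal above (remember verified hosts, retry only the rest, and allow migrations between hosts already known to be fine) can be sketched roughly as follows. This is a minimal, hypothetical Python illustration, not mongos code. `ShardIdentityCache` and `fetch_id` are invented names, the integer identifiers stand in for whatever process identifier the real OID check compares, and a shard is assumed unreachable whenever `fetch_id` returns `None`.

```python
from typing import Callable, Dict, Iterable, Optional, Set


class ShardIdentityCache:
    """Hypothetical helper: remembers each shard's process identifier once it
    has been fetched successfully, so only unverified shards are retried."""

    def __init__(self, fetch_id: Callable[[str], Optional[int]]):
        # Assumption: fetch_id returns the shard's process identifier,
        # or None if the shard could not be reached.
        self._fetch_id = fetch_id
        self._known: Dict[str, int] = {}

    def refresh(self, shards: Iterable[str]) -> None:
        for shard in shards:
            if shard in self._known:
                continue  # already verified once; do not contact it again
            ident = self._fetch_id(shard)
            if ident is not None:
                self._known[shard] = ident

    def verified_shards(self) -> Set[str]:
        return set(self._known)

    def duplicates(self) -> Set[str]:
        """Verified shards whose identifier collides with another verified shard."""
        seen: Dict[int, str] = {}
        dups: Set[str] = set()
        for shard, ident in self._known.items():
            if ident in seen:
                dups.update({shard, seen[ident]})
            else:
                seen[ident] = shard
        return dups

    def migration_allowed(self, src: str, dst: str) -> bool:
        # Only migrate between distinct shards that are verified and unique.
        dups = self.duplicates()
        return (
            src != dst
            and src in self._known
            and dst in self._known
            and src not in dups
            and dst not in dups
        )
```

With one shard down at startup, migrations between the two reachable shards can proceed immediately, and a later `refresh` contacts only the shard that was missing.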



 Comments   
Comment by Kaloian Manassiev [ 15/Dec/17 ]

The balancer logic requires shard hosts to be up and running so they can be examined for their storage utilization, which is accounted for by the balancer policy (in the cases where maxSize is specified for the shard).
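To illustrate why an unreachable host matters to a maxSize-aware policy, here is a hypothetical sketch of recipient selection. It is not the actual balancer policy: `ShardStats`, `can_receive`, and `pick_recipient` are invented names. The sketch assumes a shard whose usage could not be read is ineligible to receive chunks, and that among eligible shards the one with the fewest chunks is chosen.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ShardStats:
    chunks: int
    max_size_mb: Optional[int] = None  # None: no maxSize configured for the shard
    used_mb: Optional[int] = None      # None: host unreachable, usage unknown


def can_receive(s: ShardStats) -> bool:
    if s.used_mb is None:
        return False  # usage unknown (host down): cannot check maxSize or migrate
    return s.max_size_mb is None or s.used_mb < s.max_size_mb


def pick_recipient(shards: Dict[str, ShardStats]) -> Optional[str]:
    """Hypothetical policy: least-loaded shard among those that can receive."""
    eligible = [name for name, s in shards.items() if can_receive(s)]
    if not eligible:
        return None
    # Tie-break on name so the choice is deterministic.
    return min(eligible, key=lambda n: (shards[n].chunks, n))
```

Because an unreachable shard is simply excluded, the chosen recipient need not be the least-loaded shard in the cluster. This is one way balancing without all hosts present can produce poor migration choices.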

While a partially available cluster should not prevent CRUD operations from running, balancing without all the hosts present may lead to poor migration choices and would prevent us from improving the balancing policies. Closing this ticket as won't fix.

Generated at Thu Feb 08 03:18:40 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.