[SERVER-8863] balancer won't start until all shards are reachable once Created: 05/Mar/13 Updated: 06/Dec/22 Resolved: 15/Dec/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 2.4.0-rc2 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Greg Studer | Assignee: | [DO NOT USE] Backlog - Sharding Team |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: | Sharding |
| Operating System: | ALL |
| Participants: |
| Description |
|
The balancer does an OID check to determine if all shards are unique processes, but this check happens before it starts and requires all shards be reachable at the same time. Once the check has passed, the balancer acts as usual. A cluster should function well even when it is partially unavailable at mongos startup, so we should at least remember which hosts we've already contacted successfully and not retry those hosts, or better, allow migrations between hosts that we know are fine. |
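The "remember which hosts we've contacted" idea could look roughly like the sketch below. It is illustrative Python, not the server's C++ code: `ShardOidChecker` and the `fetch_oid` probe are hypothetical names, and the probe is assumed to return a per-process identifier, or `None` when the host is unreachable.

```python
from typing import Callable, Dict, Iterable, Optional, Set


class ShardOidChecker:
    """Remembers which shard hosts have already passed the uniqueness check."""

    def __init__(self, fetch_oid: Callable[[str], Optional[str]]) -> None:
        # fetch_oid is a hypothetical probe: it returns the host's process
        # identifier, or None if the host cannot be reached right now.
        self._fetch_oid = fetch_oid
        self._verified: Dict[str, str] = {}  # host -> identifier seen earlier

    def check(self, hosts: Iterable[str]) -> Set[str]:
        """Probe only hosts not verified yet; return all verified hosts."""
        for host in hosts:
            if host in self._verified:
                continue  # already contacted successfully, do not retry
            oid = self._fetch_oid(host)
            if oid is None:
                continue  # unreachable for now, try again on the next round
            if oid in self._verified.values():
                raise ValueError(f"{host} is the same process as another shard")
            self._verified[host] = oid
        return set(self._verified)
```

Under this sketch, each balancing round would call `check()` again and could consider migrations among the returned hosts, while unreachable hosts are simply probed again later.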
| Comments |
| Comment by Kaloian Manassiev [ 15/Dec/17 ] |
|
The balancer logic requires shard hosts to be up and running so they can be examined for their storage utilization, which is accounted for by the balancer policy (in the cases where maxSize is specified for the shard). While a partially available cluster should not prevent CRUD operations from running, balancing without all the hosts present may lead to poor migration choices and prevents us from improving the balancing policies. Closing this ticket as won't fix. |
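The dependency on storage utilization can be illustrated with a small sketch. This is a simplified Python model, not the actual balancer policy: `ShardInfo`, `eligible_recipients`, and the convention that `max_size_mb == 0` means "no limit" are assumptions made for the example.

```python
from typing import List, NamedTuple, Optional


class ShardInfo(NamedTuple):
    name: str
    max_size_mb: int        # assumed convention: 0 means no size limit
    used_mb: Optional[int]  # None when the shard could not be examined


def eligible_recipients(shards: List[ShardInfo]) -> List[str]:
    """Return shards that may receive chunks under this simplified model."""
    result = []
    for shard in shards:
        if shard.used_mb is None:
            continue  # utilization unknown: the shard was unreachable
        if shard.max_size_mb > 0 and shard.used_mb >= shard.max_size_mb:
            continue  # shard is already at or over its configured limit
        result.append(shard.name)
    return result
```

In this model an unreachable shard drops out of the candidate list entirely, so the remaining shards absorb every migration. That is one way the "poor migration choices" described above could arise.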