- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: 4.4.25, 6.0.11
- Component/s: None
- Fully Compatible
- ALL
- v7.0, v6.0, v5.0, v4.4, v4.2
- Sharding EMEA 2023-10-30, CAR Team 2023-11-13
Summary
SERVER-40459 changed the logic the balancer uses to decide which chunks to move in a given balancer round. The new code contains a bug that can cause the balancer to schedule more than one migration with the same donor shard.
When this happens, the balancer will hit an invariant and the primary of the config server will shut down, triggering a new primary election.
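The failure mode can be illustrated with a minimal Python sketch. It is not the server's C++ implementation: the `Migration` type, the helper names, and the shard and chunk identifiers are all hypothetical, and only the constraint stated in this ticket (at most one migration per donor shard in a round) is modeled.

```python
from collections import Counter
from typing import NamedTuple


class Migration(NamedTuple):
    chunk: str      # hypothetical chunk identifier
    donor: str      # shard that currently owns the chunk
    recipient: str  # shard that would receive the chunk


def donors_scheduled_more_than_once(migrations):
    """Return the donor shards that appear in more than one migration."""
    counts = Counter(m.donor for m in migrations)
    return sorted(shard for shard, n in counts.items() if n > 1)


def select_for_round(candidates):
    """Keep at most one candidate migration per donor shard."""
    used_donors = set()
    selected = []
    for m in candidates:
        if m.donor in used_donors:
            continue  # this donor already has a migration in this round
        used_donors.add(m.donor)
        selected.append(m)
    return selected


# Two candidates share the donor "shard0"; scheduling both is the
# situation described in this ticket.
candidates = [
    Migration("chunk-a", "shard0", "shard1"),
    Migration("chunk-b", "shard0", "shard2"),
    Migration("chunk-c", "shard3", "shard2"),
]
```

Here `donors_scheduled_more_than_once(candidates)` reports `shard0`, while the list returned by `select_for_round(candidates)` contains no repeated donor.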
Required conditions
Two code paths can lead to this bug. In both cases, the following conditions must be met in order to hit the invariant:
- Sharded cluster
- Balancer enabled
- At least 4 shards
In addition, each code path has its own specific conditions:
- Shard removal
  - At least one shard is being drained.
  - At least one zone is configured on the draining shard.
  - The draining shard has at least two chunks belonging to different zones that can be moved in the same round to two different recipient shards.
  Note: chunks that are not completely contained within any of the configured zones are considered to belong to the special "no-zone".
- Zone enforcing
  - At least two chunks reside on the same shard.
  - They belong to two different zones not associated with that shard.
  - The two chunks can be moved in the same balancer round.
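As a rough aid, the conditions above can be checked against a hand-built snapshot of a cluster. The sketch below is an assumption-laden illustration, not a server API: the snapshot layout (`shards`, `chunks`), the `NO_ZONE` marker, and both function names are hypothetical. It covers only the necessary conditions, does not verify that the chunks can actually be moved in the same round, and treats "no-zone" chunks as not counting on the zone-enforcing path, which is an assumption.

```python
NO_ZONE = "no-zone"  # marker for chunks not fully contained in any zone


def shard_removal_path_possible(balancer_enabled, shards, chunks):
    """shards: {name: {"draining": bool, "zones": set of zone names}}
    chunks: list of (shard name, zone name) pairs."""
    if not balancer_enabled or len(shards) < 4:
        return False
    for name, info in shards.items():
        # Needs a draining shard with at least one zone configured on it.
        if not info["draining"] or not info["zones"]:
            continue
        zones_on_shard = {zone for shard, zone in chunks if shard == name}
        # Needs chunks from at least two different zones ("no-zone" counts).
        if len(zones_on_shard) >= 2:
            return True
    return False


def zone_enforcing_path_possible(balancer_enabled, shards, chunks):
    """Same inputs as shard_removal_path_possible."""
    if not balancer_enabled or len(shards) < 4:
        return False
    for name, info in shards.items():
        # Zones of chunks on this shard that are not associated with it.
        foreign_zones = {
            zone
            for shard, zone in chunks
            if shard == name and zone != NO_ZONE and zone not in info["zones"]
        }
        if len(foreign_zones) >= 2:
            return True
    return False


# Hypothetical snapshot: shard0 is draining and has zone "A" configured.
shards = {
    "shard0": {"draining": True, "zones": {"A"}},
    "shard1": {"draining": False, "zones": set()},
    "shard2": {"draining": False, "zones": {"A"}},
    "shard3": {"draining": False, "zones": set()},
}
chunks = [("shard0", "A"), ("shard0", NO_ZONE)]
```

A `True` result only means the preconditions listed above hold for the snapshot; whether the invariant is actually hit still depends on which chunks the balancer selects in a given round.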
Technical description
TODO
Affected versions
The only releases affected by this bug are:
- 6.0.11
- 4.4.25
- is caused by: SERVER-40459 Optimize the construction of the balancer's collection distribution status histogram (Closed)
- is duplicated by: SERVER-82322 Revert SERVER-40459 Optimize the construction of the balancer's collection distribution status histogram (Closed)