[SERVER-82325] Config server could invariant during balancer round Created: 19/Oct/23  Updated: 10/Nov/23  Resolved: 10/Nov/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.4.25, 6.0.11
Fix Version/s: 5.0.22, 7.0.3, 4.4.26, 6.0.12

Type: Bug Priority: Major - P3
Reporter: Tommaso Tocci Assignee: Tommaso Tocci
Resolution: Fixed Votes: 0
Labels: balancer-round-perf
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-82322 Revert SERVER-40459 Optimize the cons... Closed
Problem/Incident
is caused by SERVER-40459 Optimize the construction of the bala... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v7.0, v6.0, v5.0, v4.4, v4.2
Sprint: Sharding EMEA 2023-10-30, CAR Team 2023-11-13

 Description   

Summary

In SERVER-40459 we changed the logic the balancer uses to decide which chunks to move in a given balancer round. The new code contains a bug that can cause the balancer to schedule more than one migration with the same donor shard in the same round.
When this happens, the balancer hits an invariant failure and the config server primary shuts down, triggering the election of a new primary.
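A minimal Python sketch of the property that the invariant protects, assuming a simplified model in which a balancer round is just a list of scheduled migrations. `Migration`, `duplicate_donors` and `check_one_migration_per_donor` are hypothetical names used for illustration only; they are not the server's C++ code. Raising an exception stands in for the invariant, which in the real server brings down the config server primary.

```python
from collections import Counter
from typing import List, NamedTuple


class Migration(NamedTuple):
    """Hypothetical, simplified record of one chunk move scheduled in a round."""
    chunk: str
    donor: str
    recipient: str


def duplicate_donors(round_migrations: List[Migration]) -> List[str]:
    """Return, sorted, the donor shards used by more than one migration."""
    counts = Counter(m.donor for m in round_migrations)
    return sorted(shard for shard, n in counts.items() if n > 1)


def check_one_migration_per_donor(round_migrations: List[Migration]) -> None:
    """Stand-in for the server invariant: fail if any donor is used twice."""
    dups = duplicate_donors(round_migrations)
    if dups:
        raise AssertionError(f"donor shard(s) {dups} scheduled more than once in one round")


# A valid round: every donor appears at most once.
ok_round = [Migration("c1", "shardA", "shardC"), Migration("c2", "shardB", "shardD")]
check_one_migration_per_donor(ok_round)

# The buggy pattern: one donor, two chunks, two different recipients.
bad_round = [Migration("c1", "shardA", "shardC"), Migration("c2", "shardA", "shardD")]
print(duplicate_donors(bad_round))  # ['shardA']
```

The `bad_round` case mirrors the failure described above: two chunks leave the same donor shard for two different recipients within a single round.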

Required conditions

Two code paths can lead to this bug. In both cases, the following conditions are necessary to hit the invariant:

  • Sharded cluster
  • Balancer enabled
  • At least 4 shards

In addition, each code path requires its own specific conditions:

  • Shard removal
    • At least one shard being drained
    • At least one zone configured on the draining shard
    • The draining shard has at least two chunks belonging to different zones that can be moved in the same round to two different recipient shards.
      Note: chunks that are not completely contained within any of the configured zones are considered to belong to the special "no-zone".
  • Zone enforcing
    • At least two chunks residing on the same shard.
    • The two chunks belong to two different zones, neither of which is associated with that shard.
    • The two chunks can be moved in the same balancer round.
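The preconditions above can be encoded as a rough checklist. The sketch below is a hypothetical Python model: `Cluster`, `common_preconditions`, `shard_removal_preconditions` and `zone_enforcing_preconditions` are illustrative names, not MongoDB APIs. It only checks the necessary conditions; it cannot tell whether the chunks would actually be selected in the same balancer round, so `True` means "possibly exposed", not "will hit the invariant".

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

NO_ZONE = "<no-zone>"  # label for chunks not completely contained in any zone


@dataclass
class Cluster:
    """Hypothetical, simplified snapshot of a sharded cluster."""
    shards: Set[str]
    balancer_enabled: bool
    draining: Set[str] = field(default_factory=set)
    # shard name -> zones associated with that shard
    shard_zones: Dict[str, Set[str]] = field(default_factory=dict)
    # (owning shard, zone of the chunk, or None if it is in no zone)
    chunks: List[Tuple[str, Optional[str]]] = field(default_factory=list)


def common_preconditions(c: Cluster) -> bool:
    # Balancer enabled and at least 4 shards (the cluster is assumed sharded).
    return c.balancer_enabled and len(c.shards) >= 4


def shard_removal_preconditions(c: Cluster) -> bool:
    if not common_preconditions(c):
        return False
    for shard in c.draining:
        if not c.shard_zones.get(shard):
            continue  # no zone configured on this draining shard
        # Not modeled: whether the chunks can move in the same round to
        # two different recipient shards.
        zones = {z if z is not None else NO_ZONE for s, z in c.chunks if s == shard}
        if len(zones) >= 2:  # chunks from at least two different zones
            return True
    return False


def zone_enforcing_preconditions(c: Cluster) -> bool:
    if not common_preconditions(c):
        return False
    for shard in c.shards:
        owned = c.shard_zones.get(shard, set())
        # Zones of chunks on this shard that the shard is not associated with.
        misplaced = {z for s, z in c.chunks if s == shard and z is not None and z not in owned}
        if len(misplaced) >= 2:
            return True
    return False


four = {"s0", "s1", "s2", "s3"}
removal = Cluster(four, True, draining={"s0"},
                  shard_zones={"s0": {"zA"}, "s1": {"zA"}},
                  chunks=[("s0", "zA"), ("s0", None)])
enforcing = Cluster(four, True,
                    shard_zones={"s1": {"zA"}, "s2": {"zB"}},
                    chunks=[("s0", "zA"), ("s0", "zB")])
print(shard_removal_preconditions(removal))     # True
print(zone_enforcing_preconditions(enforcing))  # True
```

In the `removal` example the draining shard `s0` holds one chunk in zone `zA` and one in the special no-zone, which counts as two different zones; in `enforcing`, shard `s0` holds chunks from two zones that it is not associated with.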

Technical description

TODO

Affected versions

The only releases affected by this bug are:

  • 6.0.11
  • 4.4.25


 Comments   
Comment by Githook User [ 06/Nov/23 ]

Author:

{'name': 'Tommaso Tocci', 'email': 'tommaso.tocci@mongodb.com', 'username': 'toto-dev'}

Message: SERVER-82325 Config server could invariant during balancer round

This effectively reverts SERVER-40459

GitOrigin-RevId: 1818719db3009bcc10addd016f08f35c2b6718d8
Branch: v4.4
https://github.com/mongodb/mongo/commit/f45ff415970f8244303c77a964948630ce3bc93b

Comment by Githook User [ 05/Nov/23 ]

Author:

{'name': 'Tommaso Tocci', 'email': 'tommaso.tocci@mongodb.com', 'username': 'toto-dev'}

Message: SERVER-82325 Config server could invariant during balancer round

This effectively reverts SERVER-40459

GitOrigin-RevId: be9ea3f19e95fc92a6835c9c30381fb29962b45e
Branch: v4.4
https://github.com/mongodb/mongo/commit/1818719db3009bcc10addd016f08f35c2b6718d8

Comment by Githook User [ 05/Nov/23 ]

Author:

{'name': 'Tommaso Tocci', 'email': 'tommaso.tocci@mongodb.com', 'username': 'toto-dev'}

Message: SERVER-82325 Config server could invariant during balancer round

This effectively reverts SERVER-40459

GitOrigin-RevId: 7a3c2a47cee449b5c02392b08dc5b33265cbe11b
Branch: v4.4
https://github.com/mongodb/mongo/commit/be9ea3f19e95fc92a6835c9c30381fb29962b45e

Comment by Githook User [ 04/Nov/23 ]

Author:

{'name': 'Tommaso Tocci', 'email': 'tommaso.tocci@mongodb.com', 'username': 'toto-dev'}

Message: SERVER-82325 Config server could invariant during balancer round

This effectively reverts SERVER-40459

GitOrigin-RevId: 595d8110122be5b1ca2d951c6f71cd8ecb29b8cb
Branch: v4.4
https://github.com/mongodb/mongo/commit/7a3c2a47cee449b5c02392b08dc5b33265cbe11b

Comment by Githook User [ 22/Oct/23 ]

Author:

{'name': 'Tommaso Tocci', 'email': 'tommaso.tocci@mongodb.com', 'username': 'toto-dev'}

Message: SERVER-82325 Config server could invariant during balancer round

This effectively reverts SERVER-40459

GitOrigin-RevId: 5d091b472fa127626698ea9244e3a7fb5f727849
Branch: v4.4
https://github.com/mongodb/mongo/commit/595d8110122be5b1ca2d951c6f71cd8ecb29b8cb

Comment by Tommaso Tocci [ 20/Oct/23 ]

Author:

{'name': 'Tommaso Tocci', 'email': 'tommaso.tocci@mongodb.com', 'username': 'toto-dev'}

Message: Revert "SERVER-40459 Optimize the construction of the balancer's collection distribution status histogram"

This reverts commit 56f414fe81c8fd7ae2ad87ada1ed0f9cb1299151.
Branch: v6.0
https://github.com/mongodb/mongo/commit/e6230d4c34ba250a3129ccfa5da06ccbc5b6d536

Comment by Githook User [ 19/Oct/23 ]

Author:

{'name': 'Tommaso Tocci', 'email': 'tommaso.tocci@mongodb.com', 'username': 'toto-dev'}

Message: SERVER-82325 Config server could invariant during balancer round
Branch: v7.0
https://github.com/mongodb/mongo/commit/b96efb7e0cf6134d5938de8a94c37cec3f22cff4

Comment by Githook User [ 19/Oct/23 ]

Author:

{'name': 'Tommaso Tocci', 'email': 'tommaso.tocci@mongodb.com', 'username': 'toto-dev'}

Message: SERVER-82325 Config server could invariant during balancer round
Branch: v5.0
https://github.com/mongodb/mongo/commit/302c19437ae65b7d459360e18c5ac5086f494989

Generated at Thu Feb 08 06:48:54 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.