[SERVER-83956] Balancer wrongly emits warning message in multiversion clusters Created: 07/Dec/23  Updated: 06/Feb/24

Status: Open
Project: Core Server
Component/s: None
Affects Version/s: 4.2.0, 4.4.0, 5.0.0, 6.0.0, 7.0.0, 7.2.0-rc0
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Tommaso Tocci Assignee: Tommaso Tocci
Resolution: Unresolved Votes: 0
Labels: balancer-round-perf
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Problem/Incident
is caused by SERVER-8279 Warn or do not allow the balancer to ... Closed
Operating System: ALL
Sprint: CAR Team 2024-01-08, CAR Team 2024-01-22, CAR Team 2024-02-05, CAR Team 2024-02-19

 Description   

The balancer sporadically checks whether all the shards in the cluster run exactly the same binary version. When they do not, the balancer emits a warning log message.
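The check described above can be illustrated with a minimal Python sketch. This is not the server's C++ code: the function names, the logger name, and the version strings are hypothetical, and the sketch assumes the balancer already holds a map from shard ID to the binary version each shard primary reported.

```python
import logging
from typing import Mapping

# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("balancer")


def shards_have_mismatched_versions(versions: Mapping[str, str]) -> bool:
    """Return True when the shards report more than one distinct binary version."""
    return len(set(versions.values())) > 1


def check_shard_versions(versions: Mapping[str, str]) -> bool:
    """Log a warning whenever versions differ, modelling the behaviour this issue describes."""
    mismatched = shards_have_mismatched_versions(versions)
    if mismatched:
        # The warning fires unconditionally, including mid-upgrade or mid-downgrade.
        logger.warning("Shards report different binary versions: %s",
                       dict(sorted(versions.items())))
    return mismatched
```

For a hypothetical cluster in the middle of a rolling upgrade, such as `{"shard0": "7.0.4", "shard1": "6.0.12"}`, `check_shard_versions` returns `True` and logs the warning, even though a mixed-version state is exactly what an upgrade produces.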

This has two problems:

  • Correctness:
    During an upgrade or downgrade procedure it is entirely expected that the shards in the cluster run mismatched binary versions, so I believe it is incorrect to emit a warning log message.
  • Performance:
    Even though the check itself is performed only sporadically, the balancer collects the shards' binary versions on every round while constructing ClusterStatistics (in fact, multiple times per round). The binary versions are retrieved by executing the serverStatus command on every shard primary, one shard at a time. Thus, in a cluster with a large number of shards, this can significantly slow down the balancer round.
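To make the performance point concrete, the following Python sketch simulates collecting binary versions from shard primaries. `fake_server_status` is a hypothetical stand-in for running `serverStatus` against one shard primary, with an assumed fixed 20 ms round trip; none of these names come from MongoDB's implementation. For contrast, the sketch also shows a concurrent fan-out with `asyncio.gather`, an alternative this issue does not itself propose.

```python
import asyncio
import time
from typing import Awaitable, Callable, Dict, List

# Hypothetical signature: given a shard ID, return the binary version its primary reports.
FetchVersion = Callable[[str], Awaitable[str]]


async def collect_versions_serially(shards: List[str], fetch: FetchVersion) -> Dict[str, str]:
    """One request at a time: total latency is roughly the sum of per-shard round trips."""
    versions: Dict[str, str] = {}
    for shard in shards:
        versions[shard] = await fetch(shard)
    return versions


async def collect_versions_concurrently(shards: List[str], fetch: FetchVersion) -> Dict[str, str]:
    """All requests in flight at once: total latency is roughly the slowest round trip."""
    results = await asyncio.gather(*(fetch(shard) for shard in shards))
    return dict(zip(shards, results))  # gather preserves input order


async def fake_server_status(shard: str) -> str:
    """Hypothetical stand-in for serverStatus; assumes a fixed 20 ms round trip."""
    await asyncio.sleep(0.02)
    return "7.0.0"


async def main() -> None:
    shards = [f"shard{i}" for i in range(10)]
    for collect in (collect_versions_serially, collect_versions_concurrently):
        start = time.perf_counter()
        versions = await collect(shards, fake_server_status)
        elapsed = time.perf_counter() - start
        print(f"{collect.__name__}: {len(versions)} shards in {elapsed:.3f}s")


if __name__ == "__main__":
    asyncio.run(main())
```

With 10 shards the serial version should take roughly 10 × 20 ms = 200 ms, while the concurrent one takes roughly a single round trip. Because the collection happens multiple times per round, the serial cost grows with both the shard count and the number of collections.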

Generated at Thu Feb 08 06:53:38 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.