[SERVER-65047] Enable tassert in ReshardingMetricsNew, but not while upgrading from older version Created: 29/Mar/22  Updated: 29/Oct/23  Resolved: 04/Apr/22

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 6.0.0-rc0

Type: Bug Priority: Major - P3
Reporter: Brett Nawrocki Assignee: Brett Nawrocki
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-65039 Disable tassert in ReshardingMetricsNew Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Sharding NYC 2022-04-04, Sharding NYC 2022-04-18
Participants:

 Description   

ReshardingMetricsNew tasserts that the start time was written down for the resharding operation, but writing down the start time is new behavior. It's possible that the resharding operation began in an older version where the start time was not written down and is currently in the process of upgrading to this newer version. This tassert should therefore not trigger while upgrading or downgrading.
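The guard described above can be sketched as follows. This is a minimal illustration in Python, not the server's C++ code: `FcvState`, `is_upgrading_or_downgrading`, and `check_start_time` are hypothetical names, and a plain `AssertionError` stands in for a tassert. It only shows the idea of tolerating a missing start time while the FCV is in transition.

```python
from enum import Enum, auto
from typing import Optional


class FcvState(Enum):
    """Hypothetical, simplified model of a node's FCV state."""
    STABLE = auto()
    UPGRADING = auto()
    DOWNGRADING = auto()


def is_upgrading_or_downgrading(state: FcvState) -> bool:
    """True while the FCV is between two versions."""
    return state in (FcvState.UPGRADING, FcvState.DOWNGRADING)


def check_start_time(start_time: Optional[float], state: FcvState) -> bool:
    """Return True if a start time is present, False if its absence is tolerated.

    Raises AssertionError (standing in for a tassert) when the start time is
    missing and the FCV is not in transition.
    """
    if start_time is not None:
        return True
    if is_upgrading_or_downgrading(state):
        # The operation may have started on an older version that did not
        # record a start time, so the invariant is not enforced here.
        return False
    raise AssertionError("resharding start time was not recorded")
```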

However, when upgrading, resharding operations are aborted only after the request to finalize the FCV has been sent to the shards, so a resharding recipient can update its FCV to its final value (i.e. isUpgradingOrDowngrading() is false) before aborting. It is therefore not possible to distinguish a resharding operation that started on an older version and has since been upgraded from one that has run on the latest version throughout. For this reason, the resharding operations should be aborted first.

Furthermore, the resharding command currently ensures that the FCV cannot change while setting up the coordinator, but it does not check whether the current FCV is already in an upgrading or downgrading state. Once the above change to abort first is made, a new resharding operation could begin during an FCV upgrade after the existing resharding operations are aborted but before the shards complete the FCV upgrade, meaning that the operation could run across FCVs without being aborted. As such, the reshard command should fail if the current FCV is either upgrading or downgrading.

In summary, the changes above should guarantee that 1. resharding operations cannot begin during an FCV upgrade or downgrade and 2. during an FCV update, resharding operations will always be finished aborting before reaching the target FCV.
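The two guarantees can be modeled with a small sketch. Everything here is an assumption made for illustration: the `Cluster` class, its method names, and the event log are hypothetical and do not correspond to the server's actual setFeatureCompatibilityVersion or resharding code. The sketch only shows the intended ordering (enter the transitional state, abort resharding, then finalize) and the rejection of new resharding requests during a transition.

```python
from enum import Enum, auto


class FcvState(Enum):
    """Hypothetical, simplified model of a cluster's FCV state."""
    STABLE = auto()
    UPGRADING = auto()
    DOWNGRADING = auto()


class ReshardingRejected(Exception):
    """Raised when a reshard request arrives during an FCV transition."""


class Cluster:
    def __init__(self):
        self.fcv_state = FcvState.STABLE
        self.active_resharding = set()
        self.events = []

    def reshard_collection(self, namespace):
        # Guarantee 1: no new resharding operation during an FCV change.
        if self.fcv_state is not FcvState.STABLE:
            raise ReshardingRejected("FCV is upgrading or downgrading")
        self.active_resharding.add(namespace)
        self.events.append("start " + namespace)

    def begin_transition(self, direction):
        # Operations that are already running can still observe this state.
        self.fcv_state = direction
        self.events.append("begin_transition")

    def abort_all_resharding(self):
        self.active_resharding.clear()
        self.events.append("abort_resharding")

    def finalize(self):
        # Guarantee 2: aborts have finished before the target FCV is reached.
        assert not self.active_resharding, "resharding still running at finalize"
        self.fcv_state = FcvState.STABLE
        self.events.append("finalize")

    def set_fcv(self, direction):
        self.begin_transition(direction)
        self.abort_all_resharding()
        self.finalize()
```

In this model, an operation started before `begin_transition` can still be running while the state is `UPGRADING`, but it is gone by the time `finalize` runs, and any `reshard_collection` call made in between raises `ReshardingRejected`.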



 Comments   
Comment by Githook User [ 01/Apr/22 ]

Author: Brett Nawrocki <brett.nawrocki@mongodb.com> (brettnawrocki)

Message: SERVER-65047 Strengthen guarantees for resharding aborting across FCVs

Previously, resharding operations were aborted after sending the request
to finalize the FCV on the shards when upgrading, making it possible
for a resharding recipient to update its FCV to its target value prior
to aborting. It was therefore not possible to differentiate a
resharding operation that had upgraded to the latest version since
starting from one that had been on the latest version throughout. For
this reason, the resharding operations are now aborted first.

Furthermore, the resharding command currently ensures that the FCV
cannot change while setting up the coordinator. However, it did not
check whether the current FCV was in an upgrading or downgrading
state. After making the above change, this would allow a new
resharding operation to begin during an FCV upgrade, after resharding
operations are aborted, but before the shards complete the FCV
upgrade. The operation would then run across FCVs without being
aborted. As such, the reshard command now fails if the current FCV is
either upgrading or downgrading.

These changes in combination should guarantee that during a version
change, a new resharding operation cannot begin and a previously running
resharding operation always aborts completely before reaching the target
version. Note that it is still possible for a resharding operation to
reach an upgrading or downgrading FCV before being aborted.

These changes were made in the interest of being able to assert that
newly added optional fields that should always be set were indeed set.
As such, this change also enables the assertion disabled by
SERVER-65039.
Branch: master
https://github.com/mongodb/mongo/commit/0425d814900d230115ea0e1b91fadf8ee2352919
