[SERVER-62072] _configsvrReshardCollection may return without having waited for unsetting "reshardingFields" to replicate to majority Created: 15/Dec/21  Updated: 29/Oct/23  Resolved: 07/Jan/22

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 5.0.0, 5.1.0, 5.2.0-rc1
Fix Version/s: 5.3.0, 5.2.1, 5.0.7

Type: Bug Priority: Minor - P4
Reporter: Max Hirschhorn Assignee: Brett Nawrocki
Resolution: Fixed Votes: 0
Labels: neweng, sharding-nyc-subteam1
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
is related to SERVER-61444 Resharding uses of bumpCollectionVers... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v5.2, v5.1, v5.0
Sprint: Sharding 2021-12-27, Sharding 2022-01-10
Participants:
Story Points: 2

 Description   

The _configsvrReshardCollection command waits on the completion future returned by the ReshardingCoordinator. This completion future becomes ready after the local replica set transaction to delete the config.reshardingOperations document and unset the reshardingFields from the config.collections entry commits with w:majority. (Note that SERVER-61444 may change this to w:1 and move the point at which the wait for majority happens.)

If there is no in-progress ReshardingCoordinator and the desired shard key pattern already matches the current shard key pattern, then the _configsvrReshardCollection command returns without waiting on a completion future. This read of the config.collections entry to check the current shard key pattern happens with read concern level "local" and may therefore reflect changes which could be rolled back.

The impact here is low because the only "effect" that can be rolled back is the ReshardingCoordinator's removal of the reshardingFields from the config.collections entry. The update to the shard key pattern in the config.collections entry is still guaranteed to have been majority-committed already.
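The scenario above can be sketched with a toy model. This is only an illustration under stated assumptions: ToyCollectionEntry, ToyHistory, and makeHistoryAfterCleanupApplied are hypothetical names invented here and are not MongoDB types, and the shard key strings are made up. It shows a "local" read observing the removal of reshardingFields while the majority-committed view still has them, and a rollback restoring them without touching the new shard key.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy snapshot of a config.collections entry (hypothetical, not a MongoDB type).
struct ToyCollectionEntry {
    std::string shardKey;
    bool hasReshardingFields;
};

// Toy history of the entry: versions[0..majorityIndex] are majority-committed,
// later versions exist only locally and could still be rolled back.
struct ToyHistory {
    std::vector<ToyCollectionEntry> versions;
    std::size_t majorityIndex = 0;

    // Read concern "local": newest locally applied version.
    const ToyCollectionEntry& readLocal() const { return versions.back(); }

    // Read concern "majority": newest majority-committed version.
    const ToyCollectionEntry& readMajority() const { return versions[majorityIndex]; }

    // A rollback discards everything after the majority-committed point.
    void rollback() { versions.resize(majorityIndex + 1); }
};

// Builds the state described in the ticket: the shard key update is
// majority-committed, the reshardingFields cleanup is applied locally only.
ToyHistory makeHistoryAfterCleanupApplied() {
    ToyHistory h;
    h.versions.push_back({"{a: 1}", false});  // before resharding
    h.versions.push_back({"{b: 1}", true});   // new shard key, reshardingFields still set
    h.versions.push_back({"{b: 1}", false});  // cleanup applied, not yet majority-committed
    h.majorityIndex = 1;
    return h;
}
```

In this model the local and majority reads agree on the shard key, which is why the early-return check still sees the correct pattern; only the reshardingFields flag differs between the two views.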

repl::ReplClientInfo::forClient(opCtx->getClient()).setLastOpToSystemLastOpTime(opCtx);

can be used to ensure ServiceEntryPointMongod::Hooks::waitForWriteConcern() will wait for write concern.
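A minimal sketch of why advancing the client's last op time helps, assuming the write-concern wait targets the client's last op time. ToyNode, ToyClient, and the toy* functions are hypothetical stand-ins invented for this illustration; they do not reproduce the real repl::ReplClientInfo or ServiceEntryPointMongod code, and the op time values are arbitrary.

```cpp
#include <cassert>
#include <cstdint>

// Toy replica set member (hypothetical): op times are plain counters.
struct ToyNode {
    std::uint64_t lastApplied;        // newest op applied on this node
    std::uint64_t majorityCommitted;  // newest op known to be majority-committed
};

// Toy client: lastOp is the op time a write-concern wait would target.
struct ToyClient {
    std::uint64_t lastOp = 0;  // 0 means this client performed no writes
};

// Toy analogue of setLastOpToSystemLastOpTime(): target the node's newest op.
void toySetLastOpToSystemLastOpTime(const ToyNode& node, ToyClient& client) {
    client.lastOp = node.lastApplied;
}

// True if a w:majority wait for this client would have to block right now.
bool toyMajorityWaitWouldBlock(const ToyNode& node, const ToyClient& client) {
    return client.lastOp > node.majorityCommitted;
}

// Models the early-return path: the command itself performs no writes, and the
// cleanup (op 10) is applied locally but only op 9 is majority-committed.
// Returns whether the command would wait for the cleanup before replying.
bool toyEarlyReturnWaitsForCleanup(bool applyFix) {
    ToyNode node{10, 9};
    ToyClient client;
    if (applyFix) {
        toySetLastOpToSystemLastOpTime(node, client);
    }
    return toyMajorityWaitWouldBlock(node, client);
}
```

Without the call, the toy client's last op time stays at its default and the wait is trivially satisfied; with it, the wait covers the locally applied cleanup.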



 Comments   
Comment by Githook User [ 09/Feb/22 ]

Author: Brett Nawrocki <brett.nawrocki@mongodb.com> (brettnawrocki)

Message: SERVER-62072 Reshard command waits for cleanup to be majority committed

The _configsvrReshardCollection command waits on the completion future
returned by the ReshardingCoordinator. This completion future becomes
ready after the local replica set transaction to delete the
config.reshardingOperations document and unset the reshardingFields from
the config.collections entry commits with w:majority. However, if there
is a step down and the new primary has already replicated these deletes,
the command will return despite those deletes not necessarily being
majority committed. If a rollback occurs, the command could then return
without fully cleaning up the resharding operation by leaving the
reshardingFields in the config.collections entry. Now,
_configsvrReshardCollection will wait on the last system op time to
ensure the cleanup has been majority committed.

(cherry picked from commit 997bade5afb420cdf369d7fc66d7cb9498230635)
Branch: v5.0
https://github.com/mongodb/mongo/commit/a3675d1fd04bd8985f88ecf128ed336e59c53e6c

Comment by Githook User [ 09/Feb/22 ]

Author: Brett Nawrocki <brett.nawrocki@mongodb.com> (brettnawrocki)

Message: SERVER-62072 Reshard command waits for cleanup to be majority committed

(cherry picked from commit 997bade5afb420cdf369d7fc66d7cb9498230635)
Branch: v5.2
https://github.com/mongodb/mongo/commit/a9c71b98ad671afc7562c26f1bba5cc7450b13ba

Comment by Githook User [ 07/Jan/22 ]

Author: Brett Nawrocki <brett.nawrocki@mongodb.com> (brettnawrocki)

Message: SERVER-62072 Reshard command waits for cleanup to be majority committed

Branch: master
https://github.com/mongodb/mongo/commit/997bade5afb420cdf369d7fc66d7cb9498230635
