[SERVER-59719] shardsvr{Commit, Abort}ReshardCollection may return unrecoverable error on stepdown, leading to fassert() on config server Created: 01/Sep/21  Updated: 29/Oct/23  Resolved: 10/Nov/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 5.0.2
Fix Version/s: 5.2.0, 5.0.5, 5.1.1

Type: Bug
Priority: Major - P3
Reporter: Luis Osta (Inactive)
Assignee: Brett Nawrocki
Resolution: Fixed
Votes: 0
Labels: LFR-BUG
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
related to SERVER-66353 Add documentation of concurrency rule... Closed
is related to SERVER-59800 Add a flag to the lock-free collectio... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested: v5.1, v5.0
Sprint: Sharding 2021-09-06, Sharding 2021-10-04, Sharding 2021-10-18, Sharding 2021-11-01, Sharding 2021-11-15
Participants:
Linked BF Score: 151
Story Points: 1

 Description   

Background

If a stepdown occurs while a resharding operation is in progress, the opCtx performing the commit may be killed before the opCtx handling the command is. As a result, ShardsvrCommitReshardCollectionCommand, for instance, can reach its final uassert even though the Resharding(Recipient/Donor)Service was interrupted and never finished committing.

 
The config server primary does not retry on the error the shard returns in this case. Instead, the error causes the config server primary to fassert in the ReshardingCoordinatorService.
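
The retry-versus-fassert behavior described above can be pictured with a small toy model. This is only a sketch under stated assumptions, not the ReshardingCoordinatorService code: ShardReply, CoordinatorResult, and driveToCompletion are hypothetical names, and the toy returns kFatal where the real server would hit a fatal assertion.

```cpp
#include <cassert>
#include <functional>

// Hypothetical classification of a shard's reply; these are not real MongoDB error codes.
enum class ShardReply { kOk, kInterruptedByStepdown, kStateDocStillExists };

// Hypothetical result of the coordinator's attempt to finish the operation.
enum class CoordinatorResult { kDone, kFatal, kGaveUp };

// Resend the command while the reply is retryable and treat any other error as fatal.
// maxAttempts only exists so that the toy cannot loop forever.
CoordinatorResult driveToCompletion(const std::function<ShardReply()>& sendCommand,
                                    int maxAttempts) {
    assert(maxAttempts > 0);
    for (int attempt = 0; attempt < maxAttempts; ++attempt) {
        switch (sendCommand()) {
            case ShardReply::kOk:
                return CoordinatorResult::kDone;
            case ShardReply::kInterruptedByStepdown:
                continue;  // retryable: try again, e.g. against the new primary
            case ShardReply::kStateDocStillExists:
                return CoordinatorResult::kFatal;  // not retried in this model
        }
    }
    return CoordinatorResult::kGaveUp;
}
```

In this model, a shard that reports an interruption is simply asked again, whereas a shard that reports a leftover state document ends the operation fatally, which is the outcome this ticket aims to avoid.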

Problem

  • Lock-free reads allow reads to run concurrently with an in-progress stepdown.
  • As a result, database reads that relied on _alwaysInterruptAtStepDownOrUp to verify that no stepdown was in progress, such as the one in ShardsvrCommitReshardCollectionCommand, no longer work.
  • Even if _alwaysInterruptAtStepDownOrUp is set to true, the RSTL is not acquired for a database read, so the read can complete before the opCtx is eventually interrupted.
  • Before lock-free reads, the database read would wait for the in-progress stepdown to complete, and hence would wait until the opCtx had been interrupted by that stepdown. We want to replicate this behavior in our fix.
  • The uassert currently returned does not cause the ReshardingCoordinatorService to retry the commit/abort command until completion. Instead, it leads to a fatal assertion.
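
The difference between the two kinds of read can be sketched with a single-threaded toy model. Everything here is an assumption made for illustration: ToyOpCtx, ToyNode, lockedRead, and lockFreeRead are hypothetical stand-ins, not the server's OperationContext or locking code, and "waiting on the RSTL" is modeled by letting the stepdown finish before the read proceeds.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical exception standing in for an interruption caused by stepdown.
struct ToyInterrupted : std::runtime_error {
    ToyInterrupted() : std::runtime_error("interrupted by stepdown") {}
};

// Toy operation context: only tracks whether the stepdown has killed it yet.
struct ToyOpCtx {
    bool killed = false;
};

// Toy node state.
struct ToyNode {
    bool stepdownInProgress = false;
    bool stateDocExists = true;
};

// Models the stepdown completing: it kills the operation, then finishes.
void finishStepdown(ToyNode& node, ToyOpCtx& opCtx) {
    opCtx.killed = true;
    node.stepdownInProgress = false;
}

void checkForInterrupt(const ToyOpCtx& opCtx) {
    if (opCtx.killed) throw ToyInterrupted();
}

// Locked read: blocks until the in-progress stepdown is done, so the kill is observed.
bool lockedRead(ToyNode& node, ToyOpCtx& opCtx) {
    if (node.stepdownInProgress) finishStepdown(node, opCtx);  // models waiting on the RSTL
    assert(!node.stepdownInProgress);
    checkForInterrupt(opCtx);
    return node.stateDocExists;
}

// Lock-free read: does not wait, so it can return before the opCtx is killed.
bool lockFreeRead(const ToyNode& node, const ToyOpCtx& opCtx) {
    checkForInterrupt(opCtx);
    return node.stateDocExists;
}
```

With a stepdown in progress and the state document still present, lockedRead throws the interruption, while lockFreeRead returns true, which is the value that would then trip the command's uassert.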

Proposed Solution

Do a no-op write using doNoopWrite before performing the sanity check that ensures the state document has been deleted. This guarantees that the operation has not been interrupted before we assert that no state documents remain.
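
The ordering proposed above (commit, then no-op write, then verification read) can be sketched with a toy model. The ticket names doNoopWrite but does not show its signature, so the noopWrite member below is a hypothetical stand-in, as are ToyShard, VerifyOutcome, and commitThenVerify. The model assumes that a write attempted during a stepdown is interrupted, while a lock-free read is not.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical exception standing in for a write interrupted by stepdown.
struct ToyWriteInterrupted : std::runtime_error {
    ToyWriteInterrupted() : std::runtime_error("write interrupted by stepdown") {}
};

enum class VerifyOutcome { kOk, kInterrupted, kStateDocStillExists };

struct ToyShard {
    bool steppingDown = false;
    bool stateDocExists = true;

    // Toy commit: if a stepdown interrupts it, the state document is left behind.
    void commit() {
        if (!steppingDown) stateDocExists = false;
    }

    // Toy no-op write: assumed to be interrupted while a stepdown is in progress.
    void noopWrite() const {
        if (steppingDown) throw ToyWriteInterrupted();
    }

    // Toy lock-free read: succeeds regardless of the stepdown.
    bool readStateDocExists() const {
        return stateDocExists;
    }
};

// Runs the commit and then the verification, optionally with the no-op write in between.
VerifyOutcome commitThenVerify(ToyShard& shard, bool noopWriteFirst) {
    shard.commit();
    if (noopWriteFirst) {
        try {
            shard.noopWrite();
        } catch (const ToyWriteInterrupted&) {
            return VerifyOutcome::kInterrupted;  // retryable instead of a failed check
        }
    }
    return shard.readStateDocExists() ? VerifyOutcome::kStateDocStillExists
                                      : VerifyOutcome::kOk;
}
```

Without the no-op write, a stepdown during the commit surfaces as kStateDocStillExists. With it, the same scenario surfaces as kInterrupted, and the verification read is only reached when the commit was not interrupted.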



 Comments   
Comment by Githook User [ 11/Nov/21 ]

Author: Brett Nawrocki <brett.nawrocki@mongodb.com> (brettnawrocki)

Message: SERVER-59719 Ensure resharding commit/abort completes before verifying

ShardsvrCommitReshardCollectionCommand's commit() call and
ShardsvrAbortReshardCollectionCommand's abort() call each perform a
write which will trigger the state document to be deleted on the donor
and recipient. To verify this is done, those commands perform a read to
check if the state documents still exist. Now that the RSTL is not
acquired during reads, it is not guaranteed that the command's opCtx
will be interrupted when performing a read despite calling
setAlwaysInterruptAtStepDownOrUp(). As a consequence, it is possible for
the command's write to have been interrupted due to a step down on the
donor/recipient, causing the document to still exist during the
verification read and triggering a uassert. To resolve this issue, the
commands now do a no-op write before the verification read, ensuring
that the first write has indeed completed without being interrupted.

(cherry picked from commit cca75006b85690faa641a15dfc9940d2a2add52d)
Branch: v5.1
https://github.com/mongodb/mongo/commit/b89a97340366b308491344bddd84deca0cb2fa5f

Comment by Githook User [ 11/Nov/21 ]

Author: Brett Nawrocki <brett.nawrocki@mongodb.com> (brettnawrocki)

Message: SERVER-59719 Ensure resharding commit/abort completes before verifying

(cherry picked from commit cca75006b85690faa641a15dfc9940d2a2add52d)
Branch: v5.0
https://github.com/mongodb/mongo/commit/4f50641a4a1a34cda2cbfdfd0ec6073c1ed7d9db

Comment by Githook User [ 10/Nov/21 ]

Author: Brett Nawrocki <brett.nawrocki@mongodb.com> (brettnawrocki)

Message: SERVER-59719 Ensure resharding commit/abort completes before verifying

Branch: master
https://github.com/mongodb/mongo/commit/cca75006b85690faa641a15dfc9940d2a2add52d

Comment by Luis Osta (Inactive) [ 07/Sep/21 ]

dianna.hohensee After talking with max.hirschhorn, I think it's probably best if we leave supportsLockFreeRead alone and instead add a no-op write before the reads in:

  • _shardsvrCommitReshardCollection
  • _shardsvrAbortReshardCollection
Comment by Dianna Hohensee (Inactive) [ 02/Sep/21 ]

I haven't yet looked at what _alwaysInterruptAtStepDownOrUp is or how it works, but I have some initial thoughts. 1) One of the goals of the lock-free project was specifically to allow reads to run concurrently with stepdown/up instead of stalling. 2) We eventually want to move away entirely from locked reads, so I don't think falling back to them is a sustainable solution.

Comment by Luis Osta (Inactive) [ 02/Sep/21 ]

dianna.hohensee henrik.edin What do y'all think about the proposed fix?

Generated at Thu Feb 08 05:47:57 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.