[SERVER-50937] Make resharding coordinator support recovery Created: 14/Sep/20  Updated: 29/Oct/23  Resolved: 26/Jul/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 5.0.3, 5.1.0-rc0

Type: Task Priority: Major - P3
Reporter: Haley Connelly Assignee: Randolph Tan
Resolution: Fixed Votes: 0
Labels: PM-234-M3, PM-234-T-lifecycle
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
depends on SERVER-57624 Coordinator should check participant ... Closed
depends on SERVER-50960 Modify PrimaryOnlyService's lookup() ... Closed
depends on SERVER-55682 Improve ability to test ReshardingCoo... Closed
is depended on by SERVER-51495 Reenable reshard_collection_basic.js ... Backlog
Related
related to SERVER-61483 Resharding coordinator fails to recov... Closed
is related to SERVER-49572 Implement onReshardingParticipantTran... Closed
is related to SERVER-50982 PrimaryOnlyService::lookupInstance sh... Closed
Backwards Compatibility: Fully Compatible
Backport Requested:
v5.0
Sprint: Sharding 2021-07-12, Sharding 2021-07-26
Participants:
Linked BF Score: 19
Story Points: 3

 Description   

Needs further investigation. Contact blake.oler before starting work on this.

Notes

  1. It's okay for onCommit handlers to be run out of order if they have attached opTimes.
  2. How do we ensure that the resharding coordinator is always using the latest version of the document to replace the contents on disk?
  3. Is ensuring that promises are fulfilled on recovery as simple as updating the in-memory document after checking the future of the first promise? Would it be simpler to do away completely with an in-memory representation of the underlying document?
  4. Need to make sure that we don't write to the temporary resharding collection entry when it should have already been removed.

Outdated description

Introduce methods that can manually fulfill the promises in the ReshardingCoordinatorObserver for the recovery process.

Create a flag for the ReshardingCoordinatorObserver, say _shouldObserveWrites, to prevent writes from fulfilling the observer's promises while in recovery. 

When in recovery (when the ReshardingCoordinatorService is constructed in state > kInitializing), construct the ReshardingCoordinatorObserver with _shouldObserveWrites set to false until the ReshardingCoordinator has fully recovered and it is safe for the ReshardingCoordinatorObserver to begin observing writes again. We will flip this flag to true as part of the recovery process (described below). Note that this means writes to config.reshardingOperations can happen before the ReshardingCoordinatorService is constructed, or after it is constructed but before we've done recovery. This is okay, because the coordinator will read from disk as part of recovery (described below).

At the start of ReshardingCoordinatorService::run(), if the coordinator is recovering (its state is > kInitializing), do the following, in order, to fulfill any promises that would already have been fulfilled had we not failed over and to observe any writes that came in before recovery started:
1. Take the collection lock in mode S
2. Read config.reshardingOperations for this resharding op
3. Inspect the doc and fulfill any promises that should be fulfilled already
4. Flip _shouldObserveWrites to true
5. Release the collection lock
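The five steps above could be sketched roughly as follows. This is a self-contained stand-in, not actual server code: the state names, the promise-per-transition layout of the observer, and the recoverTo() helper are all assumptions made for illustration (the real version would run under the collection lock in mode S after re-reading config.reshardingOperations).

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <vector>

// Hypothetical coordinator states mirroring the ticket's description; the
// names and ordering are illustrative, not the actual server enum.
enum class CoordinatorState { kInitializing, kPreparingToDonate, kCloning, kApplying, kCommitted };

// Minimal stand-in for ReshardingCoordinatorObserver: one promise per state
// transition past kInitializing, fulfilled strictly in order.
struct ObserverSketch {
    bool shouldObserveWrites = false;          // the ticket's _shouldObserveWrites
    std::vector<std::promise<void>> promises;  // promise i <=> transition i completed

    explicit ObserverSketch(size_t numTransitions) : promises(numTransitions) {}

    // Recovery: fulfill every promise that the on-disk state says should
    // already be fulfilled, then start observing live writes again (step 4).
    void recoverTo(CoordinatorState diskState) {
        auto reached = static_cast<size_t>(diskState);
        for (size_t i = 0; i < reached && i < promises.size(); ++i)
            promises[i].set_value();
        shouldObserveWrites = true;
    }
};
```

For example, recovering an operation found on disk in kCloning would fulfill the promises for the transitions before kCloning and leave the later ones pending.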



 Comments   
Comment by Vivian Ge (Inactive) [ 06/Oct/21 ]

Updating the fixversion since branching activities occurred yesterday. This ticket will be in rc0 when it’s been triggered. For more active release information, please keep an eye on #server-release. Thank you!

Comment by Githook User [ 11/Aug/21 ]

Author:

{'name': 'Randolph Tan', 'email': 'randolph@10gen.com', 'username': 'renctan'}

Message: SERVER-50937 Wrap operations in resharding coordinator that contact remote nodes with WithAutomaticRetry

(cherry picked from commit be9790dcf2de451d8e218f4471d2d8faa5f26aaa)
Branch: v5.0
https://github.com/mongodb/mongo/commit/2b8d36da02e582030c5c1c00c722c5e7879265c2

Comment by Githook User [ 11/Aug/21 ]

Author:

{'name': 'Randolph Tan', 'email': 'randolph@10gen.com', 'username': 'renctan'}

Message: SERVER-50937 Refactor resharding coordinator to consolidate distinct error handling into separate phases.

(cherry picked from commit cbddf73dc78aa6a208fe3a43ca5e8674f67d5b87)
Branch: v5.0
https://github.com/mongodb/mongo/commit/900707746299e4684fa6b7f29beb65f6ff13b97c

Comment by Githook User [ 22/Jul/21 ]

Author:

{'name': 'Randolph Tan', 'email': 'randolph@10gen.com', 'username': 'renctan'}

Message: SERVER-50937 Wrap operations in resharding coordinator that contact remote nodes with WithAutomaticRetry
Branch: master
https://github.com/mongodb/mongo/commit/be9790dcf2de451d8e218f4471d2d8faa5f26aaa

Comment by Githook User [ 07/Jul/21 ]

Author:

{'name': 'Randolph Tan', 'email': 'randolph@10gen.com', 'username': 'renctan'}

Message: SERVER-50937 Refactor resharding coordinator to consolidate distinct error handling into separate phases.
Branch: master
https://github.com/mongodb/mongo/commit/cbddf73dc78aa6a208fe3a43ca5e8674f67d5b87

Comment by Max Hirschhorn [ 21/Nov/20 ]

I wanted to highlight a few details which will hopefully be helpful when thinking through how to implement recovery safely.

  • Each later promise in ReshardingCoordinatorObserver can only be fulfilled after the previous promise was fulfilled. This is because the coordinator's state advances only after hearing from all donors (or from all recipients), and the coordinator's state advancing is what leads to all the donors' (or all the recipients') states advancing. Ordering the ReshardingCoordinatorDocuments from the fulfilled promises by their optimes is therefore very likely unnecessary.
  • Another caveat is that until a promise in ReshardingCoordinatorObserver is fulfilled, the coordinator document on disk may have newer information than the ReshardingCoordinator::_coordinatorDoc in-memory.
    • Immediately following a promise in ReshardingCoordinatorObserver being fulfilled, the coordinator document on disk may also have newer information due to how donors and recipients communicate their error state.
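The ordering invariant in the first bullet can be modeled with a toy example (illustrative only; the counting scheme and names are assumptions, not server code): the coordinator advances one state at a time, and each advance requires a report from every participant, so promise i+1 can never be fulfilled before promise i.

```cpp
#include <cassert>
#include <vector>

// Toy model: the coordinator fulfills the promise for its current state only
// once every participant has reported reaching it, then moves to the next.
struct CoordinatorModel {
    int numParticipants;
    int acksForCurrentState = 0;
    int stateIndex = 0;                  // index of the state being waited on
    std::vector<bool> promiseFulfilled;  // promiseFulfilled[i] <=> state i done

    CoordinatorModel(int participants, int numStates)
        : numParticipants(participants), promiseFulfilled(numStates, false) {}

    // Called once per participant when it reports reaching the current state.
    void onParticipantReport() {
        if (++acksForCurrentState == numParticipants) {
            promiseFulfilled[stateIndex] = true;  // fulfilled strictly in order
            ++stateIndex;
            acksForCurrentState = 0;
        }
    }
};
```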

One idea, which came up with Spencer Brody and Esha when the resharding project was first writing its primary-only services, is to avoid having multiple threads write to the documents in the primary-only-service-backed config.reshardingOperations collection.

  • ReshardingCoordinator currently does a full-document replacement when updating its coordinator state. With the previously mentioned caveat for how the error state is reported, this has the potential for ReshardingCoordinator to write ReshardingCoordinator::_coordinatorDoc back to disk and effectively forget about a shard reporting its error state when another shard has already reported one.
    • Depending on how ReshardingCoordinator instructs and awaits donors and recipients to do their cleanup in the unrecoverable error case, this may or may not matter because the resharding operation is still known to have had an unrecoverable error.
  • The ReshardingCoordinator doesn't need to do a full-document replacement, but doing so was suggested as being easier than managing the $sets it would otherwise need to do.
  • The ReshardingCoordinator doesn't need to know the DonorStateEnum or RecipientStateEnum values. It needs the shard IDs for the donors and recipients and needs the minFetchTimestamps of the donors to calculate the fetchTimestamp for the resharding operation.
    • We could split ReshardingCoordinatorDocument into two separate structs such that ReshardingCoordinatorDocument only contains the fields that ReshardingCoordinator actually writes to. Let's call the other ReshardingCoordinatorParticipantDocument.
    • The config.reshardingOperations collection would be written to exclusively by ReshardingCoordinator. Donors and recipients would perform their writes to (and ReshardingCoordinatorObserver would be notified for) a new config.reshardingOperations.participants collection.
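The lost-write hazard from full-document replacement described above can be shown with a small simulation (a toy stand-in using a string map for a BSON document; the field names are illustrative): replacing the on-disk document with a stale in-memory copy drops a concurrently written field such as a shard's reported error, while a $set-style targeted update preserves it.

```cpp
#include <cassert>
#include <map>
#include <string>

using Doc = std::map<std::string, std::string>;  // toy stand-in for a BSON doc

// Full-document replacement from a stale in-memory copy: any field another
// writer added on disk in the meantime is silently dropped.
Doc replaceUpdate(const Doc& /*onDisk*/, const Doc& inMemory) {
    return inMemory;
}

// $set-style targeted update: only the named fields change, so fields written
// concurrently by other threads survive.
Doc setUpdate(Doc onDisk, const Doc& fieldsToSet) {
    for (const auto& [k, v] : fieldsToSet)
        onDisk[k] = v;
    return onDisk;
}
```

This is the trade-off the bullets describe: replacement is simpler to manage but can forget an error a shard already reported; targeted $sets avoid that at the cost of bookkeeping.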

Recovery for the ReshardingCoordinator would trigger the ReshardingCoordinatorObserver using the contents of the associated document in the config.reshardingOperations.participants collection. The ReshardingCoordinator would be responsible for doing this in its run() rather than being an automatic part of primary-only service rebuilding the Instance.
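The proposed document split could look something like the sketch below. The two type names come from the comment above, but the field sets are guesses made for illustration only; they are not the real IDL definitions.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Coordinator-owned document: keeps only the fields the coordinator itself
// writes, stored in config.reshardingOperations.
struct ReshardingCoordinatorDocumentSketch {
    std::string reshardingUUID;
    int state;  // would be CoordinatorStateEnum in the server
};

// Participant-reported fields move to a hypothetical
// config.reshardingOperations.participants collection, written to by donors
// and recipients and watched by ReshardingCoordinatorObserver.
struct ReshardingCoordinatorParticipantDocumentSketch {
    std::string reshardingUUID;
    std::vector<std::string> donorShardIds;
    std::vector<std::string> recipientShardIds;
    std::vector<int64_t> minFetchTimestamps;  // used to compute fetchTimestamp
};
```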

  • getReshardingCoordinatorObserver() is called while the storage transaction for the update is still open and blocks until the service is no longer in the kRebuilding state. These constraints lead to a guarantee that the initial coordinator document ReshardingCoordinator receives through its constructor when _rebuildInstances() is called is from a snapshot prior to the update.
  • This guarantee would no longer apply if ReshardingCoordinator read a separate ReshardingCoordinatorParticipantDocument to do recovery as part of its run(). But it also doesn't need that guarantee because the first thing it would be doing in run() is reading the latest version of the config.reshardingOperations.participants collection anyway.
Generated at Thu Feb 08 05:24:03 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.