  1. Core Server
  2. SERVER-50937

Make resharding coordinator support recovery

    • Fully Compatible
    • v5.0
    • Sharding 2021-07-12, Sharding 2021-07-26
    • 19
    • 3

      Needs further investigation. Contact blake.oler before starting work on this.


      1. It's okay for onCommit handlers to be run out of order if they have attached opTimes.
      2. How do we ensure that the resharding coordinator is always using the latest version of the document to replace the contents on disk?
      3. Is ensuring that promises are fulfilled on recovery as simple as updating the in-memory document after checking the future of the first promise? Would it be simpler to do away completely with an in-memory representation of the underlying document?
      4. Need to make sure that we don't write to the temporary resharding collection entry when it should have already been removed.

      Old description (out of date)

      Introduce methods that can manually fulfill the promises in the ReshardingCoordinatorObserver for the recovery process.

      Add a flag to the ReshardingCoordinatorObserver, say _shouldObserveWrites, that prevents writes from fulfilling the observer's promises while in recovery.

      When in recovery (that is, when the ReshardingCoordinatorService is constructed in a state > kInitializing), construct the ReshardingCoordinatorObserver with _shouldObserveWrites set to false, and keep it false until the ReshardingCoordinator has fully recovered and it is safe for the ReshardingCoordinatorObserver to begin observing writes again. We will flip this flag to true as part of the recovery process (described below). Note that this means writes to config.reshardingOperations can happen before the ReshardingCoordinatorService is constructed, or after it's constructed but before we've done recovery. This is okay, because the coordinator will read from disk as part of recovery (described below).

      At the start of ReshardingCoordinatorService::run(), if the coordinator is recovering (its state is > kInitializing), do the following, in order. This fulfills any promises that would already have been fulfilled had we not failed over, and accounts for any writes that came in before we started recovery:
      1. Take the collection lock in mode S
      2. Read config.reshardingOperations for this resharding op
      3. Inspect the doc and fulfill any promises that should be fulfilled already
      4. Flip ‘_shouldObserveWrites’ to be true
      5. Release the collection lock

            randolph@mongodb.com Randolph Tan
            haley.connelly@mongodb.com Haley Connelly