  1. Core Server
  2. SERVER-53926

Tenant migration recipient should replace recipientSyncData (with returnAfterTimestamp) errors with interrupt status when appropriate

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • 5.0.0
    • Affects Version/s: None
    • Component/s: None
    • Fully Compatible
    • ALL
    • Repl 2021-02-08, Repl 2021-02-22

      Currently, on the first recipientSyncData command (without returnAfterReachingDonorTimestamp), we wait on the dataConsistent future. On interrupts (due to stepdown, etc.) of the future chain, we override the error status with the interrupt status so that the donor is able to retry the recipientSyncData command.

      However, the second recipientSyncData command (with returnAfterReachingDonorTimestamp) simply waits for an optime to be majority committed. We should do something similar there: override the error status with the interrupt status when appropriate, so that the donor is able to retry the recipientSyncData command without aborting the migration.
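
      The override described above can be sketched roughly as follows. This is a minimal illustration in Python, not the server's C++ code: the `Status` class, `override_with_interrupt`, `donor_should_retry`, and the set of retryable codes are all hypothetical names chosen for the example, and the error code strings are used only as labels.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: codes the donor treats as "retry recipientSyncData".
# The set is illustrative, not the server's actual list.
RETRYABLE_CODES = {"InterruptedDueToReplStateChange"}


@dataclass(frozen=True)
class Status:
    """Hypothetical stand-in for a command status."""
    code: str
    reason: str = ""

    def is_ok(self) -> bool:
        return self.code == "OK"


def override_with_interrupt(wait_status: Status,
                            interrupt_status: Optional[Status]) -> Status:
    """Pick the status to report back to the donor.

    If the wait failed and an interrupt (e.g. stepdown) was recorded,
    report the interrupt instead of the raw wait error.
    """
    if wait_status.is_ok():
        return wait_status
    if interrupt_status is not None:
        return interrupt_status
    return wait_status


def donor_should_retry(status: Status) -> bool:
    """Hypothetical donor-side check: retry only on retryable codes."""
    return status.code in RETRYABLE_CODES
```

      In this sketch, a wait that fails during a stepdown is reported as the interrupt, which the donor treats as retryable; a wait that fails with no recorded interrupt keeps its original error.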

      A second bug was found, which shows up once this issue is resolved. When the donor retries, the second recipientSyncData command is sent to the new primary of the recipient RST. At that point the tenant oplog applier may exist but not yet have been started, which trips an invariant (the ticket links to the invariant and to its call site).
      The fix is to wait on the _dataConsistentPromise instead of the _dataSyncStartedPromise.
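
      To illustrate why the choice of promise matters, here is a small Python sketch. It assumes that the applier is created before the data-sync-started promise is fulfilled and started before the data-consistent promise is fulfilled; the ticket implies this ordering but does not spell it out. `RecipientInstance`, `OplogApplier`, and their methods are hypothetical names, and the `RuntimeError` stands in for the invariant.

```python
from concurrent.futures import Future


class OplogApplier:
    """Hypothetical applier that must be started before it is queried."""

    def __init__(self) -> None:
        self.started = False

    def start(self) -> None:
        self.started = True

    def applied_through(self) -> str:
        if not self.started:
            # Stands in for the invariant failure described in the ticket.
            raise RuntimeError("applier queried before startup")
        return "optime"


class RecipientInstance:
    """Hypothetical recipient state with the two promises from the ticket."""

    def __init__(self) -> None:
        self.data_sync_started: Future = Future()
        self.data_consistent: Future = Future()
        self.applier = None

    def begin_sync(self) -> None:
        # Assumed ordering: the applier exists here but is not started yet.
        self.applier = OplogApplier()
        self.data_sync_started.set_result(None)

    def reach_consistency(self) -> None:
        # Assumed ordering: the applier is started before this promise fires.
        self.applier.start()
        self.data_consistent.set_result(None)

    def wait_until_timestamp(self, timeout=None) -> str:
        # Waiting on data_consistent (not data_sync_started) ensures the
        # applier has been started before it is queried.
        self.data_consistent.result(timeout=timeout)
        return self.applier.applied_through()
```

      Under these assumptions, a caller that waited only on `data_sync_started` could reach the applier in its created-but-not-started state, whereas waiting on `data_consistent` cannot.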

            vishnu.kaushik@mongodb.com Vishnu Kaushik
            jason.chan@mongodb.com Jason Chan