Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 5.0.0
Affects Version/s: None
Component/s: None
Labels: pm-1791_milestone-B
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Repl 2021-02-08, Repl 2021-02-22
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

Currently, on the first recipientSyncData command (without returnAfterReachingDonorTimestamp), we wait on the dataConsistent future. On interrupts (due to stepdown, etc.) of the future chain, we override the error status with the interrupt status so that the donor is able to retry the recipientSyncData command.

However, the second recipientSyncData command (with returnAfterReachingDonorTimestamp) simply waits for an optime to be majority committed. We should do something similar there: override the error status with the interrupt status when appropriate, so that the donor can retry the recipientSyncData command without aborting the migration.
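The status override described above can be illustrated with a minimal sketch. This is not the server's implementation; the exception types and the helpers `wait_with_interrupt_override` and `donor_send_with_retry` are hypothetical names introduced only to show the idea: when a wait fails while an interrupt (such as a stepdown) is pending, the interrupt is reported instead of the original error, and the caller treats that interrupt as retryable.

```python
class InterruptedDueToStepdown(Exception):
    """Hypothetical retryable error: the recipient primary stepped down."""


class WaitFailed(Exception):
    """Hypothetical non-retryable error raised by the underlying wait."""


def wait_with_interrupt_override(wait, interrupt_status):
    """Run wait(); if it fails while an interrupt is pending, raise the interrupt.

    wait: callable that blocks until the optime is majority committed (assumed).
    interrupt_status: callable returning a pending interrupt exception or None.
    """
    try:
        return wait()
    except Exception as err:
        interrupt = interrupt_status()
        if interrupt is not None:
            # Override the error status with the interrupt status.
            raise interrupt from err
        raise


def donor_send_with_retry(send, max_attempts=3):
    """Hypothetical donor loop: retry only when the recipient reports an interrupt."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except InterruptedDueToStepdown:
            if attempt == max_attempts:
                raise
```

Any error that is not accompanied by a pending interrupt still propagates unchanged, so in this sketch a genuine failure would still reach the donor as a non-retryable error.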

—

A second bug was found, which shows up after this issue is resolved. Once the donor retries, the second recipientSyncData command is sent to the new primary of the recipient RST. We may be in a state where the tenant oplog applier exists but hasn't been started yet. In that state, the bug manifests as an invariant failure.
The fix is to wait on the _dataConsistentPromise instead of the _dataSyncStartedPromise.
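A rough sketch of why the choice of promise matters, under stated assumptions: the class `RecipientSketch`, its method names, and the `applier_started` flag are hypothetical, and the sketch assumes that the applier has been started by the time data becomes consistent. It uses `concurrent.futures.Future` as a stand-in for the two promises.

```python
from concurrent.futures import Future


class RecipientSketch:
    """Hypothetical model of the two promises on the recipient."""

    def __init__(self):
        self.data_sync_started = Future()
        self.data_consistent = Future()
        self.applier_started = False  # the applier object may exist before this is True

    def begin_sync(self):
        # Data sync has begun; the applier has been created but not started.
        self.data_sync_started.set_result(None)

    def reach_consistency(self):
        # Assumption: the applier is started before data is marked consistent.
        self.applier_started = True
        self.data_consistent.set_result(None)

    def wait_until_timestamp(self, timeout=1.0):
        # Wait on data_consistent rather than data_sync_started, so the
        # check below cannot run against a not-yet-started applier.
        self.data_consistent.result(timeout=timeout)
        assert self.applier_started  # stands in for the server invariant
        return "ok"
```

Waiting on `data_sync_started` instead would let `wait_until_timestamp` proceed in the window between `begin_sync` and `reach_consistency`, which is the state in which the check would fail.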

Assignee: Vishnu Kaushik
Reporter: Jason Chan
Participants: Githook User, Jason Chan, Vishnu Kaushik
Votes: 0
Watchers: 3

Created: Jan 20 2021 06:55:15 PM UTC
Updated: Oct 29 2023 09:58:36 PM UTC
Resolved: Feb 18 2021 03:01:26 PM UTC
Confidence Status Last Update: 01/Feb/21 8:18 PM
