-
Type: Bug
-
Resolution: Fixed
-
Priority: Major - P3
-
Affects Version/s: None
-
Component/s: Replication
-
None
-
Fully Compatible
-
ALL
-
v5.1, v5.0
-
Server Serverless 2021-11-01, Server Serverless 2021-11-15
-
167
waitUntilMigrationReachesReturnAfterReachingTimestamp updates the rejectReadsBeforeTimestamp in the recipient state doc, and then uses ReplClientInfo::getLastOp() to get the latest opTime written by this client. This opTime is then used to wait for replication. However, in some cases this opTime may not be initialized at all, so the wait for replication returns immediately. As a result, the recipientSyncData command can return without the state doc updates being replicated to a majority of nodes, which is incorrect because the donor might then wrongly conclude that the migration is finished.
One scenario for this to happen:
- Recipient reaches consistent state
- Donor sends recipientSyncData command with returnAfterReachingTimestamp to recipient
- Donor starts failing over and the new donor primary sends another recipientSyncData command with the same returnAfterReachingTimestamp
- The first recipientSyncData command updates the rejectReadsBeforeTimestamp in the state doc, but fails to wait for replication because the node steps down
- The second recipientSyncData command also tries to update the rejectReadsBeforeTimestamp, but since the state doc was already updated in memory with the same rejectReadsBeforeTimestamp, the update is a no-op and nothing is written to the oplog. This means ReplClientInfo::getLastOp() is never updated: no writes ever happened on this client, so we end up waiting on an uninitialized opTime, and the wait for replication returns immediately.
This pattern (getting an opTime from ReplClientInfo::getLastOp() and waiting for replication of that opTime) is used in multiple places in tenant migration, so this issue may occur elsewhere.
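The failure mode above can be illustrated with a small toy model. This is a sketch, not the server implementation: the class and field names (ReplClientInfo, OpTime values as plain integers, the state doc as an in-memory field) are simplified stand-ins chosen to mirror the ticket's description, and the "uninitialized opTime" is modeled as 0, which compares less than or equal to any committed opTime.

```python
# Toy model of the bug: a no-op state-doc update never touches the client's
# lastOp, so the subsequent wait-for-replication compares an uninitialized
# opTime against the majority-committed point and returns immediately.

UNINITIALIZED_OPTIME = 0  # stand-in for a null/default OpTime


class ReplClientInfo:
    """Tracks the last opTime written by this client (per-command state)."""

    def __init__(self):
        self.last_op = UNINITIALIZED_OPTIME


class Recipient:
    def __init__(self):
        self.oplog_time = 0                 # latest opTime in the oplog
        self.majority_committed = 0         # opTime replicated to a majority
        self.reject_reads_before_ts = None  # in-memory state doc field

    def update_state_doc(self, client, ts):
        # If the in-memory value already equals ts, the update is a no-op:
        # nothing is written to the oplog and client.last_op is untouched.
        if self.reject_reads_before_ts == ts:
            return
        self.reject_reads_before_ts = ts
        self.oplog_time += 1
        client.last_op = self.oplog_time

    def wait_for_replication(self, op_time):
        # Returns True once op_time is majority-committed. An uninitialized
        # opTime compares <= everything, so the wait "succeeds" immediately.
        return op_time <= self.majority_committed


recipient = Recipient()

# First recipientSyncData: writes the state doc, then the node steps down
# before the write is majority-committed, so the wait correctly blocks.
c1 = ReplClientInfo()
recipient.update_state_doc(c1, ts=100)
assert not recipient.wait_for_replication(c1.last_op)

# Second recipientSyncData from the new donor primary, same timestamp:
# the update is a no-op, last_op stays uninitialized, and the wait
# returns immediately even though nothing is majority-committed (the bug).
c2 = ReplClientInfo()
recipient.update_state_doc(c2, ts=100)
assert recipient.wait_for_replication(c2.last_op)
```

The sketch shows why the race needs both commands to carry the same returnAfterReachingTimestamp: only then is the second update a no-op that skips the oplog write, leaving getLastOp() at its default value.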
- related to
-
SERVER-61404 Set rejectReadsBeforeTimestamp only once
- Closed