[SERVER-60969] ReplClientInfo::getLastOp may not be updated before being used to wait for replication in tenant migration Created: 25/Oct/21 Updated: 29/Oct/23 Resolved: 11/Nov/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 5.2.0, 5.1.1 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Wenbin Zhu | Assignee: | A. Jesse Jiryu Davis |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||||||
| Operating System: | ALL | ||||||||||||||||||||
| Backport Requested: |
v5.1, v5.0
|
||||||||||||||||||||
| Sprint: | Server Serverless 2021-11-01, Server Serverless 2021-11-15 | ||||||||||||||||||||
| Participants: | |||||||||||||||||||||
| Linked BF Score: | 167 | ||||||||||||||||||||
| Description |
|
waitUntilMigrationReachesReturnAfterReachingTimestamp updates the rejectReadsBeforeTimestamp in the recipient state doc, and then use ReplClientInfo::getLastOp() to get the latest opTime written by this client. This opTime is later being used to wait for replication. However in some cases this opTime may not be initialized at all, so the wait for replication will return immediately. This causes the recipientSyncData command to return without replicating the state doc updates to majority nodes, which is not correct because the donor might incorrectly think the migration is finished. One scenario for this to happen:
This pattern (getting opTime from ReplClientInfo::getLastOp() and wait on that opTime) is being used in multiple places in tenant migration, so this issue may happen elsewhere. |
| Comments |
| Comment by Githook User [ 16/Nov/21 ] |
|
Author: {'name': 'A. Jesse Jiryu Davis', 'email': 'jesse@mongodb.com', 'username': 'ajdavis'}Message: (cherry picked from commit 65b6cd2fc68021388b992fad6d46fa349324f2a2) |
| Comment by Githook User [ 11/Nov/21 ] |
|
Author: {'name': 'A. Jesse Jiryu Davis', 'email': 'jesse@mongodb.com', 'username': 'ajdavis'}Message: |
| Comment by Githook User [ 05/Nov/21 ] |
|
Author: {'name': 'A. Jesse Jiryu Davis', 'email': 'jesse@mongodb.com', 'username': 'ajdavis'}Message: Revert " This reverts commit 0eda431672cedbbe62e3b487d2389b1f5c5695df. |
| Comment by Githook User [ 05/Nov/21 ] |
|
Author: {'name': 'A. Jesse Jiryu Davis', 'email': 'jesse@mongodb.com', 'username': 'ajdavis'}Message: |
| Comment by Suganthi Mani [ 03/Nov/21 ] |
|
Here are *few updates*: I was wondering whether the fix needs to be applied to donor side as well. Thinking about it, made me to doubt the scenario mentioned in the problem description as it describes that the new recipient primary already has the state document w/ RejectReadsBeforeTimestamp set (which is non-majority committed), resulted in no-op write and returning the recipientSyncData command response w/o ensuring the state doc is majority committed and eventually leading to rollback of the RejectReadsBeforeTimestamp update . To my understanding, POS will start the recipient instance using the majority-committed state document. So, which means, there is no possibility that the recipientSyncData command response was returned before the state doc w/ RejectReadsBeforeTimestamp update was majority committed. Spoke With wenbin.zhu and we realized the scenario that caused BF failure is that (also aligns with BF-22917 logs) The problem is that waitUntilMigrationReachesReturnAfterReachingTimestamp() which updates state document, is not concurrent safe.
Having said that, this pattern is currently safer in all places of tenant migration code (both donor & recipient) except this place where we update the state doc for "RejectReadsBeforeTimestamp" because that's the only place we update the state doc outside the future chain. I believe, we initially wrote the tenant migration code with the assumption that the state doc gets modified only via the chain. The set "RejectReadsBeforeTimestamp" seems to be exceptional where the state doc gets updated in the command caller path. |
| Comment by Lingzhi Deng [ 25/Oct/21 ] |
|
FWIW, client writes use the LastOpFixer to handle noop writes. We may want to consider using that for internal writes in tenant migration. |
| Comment by Wenbin Zhu [ 25/Oct/21 ] |
|
cc lingzhi.deng |
| Comment by Wenbin Zhu [ 25/Oct/21 ] |
|
I did some hack to quickly repro this issue by setting the rejectReadsBeforeTimestamp directly to the returnAfterReachingTimestamp instead of setting to the result of selectRejectReadsBeforeTimestamp, which returns the maximum of finalRecipientWriteTimestamp and returnAfterReachingTimestamp. In reality returnAfterReachingTimestamp could be greater than finalRecipientWriteTimestamp, so the repro should be valid. |