[SERVER-73705] Design & Implement solution for potential race between online movePrimary commands
Created: 07/Feb/23  Updated: 24/May/23  Resolved: 24/May/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Abdul Qadeer Assignee: Abdul Qadeer
Resolution: Won't Do Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-72492 Implement commands for movePrimary re... Closed
Participants:

 Description   

A race is possible between movePrimaryRecipientAbortMigration and movePrimaryRecipientSyncData in which the abort command is received first. When movePrimaryRecipientSyncData finally arrives, the recipient sees the "aborted" state and does nothing. However, if movePrimaryRecipientSyncData arrives after the recipient state document has been deleted, the recipient may restart the migration, which is not intended.
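The two possible interleavings can be illustrated with a toy model. This is a minimal Python sketch, assuming a recipient that decides solely from its state document. `NaiveRecipient`, `State`, and the `on_*` handlers are hypothetical names, not server code.

```python
from enum import Enum


class State(Enum):
    # Hypothetical recipient states, for illustration only.
    STARTED = "started"
    ABORTED = "aborted"


class NaiveRecipient:
    """Hypothetical recipient that decides only from its state document."""

    def __init__(self):
        # migration_id -> State; stands in for the recipient state documents.
        self.state_docs = {}
        self.migrations_started = 0

    def on_abort(self, migration_id):
        # movePrimaryRecipientAbortMigration arrives: mark the migration aborted.
        self.state_docs[migration_id] = State.ABORTED

    def on_cleanup(self, migration_id):
        # The recipient state document is later deleted.
        self.state_docs.pop(migration_id, None)

    def on_sync_data(self, migration_id):
        # movePrimaryRecipientSyncData arrives: only the state doc is consulted.
        state = self.state_docs.get(migration_id)
        if state is State.ABORTED:
            return "ignored"  # Abort was seen first and the doc still exists.
        if state is None:
            # No document: indistinguishable from a brand-new migration.
            self.state_docs[migration_id] = State.STARTED
            self.migrations_started += 1
            return "started"
        return "already running"


# Delayed syncData arrives before the state doc is deleted: harmless.
early = NaiveRecipient()
early.on_abort("m1")
print(early.on_sync_data("m1"))  # ignored

# Delayed syncData arrives after the state doc is deleted: unintended restart.
late = NaiveRecipient()
late.on_abort("m1")
late.on_cleanup("m1")
print(late.on_sync_data("m1"))  # started
```

Once the document is gone, nothing distinguishes a stale movePrimaryRecipientSyncData from a request for a genuinely new migration.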

A possible solution, suggested by Max, is to use retryable writes: the delete of the recipient state document is sent as a "retryable write", so it remains recorded as part of the logical session for 30 minutes. When movePrimaryRecipientSyncData is received, the recipient checks whether that prior delete was executed and, if so, does not start a new migration.
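The proposed check can be sketched with another hypothetical Python model, not the server's implementation. It assumes the delete is recorded under a session keyed by migration id with a fixed transaction number; the issue does not say how the session would be chosen. `SessionHistory`, `Recipient`, and `DELETE_TXN_NUMBER` are illustrative names, and the 30-minute retention is modeled as a simple time window.

```python
import time
from enum import Enum

# The issue states the retryable delete stays in the logical session for 30 minutes.
SESSION_TTL_SECONDS = 30 * 60


class State(Enum):
    STARTED = "started"
    ABORTED = "aborted"


class SessionHistory:
    """Hypothetical stand-in for a session's record of executed retryable writes."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._executed = {}  # (session_id, txn_number) -> time the write executed

    def record(self, session_id, txn_number):
        self._executed[(session_id, txn_number)] = self._clock()

    def was_executed(self, session_id, txn_number):
        # A record only counts while it is still within the retention window.
        executed_at = self._executed.get((session_id, txn_number))
        return executed_at is not None and self._clock() - executed_at < SESSION_TTL_SECONDS


class Recipient:
    # Assumption: the delete for a migration runs under a session keyed by the
    # migration id with a fixed txn number (hypothetical scheme).
    DELETE_TXN_NUMBER = 1

    def __init__(self, history):
        self.history = history
        self.state_docs = {}  # migration_id -> State

    def on_abort(self, migration_id):
        self.state_docs[migration_id] = State.ABORTED

    def on_cleanup(self, migration_id):
        # Delete the state doc as a retryable write; the delete and its
        # session record are modeled as a single atomic step.
        self.state_docs.pop(migration_id, None)
        self.history.record(migration_id, self.DELETE_TXN_NUMBER)

    def on_sync_data(self, migration_id):
        state = self.state_docs.get(migration_id)
        if state is State.ABORTED:
            return "ignored"
        if state is None:
            # Check for a prior delete before treating this as a new migration.
            if self.history.was_executed(migration_id, self.DELETE_TXN_NUMBER):
                return "rejected"
            self.state_docs[migration_id] = State.STARTED
            return "started"
        return "already running"


# Fake clock so the retention window can be advanced deterministically.
now = [0.0]
recipient = Recipient(SessionHistory(clock=lambda: now[0]))
recipient.on_abort("m1")
recipient.on_cleanup("m1")
first = recipient.on_sync_data("m1")   # prior delete still on record
now[0] += SESSION_TTL_SECONDS
second = recipient.on_sync_data("m1")  # session record has expired
```

In this model the stale movePrimaryRecipientSyncData is rejected while the delete is still recorded. Because the record is only kept for the session lifetime, a message delayed beyond that window would still start a new migration. The sketch does not settle whether that bound is acceptable.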


Generated at Thu Feb 08 06:25:24 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.