[SERVER-62293] Race between recipientForgetMigration cmd and TenantMigrationRecipientService future chain restart on errors machinery. Created: 28/Dec/21  Updated: 29/Oct/23  Resolved: 19/Jan/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 5.3.0, 5.2.2

Type: Bug Priority: Major - P3
Reporter: Suganthi Mani Assignee: Suganthi Mani
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v5.2, v5.1
Sprint: Server Serverless 2022-01-24
Participants:
Linked BF Score: 126

 Description   

Before waiting on the task completion promise, the recipientForgetMigration cmd tries to interrupt the TenantMigrationRecipient instance. If the instance is already in the interrupted state, the cmd skips the interrupt and starts waiting for the task completion promise to be fulfilled. However, if the original interrupt status is a retryable error code, we reset the task state to "running", clear the interrupt status, and restart the TenantMigrationRecipientService future chain. Because of that restart, the task completion promise for the recipient instance is never fulfilled (unless the node steps down, shuts down, or receives another recipientForgetMigration cmd), so the recipientForgetMigration cmd hangs.
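The interleaving described above can be sketched with a toy state machine. This is only an illustrative model in Python, not the server's C++ code: the class `ToyRecipientInstance`, its methods, and the error-code strings are all hypothetical names made up for this sketch.

```python
from enum import Enum


class TaskState(Enum):
    RUNNING = "running"
    INTERRUPTED = "interrupted"
    DONE = "done"


class ToyRecipientInstance:
    """Hypothetical stand-in for a TenantMigrationRecipient instance."""

    def __init__(self):
        self.state = TaskState.RUNNING
        self.interrupt_status = None       # (code, retryable) once interrupted
        self.completion_fulfilled = False  # models the task completion promise

    def interrupt(self, code, retryable):
        """Record an interrupt; return False if it was skipped."""
        if self.state is not TaskState.RUNNING:
            return False  # already interrupted (or done): the new interrupt is dropped
        self.state = TaskState.INTERRUPTED
        self.interrupt_status = (code, retryable)
        return True

    def handle_chain_error(self):
        """Model of the restart-on-error logic, as described in this ticket."""
        if self.interrupt_status is None:
            raise RuntimeError("no interrupt recorded")
        _code, retryable = self.interrupt_status
        if retryable:
            # Reset to "running", clear the interrupt status, restart the chain.
            self.state = TaskState.RUNNING
            self.interrupt_status = None
            return "restarted"
        self.state = TaskState.DONE
        self.completion_fulfilled = True
        return "completed"


def simulate_race():
    inst = ToyRecipientInstance()
    # 1. A retryable error interrupts the instance first (placeholder code name).
    inst.interrupt("HypotheticalRetryableError", retryable=True)
    # 2. recipientForgetMigration arrives; its interrupt is skipped.
    forget_interrupt_applied = inst.interrupt("HypotheticalForget", retryable=False)
    # 3. The chain sees only the retryable status and restarts.
    outcome = inst.handle_chain_error()
    return forget_interrupt_applied, outcome, inst.completion_fulfilled
```

In this model, `simulate_race()` ends with the chain restarted and the completion flag still unset, which corresponds to the cmd waiting on a promise that nothing will fulfill.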



 Comments   
Comment by Githook User [ 16/Feb/22 ]

Author: Suganthi Mani <suganthi.mani@mongodb.com> (smani87)

Message: SERVER-62293 Fix race between recipientForgetMigration cmd and TenantMigrationRecipientService future chain restart on errors machinery.

(cherry picked from commit e3cf73fe6b96476518f7ab7c1dfb36f10597589b)
Branch: v5.2
https://github.com/mongodb/mongo/commit/6c93be45cd4869ab0a0f4d8132dee4fd8571dcb8

Comment by Githook User [ 19/Jan/22 ]

Author: Suganthi Mani <suganthi.mani@mongodb.com> (smani87)

Message: SERVER-62293 Fix race between recipientForgetMigration cmd and TenantMigrationRecipientService future chain restart on errors machinery.
Branch: master
https://github.com/mongodb/mongo/commit/e3cf73fe6b96476518f7ab7c1dfb36f10597589b

Comment by A. Jesse Jiryu Davis [ 11/Jan/22 ]

Then I'll assign it to you, suganthi.mani, please do it on the next BF Friday.

Comment by Suganthi Mani [ 11/Jan/22 ]

jesse pretty small fix, 1-2 lines of server code changes (+ add a test ???)

Comment by A. Jesse Jiryu Davis [ 11/Jan/22 ]

suganthi.mani can you estimate the time to implement your proposed fix, please?

Comment by Esha Maharishi (Inactive) [ 10/Jan/22 ]

suganthi.mani to triage with jesse and christopher.caplinger against other Shard Merge work tomorrow.

Comment by Suganthi Mani [ 29/Dec/21 ]

Proposed Fix:
My proposal is that we shouldn't retry the TenantMigrationRecipientService future chain if the _receivedRecipientForgetMigrationPromise is fulfilled (ready), even if the original interrupt status is a retryable error.
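As a rough illustration of the proposed guard, the retry decision can be written as a predicate over two inputs. This is a Python sketch under assumptions, not the committed C++ change: the function name `should_restart_chain` and its parameters are hypothetical, with `forget_promise_ready` standing in for whether _receivedRecipientForgetMigrationPromise is fulfilled.

```python
def should_restart_chain(interrupt_is_retryable: bool, forget_promise_ready: bool) -> bool:
    """Hypothetical retry predicate for the future chain.

    Restart only when the interrupt status is retryable AND no
    recipientForgetMigration has been received yet.
    """
    return interrupt_is_retryable and not forget_promise_ready


# Enumerate all four input combinations for inspection.
DECISIONS = {
    (retryable, forgotten): should_restart_chain(retryable, forgotten)
    for retryable in (False, True)
    for forgotten in (False, True)
}
```

The only combination whose result changes relative to the behavior in the description is a retryable interrupt with the forget promise already ready: there the chain would no longer restart, so the task could run to completion and fulfill the completion promise.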

Generated at Thu Feb 08 05:54:43 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.