[SERVER-62293] Race between recipientForgetMigration cmd and TenantMigrationRecipientService future chain restart on errors machinery. Created: 28/Dec/21 Updated: 29/Oct/23 Resolved: 19/Jan/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 5.3.0, 5.2.2 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Suganthi Mani | Assignee: | Suganthi Mani |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Operating System: | ALL | ||||||||
| Backport Requested: |
v5.2, v5.1
|
||||||||
| Sprint: | Server Serverless 2022-01-24 | ||||||||
| Participants: | |||||||||
| Linked BF Score: | 126 | ||||||||
| Description |
|
Before waiting on the task completion promise, the recipientForgetMigration cmd tries to interrupt the TenantMigrationRecipient instance. If the instance is already in the interrupted state, the cmd skips the interrupt and starts waiting for the task completion promise to be fulfilled. However, if the original interrupt status is a retryable error code, we reset the task state to "running", clear the interrupt status, and restart the TenantMigrationRecipientService future chain. As a result of the restart, the task completion promise for that recipient instance is never fulfilled (unless the node steps down, shuts down, or receives another recipientForgetMigration cmd), so the recipientForgetMigration cmd hangs. |
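The interleaving described above can be sketched with a small, single-threaded toy model. This is only an illustration and not the server's C++ code: the class `RecipientInstance`, the functions `forget_migration`, `on_chain_error`, `simulate_race` and `simulate_forget_first`, and the `RETRYABLE` / `NON_RETRYABLE` constants are all hypothetical names, and a `concurrent.futures.Future` stands in for the task completion promise.

```python
from concurrent.futures import Future

# Hypothetical stand-ins for a retryable and a non-retryable interrupt status.
RETRYABLE = "retryable"
NON_RETRYABLE = "non-retryable"


class RecipientInstance:
    """Toy model of a recipient instance (assumed structure, not the real class)."""

    def __init__(self):
        self.state = "running"
        self.interrupt_status = None
        self.completion = Future()  # stands in for the task completion promise

    def interrupt(self, status):
        # As in the description: an already-interrupted instance skips the interrupt.
        if self.state == "interrupted":
            return False
        self.state = "interrupted"
        self.interrupt_status = status
        return True

    def on_chain_error(self):
        """Error handling of the future chain; returns True if the chain restarts."""
        if self.interrupt_status == RETRYABLE:
            # Reset to "running", clear the interrupt status, restart the chain.
            self.state = "running"
            self.interrupt_status = None
            return True  # completion is left unfulfilled
        self.completion.set_result(self.interrupt_status)
        return False


def forget_migration(instance):
    """Returns the future the forget command would block on."""
    instance.interrupt(NON_RETRYABLE)
    return instance.completion


def simulate_race():
    inst = RecipientInstance()
    inst.interrupt(RETRYABLE)            # 1. retryable error interrupts the instance
    waiting_on = forget_migration(inst)  # 2. forget cmd arrives; its interrupt is skipped
    restarted = inst.on_chain_error()    # 3. retry machinery restarts the chain
    return restarted, inst.state, waiting_on.done()


def simulate_forget_first():
    inst = RecipientInstance()
    waiting_on = forget_migration(inst)  # forget cmd interrupts a running instance
    restarted = inst.on_chain_error()    # non-retryable status: chain completes
    return restarted, inst.state, waiting_on.done()
```

In this model `simulate_race()` ends with the chain restarted, the state back at "running", and the future the forget command waits on still unfulfilled, which corresponds to the hang. `simulate_forget_first()` shows the ordering in which the forget command's own interrupt is recorded and the completion future is fulfilled.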
| Comments |
| Comment by Githook User [ 16/Feb/22 ] |
|
Author: {'name': 'Suganthi Mani', 'email': 'suganthi.mani@mongodb.com', 'username': 'smani87'} Message: (cherry picked from commit e3cf73fe6b96476518f7ab7c1dfb36f10597589b) |
| Comment by Githook User [ 19/Jan/22 ] |
|
Author: {'name': 'Suganthi Mani', 'email': 'suganthi.mani@mongodb.com', 'username': 'smani87'} Message: |
| Comment by A. Jesse Jiryu Davis [ 11/Jan/22 ] |
|
Then I'll assign it to you, suganthi.mani, please do it on the next BF Friday. |
| Comment by Suganthi Mani [ 11/Jan/22 ] |
|
jesse pretty small fix, 1-2 lines of server code changes (+ add a test ???) |
| Comment by A. Jesse Jiryu Davis [ 11/Jan/22 ] |
|
suganthi.mani can you estimate the time to implement your proposed fix, please? |
| Comment by Esha Maharishi (Inactive) [ 10/Jan/22 ] |
|
suganthi.mani to triage with jesse and christopher.caplinger against other Shard Merge work tomorrow. |
| Comment by Suganthi Mani [ 29/Dec/21 ] |
|
Proposed Fix: |