[SERVER-53027] Tenant migration donor should stop retrying on recipient responses that indicate the recipient wants to abort Created: 23/Nov/20 Updated: 24/Feb/21 Resolved: 24/Feb/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Esha Maharishi (Inactive) | Assignee: | Andrew Shuvalov (Inactive) |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | pm-1791_milestone-D, pm-1791_non-cloud-blocking | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Sprint: | Sharding 2021-03-08 | ||||||||
| Participants: | |||||||||
| Description |
|
This ticket is to update the donor to stop retrying on recipient responses that indicate the recipient wants to abort the migration. Right now the donor will keep sending recipientSyncData until success (or getting a shutdown or stepdown error, but jason.zhang will replace those conditions with checking if the instance's cancelation token has been canceled |
| Comments |
| Comment by Andrew Shuvalov (Inactive) [ 24/Feb/21 ] |
|
As discussed above for now this change is not necessary. |
| Comment by Jack Mulrow [ 24/Feb/21 ] |
|
With Chou's changes, we already implemented the first option from Esha's first comment, right? Like she mentioned, we'd need to make sure the only errors the recipient returns that aren't in the retryable category are "fatal" to the migration, which seems like reasonable behavior to me. That should already be the case, and if not, the stepdown testing we'll add inĀ |
| Comment by Andrew Shuvalov (Inactive) [ 24/Feb/21 ] |
|
Hi, I have 2 questions: 2. The code above doesn't convert the error to TenantMigrationAborted as suggested in message #1. I checked the AsyncTry() implementation used in _sendCommandToRecipient() and it doesn't provide the hook to convert one type of error to another in the callback. So the trick with converting the error could complicated, I don't know how to do it right away but it might be not worth it. Will all of the above, do you thing there is still anything to do with this ticket? |
| Comment by Esha Maharishi (Inactive) [ 30/Nov/20 ] |
|
Oh yes, I forgot the recipient should do that because:
The only case we have right now where the donor can decide to abort is after the donor starts blocking, but what you said is probably a good idea in case that ever changes. |
| Comment by Lingzhi Deng [ 30/Nov/20 ] |
|
That makes sense. My current implementation for the recipientForgetMigration (still in CR though) would still write a state doc marking the migration garbage collectable and return success even if the recipient hasn't received a recipientSyncData command. So we probably dont need the NoSuchTenantMigration part. But having it is fine in case I change that in my CR. |
| Comment by Esha Maharishi (Inactive) [ 30/Nov/20 ] |
|
lingzhi.deng , option 2 sounds good to me, and yes, repl will need to guarantee the recipient always returns TenantMigrationAborted for recipientSyncData. I think for recipientForgetMigration, the donor should keep retrying until hearing back success or NoSuchTenantMigration. |
| Comment by Lingzhi Deng [ 30/Nov/20 ] |
|
I think doing the second option might be cleaner and more explicit. In that case, I guess we will probably need another ticket for Repl to audit the error code returned by recipientSyncData and recipientForgetMigration command? |
| Comment by Esha Maharishi (Inactive) [ 29/Nov/20 ] |
|
lingzhi.deng, heads up that I filed this new "cleanup" ticket. I see several options:
|