[SERVER-51179] TenantMigrationDonorInstance should distinguish local and remote shutdown/stepdown errors Created: 28/Sep/20 Updated: 29/Oct/23 Resolved: 10/Dec/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 4.9.0 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Esha Maharishi (Inactive) | Assignee: | Jason Zhang |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | pm-1791_milestone-D | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||
| Sprint: | Sharding 2020-11-16, Sharding 2020-11-30, Sharding 2020-12-14 | ||||||||||||||||
| Participants: | |||||||||||||||||
| Description |
|
Currently, TenantMigrationDonorInstance stops retrying on shutdown/stepdown errors regardless of if the error is local or remote. Once there is a way to distinguish local and remote errors, TenantMigrationDonorInstance should be updated to only stop retrying if the error is local.
Edit: We decided not to use the machinery being built for Right now the bool never gets set to false, but we can make PrimaryOnlyService::onStepdown and PrimaryOnlyService::shutdown iterate all instances in that service and set their _isRunning bools to false, similarly to how they iterate their instances and call interrupt on them. (This does mean PrimaryOnlyService::Instance will need a member mutex to protect _isRunning.) We can add a protected `const isRunning()` method to PrimaryOnlyService::Instance that returns _isRunning. Then, an instance's retry loop can just keep retrying until `status.isOK() || !isRunning()` instead of checking if the status is a NotPrimary or Shutdown error. |
| Comments |
| Comment by Githook User [ 10/Dec/20 ] |
|
Author: {'name': 'Jason Zhang', 'email': 'jason.zhang@mongodb.com', 'username': 'jz1242'}Message: |
| Comment by Esha Maharishi (Inactive) [ 01/Oct/20 ] |
|
Actually, given matthew.saltz is going actively working on integrating cancel tokens with POS and things like AsyncTry, we may as well just wait until he's done, so putting it back on the backlog. |
| Comment by Esha Maharishi (Inactive) [ 30/Sep/20 ] |
|
Sounds good, let's go with Spencer's modification. This aligns it better with cancel tokens, which should eventually replace interrupt(). |
| Comment by Spencer Brody (Inactive) [ 30/Sep/20 ] |
|
Rather than relying on the _running bool from the Instance base class, and have to add a mutex into the base class, I think we should just make Instance subclasses handle this themselves when their interrupt() method is called (or down the road, when their CancelToken is signaled) |