[SERVER-51179] TenantMigrationDonorInstance should distinguish local and remote shutdown/stepdown errors Created: 28/Sep/20  Updated: 29/Oct/23  Resolved: 10/Dec/20

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.9.0

Type: Task Priority: Major - P3
Reporter: Esha Maharishi (Inactive) Assignee: Jason Zhang
Resolution: Fixed Votes: 0
Labels: pm-1791_milestone-D
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-50549 Transform connection-related error co... Closed
Related
is related to SERVER-51012 Integrate CancelationTokens with Prim... Closed
Backwards Compatibility: Fully Compatible
Sprint: Sharding 2020-11-16, Sharding 2020-11-30, Sharding 2020-12-14
Participants:

 Description   

Currently, TenantMigrationDonorInstance stops retrying on shutdown/stepdown errors regardless of if the error is local or remote.

Once there is a way to distinguish local and remote errors, TenantMigrationDonorInstance should be updated to only stop retrying if the error is local.

 

Edit: We decided not to use the machinery being built for SERVER-50549. Instead, we can use the fact that an Instance has a private _isRunning bool that gets set to true when the instance's run() is called.

Right now the bool never gets set to false, but we can make PrimaryOnlyService::onStepdown and PrimaryOnlyService::shutdown iterate all instances in that service and set their _isRunning bools to false, similarly to how they iterate their instances and call interrupt on them. (This does mean PrimaryOnlyService::Instance will need a member mutex to protect _isRunning.)

We can add a protected `const isRunning()` method to PrimaryOnlyService::Instance that returns _isRunning.

Then, an instance's retry loop can just keep retrying until `status.isOK() || !isRunning()` instead of checking if the status is a NotPrimary or Shutdown error.



 Comments   
Comment by Githook User [ 10/Dec/20 ]

Author:

{'name': 'Jason Zhang', 'email': 'jason.zhang@mongodb.com', 'username': 'jz1242'}

Message: SERVER-51179 TenantMigrationDonorInstance should distinguish local and remote shutdown/stepdown errors
Branch: master
https://github.com/mongodb/mongo/commit/a8db11a72b4f5e838785019e030da05a4502a2ba

Comment by Esha Maharishi (Inactive) [ 01/Oct/20 ]

Actually, given matthew.saltz is going actively working on integrating cancel tokens with POS and things like AsyncTry, we may as well just wait until he's done, so putting it back on the backlog.

Comment by Esha Maharishi (Inactive) [ 30/Sep/20 ]

Sounds good, let's go with Spencer's modification. This aligns it better with cancel tokens, which should eventually replace interrupt().

Comment by Spencer Brody (Inactive) [ 30/Sep/20 ]

Rather than relying on the _running bool from the Instance base class, and have to add a mutex into the base class, I think we should just make Instance subclasses handle this themselves when their interrupt() method is called (or down the road, when their CancelToken is signaled)

Generated at Thu Feb 08 05:24:44 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.