[SERVER-34608] Drivers may still see ismaster=true from primary in midst of stepping down immediately after operations are killed with InterruptedDueToReplStateChange Created: 23/Apr/18  Updated: 27/Oct/23  Resolved: 07/Feb/20

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Max Hirschhorn Assignee: Backlog - Replication Team
Resolution: Works as Designed Votes: 0
Labels: former-quick-wins
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-57167 Prevent throwing on session creation ... Closed
related to SERVER-34666 Reduce the number of retries needed f... Backlog
is related to SERVER-74409 StreamableReplicaSetMonitor::getHosts... Closed
Assigned Teams:
Replication
Operating System: ALL
Sprint: Repl 2018-05-07
Participants:
Linked BF Score: 6

 Description   

ReplicationCoordinatorImpl::stepDown() calls ReplicationCoordinatorExternalStateImpl::killAllUserOperations() prior to calling TopologyCoordinator::prepareForStepDownAttempt().

auto globalLock = stdx::make_unique<Lock::GlobalLock>(
    opCtx, MODE_X, stepDownUntil, Lock::GlobalLock::EnqueueOnly());
 
// We've requested the global exclusive lock which will stop new operations from coming in,
// but existing operations could take a long time to finish, so kill all user operations
// to help us get the global lock faster.
_externalState->killAllUserOperations(opCtx);
 
...
 
status = _topCoord->prepareForStepDownAttempt();

The implications of the current behavior w.r.t. retryable writes are that server selection may choose to retry the write operation against the primary in the midst of stepping down. Since the global X lock is held for the duration of the primary's stepdown attempt, the retry attempt will be blocked on the server until ReplicationCoordinatorExternalStateImpl::closeConnections() (and thus ServiceEntryPoint::endAllSessions()) has been called. A driver would then see a network error but wouldn't retry the operation for yet another time because it has exhausted its one retry attempt quota.

For comparison: The reconnect() function in jstests/replsets/rslib.js works around this issue by retrying until it succeeds in running the "collStats" command because unlike the "isMaster" command, the "collStats" command requires acquiring the global lock and therefore must wait until the stepdown has finished.



 Comments   
Comment by Matthew Russotto [ 18/Mar/19 ]

We believe the fix in SERVER-38456 which will prevent killing retryable writes when trying to kill transactions will result in the workaround working again.

Comment by Matthew Russotto [ 15/Mar/19 ]

We rejected this, but I believe the RSTL and stepdown changes mean we should reconsider. The workaround of retrying twice no longer works because it is now possible for the retries to be killed (because we kill in a loop) any number of times.

I also don't think we need to take a lock. It is sufficient to immediately stop reporting isMaster before killing operations.

Comment by Spencer Brody (Inactive) [ 07/May/18 ]

Discussed this with drivers and we agreed that making isMaster take locks in any situation is probably a bad idea.  Instead drivers are going to work around this on their end - that work is tracked by SPEC-1072.

Comment by Spencer Brody (Inactive) [ 23/Apr/18 ]

This is by design and unavoidable. If drivers want to be able to wait until the stepdown process fully succeeds or fails, we could add a flag to isMaster which when set causes it to briefly take a Global IS lock and then release it, just to serialize the isMaster with ongoing stepdowns. Drivers could then include this flag only when reconnecting to a node that just returned InterruptedDueToReplStateChange. I'm not sure if that's worth doing or not though.

Generated at Thu Feb 08 04:37:15 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.