[SERVER-46381] Test concurrent reconfig and stepdown Created: 24/Feb/20  Updated: 29/Oct/23  Resolved: 17/Mar/20

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.7.0

Type: Task Priority: Major - P3
Reporter: A. Jesse Jiryu Davis Assignee: A. Jesse Jiryu Davis
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-47758 HBStepdownAndReconfigTest unit tests ... Closed
Backwards Compatibility: Fully Compatible
Sprint: Repl 2020-03-09, Repl 2020-03-23
Participants:
Linked BF Score: 0

 Description   

The safe reconfig protocol ensures only one reconfig can happen at a time (on a given node), and reconfig cancels elections. However, a node that has learned about a new config via heartbeat and is beginning to process it could suddenly start to step down. Let's investigate whether there are concurrency issues in these scenarios:

  1. a node has started to step down and suddenly learns about a new config via heartbeat
  2. a node has learned about a new config via heartbeat, but suddenly starts to step down
  3. a node has started to step down and suddenly a replSetReconfig command arrives from the user
  4. a replSetReconfig command arrives from the user and suddenly the node starts to step down


 Comments   
Comment by Githook User [ 18/May/20 ]

Author:

{'name': 'Siyuan Zhou', 'email': 'visualzhou@gmail.com', 'username': 'visualzhou'}

Message: SERVER-48257 Backport handleHeartbeatResponse_forTest to 4.4

The test helper is introduced in SERVER-46381 and updated by SERVER-48115.
Branch: v4.4
https://github.com/mongodb/mongo/commit/f5f77b5f38efa5c7254593d89e43ac23cd4fac0f

Comment by Siyuan Zhou [ 25/Mar/20 ]

We decided not to backport this test to 4.4, since it didn't uncover new bugs and the reconfig passthrough test suites in SERVER-45094 will give the same test coverage.

Comment by Githook User [ 17/Mar/20 ]

Author:

{'name': 'A. Jesse Jiryu Davis', 'email': 'jesse@mongodb.com', 'username': 'ajdavis'}

Message: SERVER-46381 Test concurrent stepdown and reconfig
Branch: master
https://github.com/mongodb/mongo/commit/8c1515929f34d41dbefbb9476e1dd893d523ad01

Comment by Githook User [ 15/Mar/20 ]

Author:

{'name': 'A. Jesse Jiryu Davis', 'username': 'ajdavis', 'email': 'jesse@mongodb.com'}

Message: Revert "SERVER-46381 Test concurrent stepdown and reconfig"

This reverts commit 5b7782502396354468815ff56150be789599919a.
Branch: master
https://github.com/mongodb/mongo/commit/ed023e8734948ebf89e87d7aa7ff4f1cbf4332fa

Comment by A. Jesse Jiryu Davis [ 14/Mar/20 ]

Once SERVER-45096 is done, I can see if it's correct to remove a try/catch around this test's call to processReplSetReconfig.

Comment by Githook User [ 14/Mar/20 ]

Author:

{'name': 'A. Jesse Jiryu Davis', 'username': 'ajdavis', 'email': 'jesse@mongodb.com'}

Message: SERVER-46381 Test concurrent stepdown and reconfig
Branch: master
https://github.com/mongodb/mongo/commit/5b7782502396354468815ff56150be789599919a

Comment by Githook User [ 11/Mar/20 ]

Author:

{'username': 'ldennis', 'name': 'Lingzhi Deng', 'email': 'lingzhi.deng@mongodb.com'}

Message: Revert "SERVER-46381 Test concurrent reconfig and stepdown"

This reverts commit bdf61762f8fd755b784b55af8457f8fcdd7fe068.
Branch: master
https://github.com/mongodb/mongo/commit/521ab081fdbd6a0ba5169f762a78e2158edba86c

Comment by Githook User [ 11/Mar/20 ]

Author:

{'username': 'ajdavis', 'name': 'A. Jesse Jiryu Davis', 'email': 'jesse@mongodb.com'}

Message: SERVER-46381 Test concurrent reconfig and stepdown
Branch: master
https://github.com/mongodb/mongo/commit/bdf61762f8fd755b784b55af8457f8fcdd7fe068

Comment by A. Jesse Jiryu Davis [ 25/Feb/20 ]

Scenario 4.

Thread A is in ReplicationCoordinatorImpl::processReplSetReconfig, holding the replication coordinator mutex. It checks if self is primary (which it is), drops the mutex, stores the new config document, waits for the new config to propagate to a majority, and calls _performPostMemberStateUpdateAction, which I think must be kActionNone.

Thread B is in ReplicationCoordinatorImpl::_handleHeartbeatResponse, holding the replication coordinator mutex. It sees a higher term in the response and calls TopologyCoordinator::prepareForUnconditionalStepDown(), schedules a call to _stepDownFinish, then drops the mutex.

If the stepdown starts while the replSetReconfig command waits for a majority to replicate the new config, the command will return with an error. The reconfig may or may not eventually be committed. Some node in the set will run for election and win, and it may or may not have the new config. This non-determinism is by design.

We can test the case where waiting for config commitment is interrupted by stepdown if we disconnect a quorum using mongobridge. That test can assert that some member is elected after the quorum is reconnected. It's acceptable for the primary to have either the old or the new config.
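
As a rough illustration, here is a minimal standalone C++ sketch (not MongoDB code; ReplStateModel and its members are hypothetical names) of the interruption semantics: the reconfig command's wait for config commitment can be woken by stepdown and fail, while the config may still commit afterward.

// Standalone model, not MongoDB code: a "reconfig" thread waits for majority
// acknowledgement of the new config, and a concurrent "stepdown" interrupts
// the wait. ReplStateModel and its members are hypothetical names.
#include <chrono>
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>

struct ReplStateModel {
    std::mutex mtx;
    std::condition_variable cv;
    bool configCommitted = false;  // set when a "majority" has the new config
    bool steppedDown = false;      // set when stepdown begins

    // Returns true if the config committed, false if stepdown interrupted the wait.
    bool awaitConfigCommitment() {
        std::unique_lock<std::mutex> lk(mtx);
        cv.wait(lk, [&] { return configCommitted || steppedDown; });
        return configCommitted;
    }

    void stepDown() {
        { std::lock_guard<std::mutex> lk(mtx); steppedDown = true; }
        cv.notify_all();
    }

    void markConfigCommitted() {
        { std::lock_guard<std::mutex> lk(mtx); configCommitted = true; }
        cv.notify_all();
    }
};

int main() {
    ReplStateModel repl;

    // Thread A: the replSetReconfig command, waiting for a majority while the
    // quorum is unreachable (as with mongobridge disconnecting it).
    std::thread reconfig([&] {
        bool ok = repl.awaitConfigCommitment();
        std::cout << (ok ? "reconfig committed\n"
                         : "reconfig interrupted by stepdown\n");
    });

    // Thread B: a heartbeat sees a higher term and triggers stepdown before
    // the new config can reach a majority.
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    repl.stepDown();
    reconfig.join();

    // The config may still be committed later; the command has already failed.
    repl.markConfigCommitted();
    return 0;
}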

Comment by A. Jesse Jiryu Davis [ 24/Feb/20 ]

Scenario 3.

Thread A is in ReplicationCoordinatorImpl::_handleHeartbeatResponse, holding the replication coordinator mutex. It sees a higher term in the response and calls TopologyCoordinator::prepareForUnconditionalStepDown(), schedules a call to _stepDownFinish, then drops the mutex.

Thread B is in ReplicationCoordinatorImpl::processReplSetReconfig, holding the replication coordinator mutex. It checks if self is primary (which it is), drops the mutex, stores the new config document, and calls _performPostMemberStateUpdateAction (which cannot be stepdownSelf).

A worker thread enters _stepDownFinish, takes the RSTL and replcoord mutex, and completes stepdown by calling _performPostMemberStateUpdateAction.

We can test this sequence by enabling the blockHeartbeatStepdown failpoint until Thread B has finished.
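
A minimal standalone C++ sketch of that test shape, assuming a failpoint behaves like a gate that parks the scheduled work until it is released (the Gate class below is a hypothetical stand-in, not MongoDB's failpoint machinery):

// Standalone model, not MongoDB code: the Gate class is a hypothetical
// stand-in for the blockHeartbeatStepdown failpoint, holding the scheduled
// stepdown work until the reconfig path has finished.
#include <condition_variable>
#include <iostream>
#include <mutex>
#include <thread>

class Gate {
public:
    void waitUntilOpen() {
        std::unique_lock<std::mutex> lk(_mtx);
        _cv.wait(lk, [&] { return _open; });
    }
    void open() {
        { std::lock_guard<std::mutex> lk(_mtx); _open = true; }
        _cv.notify_all();
    }
private:
    std::mutex _mtx;
    std::condition_variable _cv;
    bool _open = false;
};

int main() {
    Gate blockStepdown;    // plays the role of the failpoint
    std::mutex replMutex;  // plays the role of the replication coordinator mutex

    // Worker thread: the scheduled stepdown work, parked behind the gate.
    std::thread stepDownFinish([&] {
        blockStepdown.waitUntilOpen();
        std::lock_guard<std::mutex> lk(replMutex);
        std::cout << "stepdown finished\n";
    });

    // Thread B: the reconfig path runs to completion while the gate is closed.
    {
        std::lock_guard<std::mutex> lk(replMutex);
        std::cout << "new config stored, post-update action run\n";
    }

    blockStepdown.open();  // release the "failpoint"; stepdown now completes
    stepDownFinish.join();
    return 0;
}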

Comment by A. Jesse Jiryu Davis [ 24/Feb/20 ]

Scenario 2 swaps the identities of Threads A and B, but it appears to have the same possible behaviors as Scenario 1.

Comment by A. Jesse Jiryu Davis [ 24/Feb/20 ]

Scenario 1.

Thread A is in ReplicationCoordinatorImpl::_handleHeartbeatResponse, holding the replication coordinator mutex. It sees a higher term in the response and calls TopologyCoordinator::prepareForUnconditionalStepDown(), schedules a call to _stepDownFinish, then drops the mutex.

Thread B is in ReplicationCoordinatorImpl::_handleHeartbeatResponse, holding the replication coordinator mutex. It sees a newer config and schedules a call to _heartbeatReconfigStore, then drops the mutex.

Then the following two steps happen in some order:

1. A worker thread enters _stepDownFinish, takes the RSTL and replcoord mutex, and completes stepdown by calling _performPostMemberStateUpdateAction.

2. Another worker thread enters _heartbeatReconfigStore, which stores the local config, takes the mutex, and calls _heartbeatReconfigFinish. This cannot cause a stepdown (safe reconfig requires that the primary remain electable). The thread drops the mutex and calls _performPostMemberStateUpdateAction.

We can test ordering (1, 2) by introducing a failpoint blockHeartbeatReconfigStore at the top of _heartbeatReconfigStore. We can test ordering (2, 1) with the existing failpoint blockHeartbeatStepdown.
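
As a standalone C++ sketch (not MongoDB code; runInOrder and the two lambdas are hypothetical stand-ins for the scheduled _stepDownFinish and _heartbeatReconfigStore callbacks), forcing each ordering amounts to releasing one callback only after the other has completed:

// Standalone model, not MongoDB code: runInOrder and the two lambdas are
// hypothetical stand-ins for releasing the scheduled stepdown and heartbeat
// reconfig callbacks in a chosen order.
#include <functional>
#include <future>
#include <iostream>
#include <mutex>
#include <thread>

// Run two callbacks on worker threads, releasing the second only after the
// first has completed.
void runInOrder(std::function<void()> first, std::function<void()> second) {
    std::promise<void> firstDone;
    std::future<void> firstFinished = firstDone.get_future();
    std::thread t1([&] { first(); firstDone.set_value(); });
    std::thread t2([&] { firstFinished.wait(); second(); });
    t1.join();
    t2.join();
}

int main() {
    std::mutex replMutex;  // plays the role of the replication coordinator mutex

    auto stepDownFinish = [&] {
        std::lock_guard<std::mutex> lk(replMutex);
        std::cout << "stepdown finished\n";
    };
    auto heartbeatReconfigStore = [&] {
        std::lock_guard<std::mutex> lk(replMutex);
        std::cout << "heartbeat reconfig stored and finished\n";
    };

    std::cout << "ordering (1, 2):\n";
    runInOrder(stepDownFinish, heartbeatReconfigStore);

    std::cout << "ordering (2, 1):\n";
    runInOrder(heartbeatReconfigStore, stepDownFinish);
    return 0;
}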
