[SERVER-46381] Test concurrent reconfig and stepdown Created: 24/Feb/20 Updated: 29/Oct/23 Resolved: 17/Mar/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 4.7.0 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | A. Jesse Jiryu Davis | Assignee: | A. Jesse Jiryu Davis |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Sprint: | Repl 2020-03-09, Repl 2020-03-23 |
| Participants: | |
| Linked BF Score: | 0 |
| Description |
|
The safe reconfig protocol ensures only one reconfig can happen at a time (on a given node), and reconfig cancels elections. However, a node that has learned about a new config via heartbeat and is beginning to process it could suddenly start to step down. Let's investigate whether there are concurrency issues in these scenarios.
|
| Comments |
| Comment by Githook User [ 18/May/20 ] |
|
Author: Siyuan Zhou (visualzhou) <visualzhou@gmail.com>. Message: The test helper is introduced in |
| Comment by Siyuan Zhou [ 25/Mar/20 ] |
|
We decided not to backport this test to 4.4, since it didn't uncover new bugs and the reconfig passthrough test suites in |
| Comment by Githook User [ 17/Mar/20 ] |
|
Author: A. Jesse Jiryu Davis (ajdavis) <jesse@mongodb.com>. Message: |
| Comment by Githook User [ 15/Mar/20 ] |
|
Author: A. Jesse Jiryu Davis (ajdavis) <jesse@mongodb.com>. Message: Revert " This reverts commit 5b7782502396354468815ff56150be789599919a. |
| Comment by A. Jesse Jiryu Davis [ 14/Mar/20 ] |
|
Once |
| Comment by Githook User [ 14/Mar/20 ] |
|
Author: A. Jesse Jiryu Davis (ajdavis) <jesse@mongodb.com>. Message: |
| Comment by Githook User [ 11/Mar/20 ] |
|
Author: Lingzhi Deng (ldennis) <lingzhi.deng@mongodb.com>. Message: Revert " This reverts commit bdf61762f8fd755b784b55af8457f8fcdd7fe068. |
| Comment by Githook User [ 11/Mar/20 ] |
|
Author: A. Jesse Jiryu Davis (ajdavis) <jesse@mongodb.com>. Message: |
| Comment by A. Jesse Jiryu Davis [ 25/Feb/20 ] |
|
Scenario 4. Thread A is in ReplicationCoordinatorImpl::processReplSetReconfig, holding the replication coordinator mutex. It checks if self is primary (which it is), drops the mutex, stores the new config document, waits for the new config to propagate to a majority, and calls _performPostMemberStateUpdateAction, which I think must be kActionNone.

Thread B is in ReplicationCoordinatorImpl::_handleHeartbeatResponse, holding the replication coordinator mutex. It sees a higher term in the response, calls TopologyCoordinator::prepareForUnconditionalStepDown(), schedules a call to _stepDownFinish, then drops the mutex.

If the stepdown starts while the replSetReconfig command is waiting for a majority to replicate the new config, the command returns an error. The reconfig may or may not eventually be committed. Some node in the set will run for election and win, and it may or may not have the new config. This non-determinism is by design.

We can test the case where the wait for config commitment is interrupted by stepdown by disconnecting a quorum with mongobridge. That test can assert that some member is elected after the quorum is reconnected; it is acceptable for that primary to have either the old or the new config. |
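A rough jstest-style sketch of that mongobridge approach, assuming a ReplSetTest started with useBridge; running the reconfig from a parallel shell and using replSetStepDown with force to interrupt it are illustrative assumptions, and the exact point at which the command fails would need to be pinned down in the real test:

```javascript
// Sketch only: a reconfig that cannot commit is interrupted by stepdown.
const rst = new ReplSetTest({nodes: 3, useBridge: true});
rst.startSet();
rst.initiate();

const primary = rst.getPrimary();
const [sec0, sec1] = rst.getSecondaries();

// Cut the primary off from a quorum so the new config cannot commit.
primary.disconnect(sec0);
primary.disconnect(sec1);

// Issue the reconfig from a parallel shell; the expectation is that it fails
// once the primary steps down while the command is still in flight.
const reconfigShell = startParallelShell(function() {
    const config = db.getSiblingDB("local").system.replset.findOne();
    config.version++;
    const res = db.adminCommand({replSetReconfig: config});
    jsTestLog("reconfig result: " + tojson(res));
    assert.commandFailed(res);
}, primary.port);

// Step the primary down while the reconfig command is in flight. 'force' is
// needed because no secondary is reachable to catch up.
assert.commandWorked(
    primary.adminCommand({replSetStepDown: 10 * 60, force: true}));
reconfigShell();

// Reconnect the quorum; some member must eventually win an election, and it
// is acceptable for it to have either the old or the new config.
primary.reconnect(sec0);
primary.reconnect(sec1);
rst.awaitNodesAgreeOnPrimary();
rst.stopSet();
```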
| Comment by A. Jesse Jiryu Davis [ 24/Feb/20 ] |
|
Scenario 3. Thread A is in ReplicationCoordinatorImpl::_handleHeartbeatResponse, holding the replication coordinator mutex. It sees a higher term in the response, calls TopologyCoordinator::prepareForUnconditionalStepDown(), schedules a call to _stepDownFinish, then drops the mutex.

Thread B is in ReplicationCoordinatorImpl::processReplSetReconfig, holding the replication coordinator mutex. It checks if self is primary (which it is), drops the mutex, stores the new config document, and calls _performPostMemberStateUpdateAction (which cannot be stepdownSelf).

A worker thread then enters _stepDownFinish, takes the RSTL and the replication coordinator mutex, and completes stepdown by calling _performPostMemberStateUpdateAction.

We can test this sequence by enabling the blockHeartbeatStepdown failpoint until Thread B has finished. |
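A rough jstest-style sketch of that sequence, assuming the fail_point_util.js helpers; using replSetStepUp on a secondary to deliver the higher term, and maxTimeMS to bound the reconfig, are illustrative assumptions:

```javascript
// Sketch only: heartbeat-driven stepdown is held back until the reconfig runs.
load("jstests/libs/fail_point_util.js");

const rst = new ReplSetTest({nodes: 3});
rst.startSet();
rst.initiate();

const primary = rst.getPrimary();
const secondary = rst.getSecondaries()[0];

// Hold the primary's heartbeat-driven stepdown inside _stepDownFinish.
const stepdownFp = configureFailPoint(primary, "blockHeartbeatStepdown");

// Bump the term elsewhere so a heartbeat response tells the old primary to
// step down unconditionally (Thread A in the scenario).
rst.awaitReplication();
assert.commandWorked(secondary.adminCommand({replSetStepUp: 1}));
stepdownFp.wait();

// While the stepdown is blocked, run the reconfig on the old primary
// (Thread B in the scenario). Whether it succeeds or fails is part of what
// the test would observe, so just log the result; maxTimeMS keeps it from
// hanging if it ends up waiting on commitment.
const config = primary.getDB("local").system.replset.findOne();
config.version++;
jsTestLog("reconfig result: " + tojson(
    primary.adminCommand({replSetReconfig: config, maxTimeMS: 30000})));

// Let the blocked stepdown finish and check that the set converges.
stepdownFp.off();
rst.awaitNodesAgreeOnPrimary();
rst.stopSet();
```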
| Comment by A. Jesse Jiryu Davis [ 24/Feb/20 ] |
|
Scenario 2 swaps the identities of Threads A and B, but it appears to have the same possible behaviors as Scenario 1. |
| Comment by A. Jesse Jiryu Davis [ 24/Feb/20 ] |
|
Scenario 1. Thread A is in ReplicationCoordinatorImpl::_handleHeartbeatResponse, holding the replication coordinator mutex. It sees a higher term in the response, calls TopologyCoordinator::prepareForUnconditionalStepDown(), schedules a call to _stepDownFinish, then drops the mutex.

Thread B is in ReplicationCoordinatorImpl::_handleHeartbeatResponse, holding the replication coordinator mutex. It sees a newer config, schedules a call to _heartbeatReconfigStore, then drops the mutex.

Then the following two steps happen in some order:

1. A worker thread enters _stepDownFinish, takes the RSTL and the replication coordinator mutex, and completes stepdown by calling _performPostMemberStateUpdateAction.
2. Another worker thread enters _heartbeatReconfigStore, which stores the local config, takes the mutex, and calls _heartbeatReconfigFinish. This cannot cause a stepdown (safe reconfig requires that the primary remain electable). The thread drops the mutex and calls _performPostMemberStateUpdateAction.

We can test ordering (1, 2) by introducing a failpoint blockHeartbeatReconfigStore at the top of _heartbeatReconfigStore. We can test ordering (2, 1) with the existing failpoint blockHeartbeatStepdown. |
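As an illustration of the (2, 1) ordering, a rough jstest-style sketch follows, assuming the fail_point_util.js helpers and the existing blockHeartbeatStepdown failpoint. Using replSetStepUp plus a reconfig on the new primary to deliver the higher term and the newer config to the old node via heartbeat is an assumption, and whether the old node installs the heartbeat config before its blocked stepdown runs is exactly the behavior the test would probe. The (1, 2) ordering would instead use the proposed blockHeartbeatReconfigStore failpoint.

```javascript
// Sketch only: the heartbeat reconfig is observed before the blocked stepdown.
load("jstests/libs/fail_point_util.js");

const rst = new ReplSetTest({nodes: 3});
rst.startSet();
rst.initiate();

const oldPrimary = rst.getPrimary();
const newPrimary = rst.getSecondaries()[0];  // Stepped up below.

// Hold any heartbeat-driven stepdown on the old primary inside _stepDownFinish.
const stepdownFp = configureFailPoint(oldPrimary, "blockHeartbeatStepdown");

// Elect another node so the old primary sees a higher term in heartbeats.
rst.awaitReplication();
assert.commandWorked(newPrimary.adminCommand({replSetStepUp: 1}));
rst.waitForState(newPrimary, ReplSetTest.State.PRIMARY);
stepdownFp.wait();

// Reconfig through the new primary so the old primary learns the new config
// via heartbeat while its own stepdown is still pending (step 2).
const config = newPrimary.getDB("local").system.replset.findOne();
config.version++;
assert.commandWorked(newPrimary.adminCommand({replSetReconfig: config}));

// Per the scenario, the old primary should install the heartbeat config even
// though its stepdown has not run yet.
assert.soon(() => {
    const installed = oldPrimary.getDB("local").system.replset.findOne();
    return installed.version >= config.version;
});

// Now release the blocked stepdown (step 1) and wait for it to complete.
stepdownFp.off();
rst.waitForState(oldPrimary, ReplSetTest.State.SECONDARY);
rst.stopSet();
```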