Details
- Bug
- Resolution: Won't Fix
- Major - P3
- Fully Compatible
- ALL
- Repl 2017-10-02, Repl 2017-10-23
Description
There is a race in this test between the thread started in stepDown_nonBlocking() and the call to ReplicationCoordinator::stepDown():
replication_coordinator_impl_test.cpp

    TEST_F(StepDownTest, OnlyOneStepDownCmdIsAllowedAtATime) {
        OpTime optime1(Timestamp(100, 1), 1);
        OpTime optime2(Timestamp(100, 2), 1);

        // No secondary is caught up
        auto repl = getReplCoord();
        repl->setMyLastAppliedOpTime(optime2);
        repl->setMyLastDurableOpTime(optime2);
        ASSERT_OK(repl->setLastAppliedOptime_forTest(1, 1, optime1));
        ASSERT_OK(repl->setLastAppliedOptime_forTest(1, 2, optime1));

        simulateSuccessfulV1Election();

        ASSERT_TRUE(getReplCoord()->getMemberState().primary());

        // Step down where the secondary actually has to catch up before the stepDown can succeed.
        // On entering the network, _stepDownContinue should cancel the heartbeats scheduled for
        // T + 2 seconds and send out a new round of heartbeats immediately.
        // This makes it unnecessary to advance the clock after entering the network to process
        // the heartbeat requests.
        auto result = stepDown_nonBlocking(false, Seconds(10), Seconds(60));

        // Now while the first stepdown request is waiting for secondaries to catch up, attempt another
        // stepdown request and ensure it fails.
        const auto opCtx = makeOperationContext();
        auto status = getReplCoord()->stepDown(opCtx.get(), false, Seconds(10), Seconds(60));
        ASSERT_EQUALS(ErrorCodes::ConflictingOperationInProgress, status);

        // Now ensure that the original stepdown command can still succeed.
        catchUpSecondaries(optime2);

        ASSERT_OK(*result.second.get());
        ASSERT_TRUE(repl->getMemberState().secondary());
    }

If the main test thread calls stepDown() before the TopologyCoordinator enters the attemptingToStepDown state, the test will block instead of getting ConflictingOperationInProgress back from that call.
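One way to close this kind of race is a handshake: the main thread waits until the background attempt has visibly started before issuing the conflicting call. The sketch below is a standalone illustration using only the C++ standard library. FakeCoordinator, waitUntilAttempting(), catchUp() and runSynchronizedAttempts() are hypothetical names invented for this example and are not part of ReplicationCoordinator or the test fixture.

```cpp
#include <condition_variable>
#include <future>
#include <mutex>

// Hypothetical stand-in for a coordinator that allows one step-down attempt at a time.
class FakeCoordinator {
public:
    enum class Result { kOk, kConflict };

    // Fails immediately if another attempt is in progress; otherwise blocks
    // until catchUp() is called, mimicking "wait for secondaries to catch up".
    Result stepDown() {
        std::unique_lock<std::mutex> lk(_mutex);
        if (_attempting) {
            return Result::kConflict;
        }
        _attempting = true;
        _cv.notify_all();  // wake anyone blocked in waitUntilAttempting()
        _cv.wait(lk, [this] { return _caughtUp; });
        _attempting = false;
        return Result::kOk;
    }

    // The handshake: returns only once some thread is inside stepDown().
    void waitUntilAttempting() {
        std::unique_lock<std::mutex> lk(_mutex);
        _cv.wait(lk, [this] { return _attempting; });
    }

    // Lets the in-progress attempt finish.
    void catchUp() {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _caughtUp = true;
        }
        _cv.notify_all();
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    bool _attempting = false;
    bool _caughtUp = false;
};

struct Outcome {
    FakeCoordinator::Result first;
    FakeCoordinator::Result second;
};

// Mirrors the shape of the test: async first attempt, conflicting second
// attempt, then release the first one.
Outcome runSynchronizedAttempts() {
    FakeCoordinator coord;
    auto first = std::async(std::launch::async, [&coord] { return coord.stepDown(); });

    // Without this wait, the main thread could win the race, become the
    // "first" attempt itself, and block forever because catchUp() is never reached.
    coord.waitUntilAttempting();

    FakeCoordinator::Result second = coord.stepDown();
    coord.catchUp();
    return Outcome{first.get(), second};
}
```

With the wait in place, the second call is guaranteed to observe the in-progress attempt, so the ordering no longer depends on thread scheduling.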
Issue Links
- is related to
  - SERVER-28544 Stepdown command must take global lock in exclusive mode (Closed)
  - SERVER-31341 Synchronize unit tests that wait for asynchronous stepdown attempts (Closed)