[SERVER-47612] Elections not robust in remove_newly_added_field_after_finishing_initial_sync.js Created: 17/Apr/20  Updated: 29/Oct/23  Resolved: 07/May/20

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.4.1, 4.7.0

Type: Bug Priority: Major - P3
Reporter: Vesselina Ratcheva (Inactive) Assignee: Judah Schvimer
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Problem/Incident
causes SERVER-48125 Stepdown can deadlock with storing la... Closed
Related
related to SERVER-48256 Election cannot be canceled when writ... Closed
is related to SERVER-34682 Old primary should vote yes and store... Closed
is related to SERVER-48778 Make reconstruct_prepared_transaction... Closed
is related to SERVER-46353 Remove 'newlyAdded' field from ‘replS... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.4
Sprint: Repl 2020-05-18
Participants:
Linked BF Score: 18

 Description   

The test exercises several step-ups as a way to test the voter eligibility of 'newlyAdded' members. Unfortunately, some of those replSetStepUp commands are prone to failure. We should investigate them and make appropriate fixes.
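
For illustration, these step-ups are driven through the replSetStepUp command. The sketch below is hypothetical (it is not the actual test) and assumes the standard mongo shell ReplSetTest fixture; the set name and assertions are illustrative only. It shows where a single failed election surfaces as a failure.

// Hypothetical sketch of the step-up pattern the test relies on.
(function() {
    "use strict";

    const rst = new ReplSetTest({name: "stepup_sketch", nodes: 3});
    rst.startSet();
    rst.initiate();

    const candidate = rst.getSecondary();

    // replSetStepUp runs a single election on the candidate. If a voter's
    // LastVote write is interrupted (e.g. by a concurrent stepdown on the old
    // primary), the vote request fails and the election can fail with it.
    assert.commandWorked(candidate.adminCommand({replSetStepUp: 1}));

    // Only meaningful once the step-up actually succeeded.
    rst.awaitNodesAgreeOnPrimary();
    assert.eq(rst.getPrimary().host, candidate.host);

    rst.stopSet();
})();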



 Comments   
Comment by Githook User [ 20/Aug/20 ]

Author:

{'name': 'Judah Schvimer', 'email': 'judah@mongodb.com', 'username': 'judahschvimer'}

Message: SERVER-47612 make storing the LastVote uninterruptible

(cherry picked from commit deebdeecd08b5e0063fa4fd275752eebe4c27bda)
Branch: v4.4
https://github.com/mongodb/mongo/commit/1467ca5f45ced609029ab44b56594ca50c5e9277

Comment by Githook User [ 07/May/20 ]

Author:

{'name': 'Judah Schvimer', 'email': 'judah@mongodb.com', 'username': 'judahschvimer'}

Message: SERVER-47612 make storing the LastVote uninterruptible
Branch: master
https://github.com/mongodb/mongo/commit/deebdeecd08b5e0063fa4fd275752eebe4c27bda

Comment by Judah Schvimer [ 17/Apr/20 ]

I'm surprised that in remove_newly_added_field_after_finishing_initial_sync.js the lastVote gets stored before the stepdown stops killing operations. In primary_casts_vote_on_stepdown.js, the stepdown and the processing of the vote request happen on the same thread, but in remove_newly_added_field_after_finishing_initial_sync.js they happen on different threads. I wonder what's triggering the stepdown from a different thread:

[js_test:primary_casts_vote_on_stepdown] 2020-04-17T04:59:09.210+0000 d20020| 2020-04-17T04:59:09.209+00:00 I  ELECTION 23980   [conn4] "Responding to vote request","attr":{"request":"{ replSetRequestVotes: 1, setName: \"primary_casts_vote_on_stepdown\", dryRun: true, term: 1, candidateIndex: 1, configVersion: 2, configTerm: 1, lastAppliedOpTime: { ts: Timestamp(1587099548, 10), t: 1 } }","response":"{ term: 1, voteGranted: true, reason: \"\" }","replicaSetStatus":"Current replSetGetStatus output: { set: \"primary_casts_vote_on_stepdown\", date: new Date(1587099549209), myState: 1, term: 1, syncSourceHost: \"\", syncSourceId: -1, heartbeatIntervalMillis: 2000, majorityVoteCount: 2, writeMajorityCount: 2, votingMembersCount: 2, writableVotingMembersCount: 2, optimes: { lastCommittedOpTime: { ts: Timestamp(1587099548, 10), t: 1 }, lastCommittedWallTime: new Date(1587099548899), appliedOpTime: { ts: Timestamp(1587099548, 10), t: 1 }, durableOpTime: { ts: Timestamp(1587099548, 8), t: 1 }, lastAppliedWallTime: new Date(1587099548899), lastDurableWallTime: new Date(1587099548648) }, members: [ { _id: 0, name: \"ip-10-122-19-61:20020\", health: 1.0, state: 1, stateStr: \"PRIMARY\", uptime: 0, optime: { ts: Timestamp(1587099548, 10), t: 1 }, optimeDate: new Date(1587099548000), syncSourceHost: \"\", syncSourceId: -1, infoMessage: \"Could not find member to sync from\", electionTime: Timestamp(1587099548, 2), electionDate: new Date(1587099548000), configVersion: 2, configTerm: 1, self: true, lastHeartbeatMessage: \"\" }, { _id: 1, name: \"ip-10-122-19-61:20021\", health: 1.0, state: 2, stateStr: \"SECONDARY\", uptime: 0, optime: { ts: Timestamp(1587099548, 10), t: 1 }, optimeDurable: { ts: Timestamp(0, 0), t: -1 }, optimeDate: new Date(1587099548000), optimeDurableDate: new Date(0), lastHeartbeat: new Date(1587099549082), lastHeartbeatRecv: new Date(1587099549199), pingMs: 0, lastHeartbeatMessage: \"\", syncSourceHost: \"ip-10-122-19-61:20020\", syncSourceId: 0, infoMessage: \"\", configVersion: 2, configTerm: 1 } ] }"}
...
[js_test:primary_casts_vote_on_stepdown] 2020-04-17T04:59:09.210+0000 d20020| 2020-04-17T04:59:09.210+00:00 I  REPL     21402   [conn4] "Stepping down from primary, because a new term has begun","attr":{"term":2}
[js_test:primary_casts_vote_on_stepdown] 2020-04-17T04:59:09.211+0000 d20020| 2020-04-17T04:59:09.210+00:00 I  REPL     21343   [RstlKillOpThread] "Starting to kill user operations"
[js_test:primary_casts_vote_on_stepdown] 2020-04-17T04:59:09.211+0000 d20020| 2020-04-17T04:59:09.210+00:00 I  REPL     21344   [RstlKillOpThread] "Stopped killing user operations"
[js_test:primary_casts_vote_on_stepdown] 2020-04-17T04:59:09.211+0000 d20020| 2020-04-17T04:59:09.210+00:00 I  REPL     21340   [RstlKillOpThread] "State transition ops metrics","attr":{"metrics":{"lastStateTransition":"stepDown","userOpsKilled":0,"userOpsRunning":3}}
[js_test:primary_casts_vote_on_stepdown] 2020-04-17T04:59:09.211+0000 d20020| 2020-04-17T04:59:09.211+00:00 I  REPL     21358   [ReplCoord-0] "Replica set state transition","attr":{"newState":"SECONDARY","oldState":"PRIMARY"}
[js_test:primary_casts_vote_on_stepdown] 2020-04-17T04:59:09.211+0000 d20020| 2020-04-17T04:59:09.211+00:00 D4 ELECTION 4615651 [ReplCoord-0] "Scheduling election timeout callback","attr":{"when":{"$date":"2020-04-17T04:59:20.615Z"}}
[js_test:primary_casts_vote_on_stepdown] 2020-04-17T04:59:09.211+0000 d20020| 2020-04-17T04:59:09.211+00:00 I  ELECTION 23980   [conn4] "Responding to vote request","attr":{"request":"{ replSetRequestVotes: 1, setName: \"primary_casts_vote_on_stepdown\", dryRun: false, term: 2, candidateIndex: 1, configVersion: 2, configTerm: 1, lastAppliedOpTime: { ts: Timestamp(1587099548, 10), t: 1 } }","response":"{ term: 2, voteGranted: true, reason: \"\" }","replicaSetStatus":"Current replSetGetStatus output: { set: \"primary_casts_vote_on_stepdown\", date: new Date(1587099549211), myState: 2, term: 2, syncSourceHost: \"\", syncSourceId: -1, heartbeatIntervalMillis: 2000, majorityVoteCount: 2, writeMajorityCount: 2, votingMembersCount: 2, writableVotingMembersCount: 2, optimes: { lastCommittedOpTime: { ts: Timestamp(1587099548, 10), t: 1 }, lastCommittedWallTime: new Date(1587099548899), appliedOpTime: { ts: Timestamp(1587099548, 10), t: 1 }, durableOpTime: { ts: Timestamp(1587099548, 10), t: 1 }, lastAppliedWallTime: new Date(1587099548899), lastDurableWallTime: new Date(1587099548899) }, members: [ { _id: 0, name: \"ip-10-122-19-61:20020\", health: 1.0, state: 2, stateStr: \"SECONDARY\", uptime: 0, optime: { ts: Timestamp(1587099548, 10), t: 1 }, optimeDate: new Date(1587099548000), syncSourceHost: \"\", syncSourceId: -1, infoMessage: \"Could not find member to sync from\", configVersion: 2, configTerm: 1, self: true, lastHeartbeatMessage: \"\" }, { _id: 1, name: \"ip-10-122-19-61:20021\", health: 1.0, state: 2, stateStr: \"SECONDARY\", uptime: 0, optime: { ts: Timestamp(1587099548, 10), t: 1 }, optimeDurable: { ts: Timestamp(0, 0), t: -1 }, optimeDate: new Date(1587099548000), optimeDurableDate: new Date(0), lastHeartbeat: new Date(1587099549082), lastHeartbeatRecv: new Date(1587099549199), pingMs: 0, lastHeartbeatMessage: \"\", syncSourceHost: \"ip-10-122-19-61:20020\", syncSourceId: 0, infoMessage: \"\", configVersion: 2, configTerm: 1 } ] }"}

vs.

[js_test:remove_newly_added_field_after_finishing_initial_sync] 2020-04-15T01:33:38.370+0000 d20030| 2020-04-15T01:33:38.369+00:00 I  ELECTION 23980   [conn230] "Responding to vote request","attr":{"request":"{ replSetRequestVotes: 1, setName: \"remove_newly_added_field_after_finishing_initial_sync\", dryRun: true, term: 5, candidateIndex: 1, configVersion: 7, configTerm: 5, lastAppliedOpTime: { ts: Timestamp(1586914418, 1), t: 5 } }","response":"{ term: 5, voteGranted: true, reason: \"\" }","replicaSetStatus":"Current replSetGetStatus output: { set: \"remove_newly_added_field_after_finishing_initial_sync\", date: new Date(1586914418369), myState: 1, term: 5, syncSourceHost: \"\", syncSourceId: -1, heartbeatIntervalMillis: 2000, majorityVoteCount: 3, writeMajorityCount: 3, votingMembersCount: 4, writableVotingMembersCount: 4, optimes: { lastCommittedOpTime: { ts: Timestamp(1586914418, 1), t: 5 }, lastCommittedWallTime: new Date(1586914418364), appliedOpTime: { ts: Timestamp(1586914418, 1), t: 5 }, durableOpTime: { ts: Timestamp(1586914409, 3), t: 5 }, lastAppliedWallTime: new Date(1586914418364), lastDurableWallTime: new Date(1586914409638) }, members: [ { _id: 0, name: \"ip-10-122-66-210:20020\", health: 1.0, state: 1, stateStr: \"PRIMARY\", uptime: 0, optime: { ts: Timestamp(1586914418, 1), t: 5 }, optimeDate: new Date(1586914418000), syncSourceHost: \"\", syncSourceId: -1, infoMessage: \"syncing from primary: ip-10-122-66-210:20021\", electionTime: Timestamp(1586914409, 1), electionDate: new Date(1586914409000), configVersion: 7, configTerm: 5, self: true, lastHeartbeatMessage: \"\" }, { _id: 1, name: \"ip-10-122-66-210:20021\", health: 1.0, state: 2, stateStr: \"SECONDARY\", uptime: 28, optime: { ts: Timestamp(1586914409, 3), t: 5 }, optimeDurable: { ts: Timestamp(1586914409, 3), t: 5 }, optimeDate: new Date(1586914409000), optimeDurableDate: new Date(1586914409000), lastHeartbeat: new Date(1586914417739), lastHeartbeatRecv: new Date(1586914417971), pingMs: 0, lastHeartbeatMessage: \"\", syncSourceHost: \"ip-10-122-66-210:20020\", syncSourceId: 0, infoMessage: \"\", configVersion: 7, configTerm: 5 }, { _id: 2, name: \"ip-10-122-66-210:20022\", health: 1.0, state: 2, stateStr: \"SECONDARY\", uptime: 6, optime: { ts: Timestamp(1586914401, 1), t: 3 }, optimeDurable: { ts: Timestamp(1586914401, 1), t: 3 }, optimeDate: new Date(1586914401000), optimeDurableDate: new Date(1586914401000), lastHeartbeat: new Date(1586914417812), lastHeartbeatRecv: new Date(1586914418012), pingMs: 0, lastHeartbeatMessage: \"\", syncSourceHost: \"\", syncSourceId: -1, infoMessage: \"\", configVersion: 7, configTerm: 5 }, { _id: 3, name: \"ip-10-122-66-210:20033\", health: 1.0, state: 2, stateStr: \"SECONDARY\", uptime: 6, optime: { ts: Timestamp(1586914401, 1), t: 3 }, optimeDurable: { ts: Timestamp(1586914401, 1), t: 3 }, optimeDate: new Date(1586914401000), optimeDurableDate: new Date(1586914401000), lastHeartbeat: new Date(1586914417738), lastHeartbeatRecv: new Date(1586914417896), pingMs: 0, lastHeartbeatMessage: \"\", syncSourceHost: \"\", syncSourceId: -1, infoMessage: \"\", configVersion: 7, configTerm: 5 } ] }"}
...
[js_test:remove_newly_added_field_after_finishing_initial_sync] 2020-04-15T01:33:38.371+0000 d20030| 2020-04-15T01:33:38.371+00:00 I  REPL     21402   [conn22] "Stepping down from primary, because a new term has begun","attr":{"term":6}
[js_test:remove_newly_added_field_after_finishing_initial_sync] 2020-04-15T01:33:38.371+0000 d20030| 2020-04-15T01:33:38.371+00:00 I  ELECTION 23980   [conn230] "Responding to vote request","attr":{"request":"{ replSetRequestVotes: 1, setName: \"remove_newly_added_field_after_finishing_initial_sync\", dryRun: false, term: 6, candidateIndex: 1, configVersion: 7, configTerm: 5, lastAppliedOpTime: { ts: Timestamp(1586914418, 1), t: 5 } }","response":"{ term: 5, voteGranted: true, reason: \"\" }","replicaSetStatus":"Current replSetGetStatus output: { set: \"remove_newly_added_field_after_finishing_initial_sync\", date: new Date(1586914418371), myState: 1, term: 5, syncSourceHost: \"\", syncSourceId: -1, heartbeatIntervalMillis: 2000, majorityVoteCount: 3, writeMajorityCount: 3, votingMembersCount: 4, writableVotingMembersCount: 4, optimes: { lastCommittedOpTime: { ts: Timestamp(1586914418, 1), t: 5 }, lastCommittedWallTime: new Date(1586914418364), appliedOpTime: { ts: Timestamp(1586914418, 1), t: 5 }, durableOpTime: { ts: Timestamp(1586914409, 3), t: 5 }, lastAppliedWallTime: new Date(1586914418364), lastDurableWallTime: new Date(1586914409638) }, members: [ { _id: 0, name: \"ip-10-122-66-210:20020\", health: 1.0, state: 1, stateStr: \"PRIMARY\", uptime: 0, optime: { ts: Timestamp(1586914418, 1), t: 5 }, optimeDate: new Date(1586914418000), syncSourceHost: \"\", syncSourceId: -1, infoMessage: \"syncing from primary: ip-10-122-66-210:20021\", electionTime: Timestamp(1586914409, 1), electionDate: new Date(1586914409000), configVersion: 7, configTerm: 5, self: true, lastHeartbeatMessage: \"\" }, { _id: 1, name: \"ip-10-122-66-210:20021\", health: 1.0, state: 2, stateStr: \"SECONDARY\", uptime: 28, optime: { ts: Timestamp(1586914409, 3), t: 5 }, optimeDurable: { ts: Timestamp(1586914409, 3), t: 5 }, optimeDate: new Date(1586914409000), optimeDurableDate: new Date(1586914409000), lastHeartbeat: new Date(1586914417739), lastHeartbeatRecv: new Date(1586914417971), pingMs: 0, lastHeartbeatMessage: \"\", syncSourceHost: \"ip-10-122-66-210:20020\", syncSourceId: 0, infoMessage: \"\", configVersion: 7, configTerm: 5 }, { _id: 2, name: \"ip-10-122-66-210:20022\", health: 1.0, state: 2, stateStr: \"SECONDARY\", uptime: 6, optime: { ts: Timestamp(1586914401, 1), t: 3 }, optimeDurable: { ts: Timestamp(1586914401, 1), t: 3 }, optimeDate: new Date(1586914401000), optimeDurableDate: new Date(1586914401000), lastHeartbeat: new Date(1586914417812), lastHeartbeatRecv: new Date(1586914418012), pingMs: 0, lastHeartbeatMessage: \"\", syncSourceHost: \"\", syncSourceId: -1, infoMessage: \"\", configVersion: 7, configTerm: 5 }, { _id: 3, name: \"ip-10-122-66-210:20033\", health: 1.0, state: 2, stateStr: \"SECONDARY\", uptime: 6, optime: { ts: Timestamp(1586914401, 1), t: 3 }, optimeDurable: { ts: Timestamp(1586914401, 1), t: 3 }, optimeDate: new Date(1586914401000), optimeDurableDate: new Date(1586914401000), lastHeartbeat: new Date(1586914417738), lastHeartbeatRecv: new Date(1586914417896), pingMs: 0, lastHeartbeatMessage: \"\", syncSourceHost: \"\", syncSourceId: -1, infoMessage: \"\", configVersion: 7, configTerm: 5 } ] }"}
...
[js_test:remove_newly_added_field_after_finishing_initial_sync] 2020-04-15T01:33:38.371+0000 d20030| 2020-04-15T01:33:38.371+00:00 I  REPL     21343   [RstlKillOpThread] "Starting to kill user operations"
[js_test:remove_newly_added_field_after_finishing_initial_sync] 2020-04-15T01:33:38.371+0000 d20030| 2020-04-15T01:33:38.371+00:00 E  REPL     21428   [conn230] "replSetRequestVotes failed to store LastVote document","attr":{"error":{"code":11602,"codeName":"InterruptedDueToReplStateChange","errmsg":"operation was interrupted"}}
...
[js_test:remove_newly_added_field_after_finishing_initial_sync] 2020-04-15T01:33:38.372+0000 d20030| 2020-04-15T01:33:38.372+00:00 I  REPL     21344   [RstlKillOpThread] "Stopped killing user operations"
...
[js_test:remove_newly_added_field_after_finishing_initial_sync] 2020-04-15T01:33:38.372+0000 d20030| 2020-04-15T01:33:38.372+00:00 I  REPL     21340   [RstlKillOpThread] "State transition ops metrics","attr":{"metrics":{"lastStateTransition":"stepDown","userOpsKilled":1,"userOpsRunning":5}}
...
[js_test:remove_newly_added_field_after_finishing_initial_sync] 2020-04-15T01:33:38.372+0000 d20030| 2020-04-15T01:33:38.372+00:00 I  REPL     21358   [ReplCoord-5] "Replica set state transition","attr":{"newState":"SECONDARY","oldState":"PRIMARY"}
[js_test:remove_newly_added_field_after_finishing_initial_sync] 2020-04-15T01:33:38.372+0000 d20030| 2020-04-15T01:33:38.372+00:00 D4 ELECTION 4615651 [ReplCoord-5] "Scheduling election timeout callback","attr":{"when":{"$date":"2020-04-16T04:37:44.202Z"}}

Comment by William Schultz (Inactive) [ 17/Apr/20 ]

In my local, passing runs of primary_casts_vote_on_stepdown.js, we are taking this branch, which seems to align with the test passing since we don't acquire any locks. The isOplogTruncateAfterPointBeingUsedForPrimary method is supposed to return whether we are currently primary or not, and it's updated whenever we step down.

Comment by William Schultz (Inactive) [ 17/Apr/20 ]

The stack trace above seems to partially answer my question. The code path for a recovery unit on an ephemeral storage engine seems to lead us to the ReplicationConsistencyMarkersImpl::refreshOplogTruncateAfterPointIfPrimary call, which executes the lock acquisition that gets interrupted. Presumably we don't do the same for durable recovery units. This might explain why we aren't seeing the failures in other test suites. On the other hand, this explanation could be wrong since we don't see the primary_casts_vote_on_stepdown.js test failing on inMemory.

Comment by William Schultz (Inactive) [ 17/Apr/20 ]

On at least one of the failures Vessy posted, we are running with the inMemory storage engine, which could potentially explain divergent behavior of waitUntilDurable. vesselina.ratcheva, did this fail on any durable storage engines?
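
For what it's worth, which engine a given node ran with can be checked directly from the shell; a snippet like the following (illustrative only) distinguishes inMemory runs from durable ones:

// Illustrative: report the storage engine of the connected node.
const engineName = db.serverStatus().storageEngine.name;  // e.g. "wiredTiger" or "inMemory"
print("storage engine: " + engineName);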

Comment by William Schultz (Inactive) [ 17/Apr/20 ]

In a recent replica_sets run from master, the test does run with a single successful election (link to logs), so maybe something else is going on. Perhaps there are different codepaths taken inside waitUntilDurable that affect whether it throws an exception on step down. Would have to investigate more.

Comment by Judah Schvimer [ 17/Apr/20 ]

That's a good point, Will. I'm surprised the test isn't failing, then.

Comment by William Schultz (Inactive) [ 17/Apr/20 ]

Hm, we do check that the new term is exactly 1 greater than the original, though, so it would seem that if there were a failed election followed by a successful one inside the ReplSetTest.stepUp helper, then that assertion would fail.
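
For reference, that check has roughly the following shape (an illustrative sketch, not the actual test code; primaryBefore and primaryAfter are assumed connections):

// Illustrative: the "new term is exactly 1 greater" assertion.
const termBefore =
    assert.commandWorked(primaryBefore.adminCommand({replSetGetStatus: 1})).term;

// ... step up a different node here ...

const termAfter =
    assert.commandWorked(primaryAfter.adminCommand({replSetGetStatus: 1})).term;

// A failed (non-dry-run) election attempt still bumps the term, so a retried
// election inside ReplSetTest.stepUp would make this assertion fail.
assert.eq(termAfter, termBefore + 1);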

Comment by William Schultz (Inactive) [ 17/Apr/20 ]

judah.schvimer, got it. That would make sense. And that's a great point about the test's use of the stepUp helper masking this bug. We should change that.
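
One possible shape for that change (a sketch only, not the committed fix, assuming an existing ReplSetTest instance rst) is to issue the election command directly and assert on it, rather than going through the ReplSetTest.stepUp helper, which per the discussion above can run more than one election attempt:

// Hypothetical: step up a node without the retrying helper, so a single
// failed election fails the test instead of being retried.
const candidate = rst.getSecondaries()[0];
assert.commandWorked(candidate.adminCommand({replSetStepUp: 1}));
rst.awaitNodesAgreeOnPrimary();
assert.eq(rst.getPrimary().host, candidate.host);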

Comment by Judah Schvimer [ 17/Apr/20 ]

I'm not sure if waitUntilDurable was interruptible before SERVER-44555, which may be why we're just seeing this in tests now.
