[SERVER-34661] Return early when the vote request response has an error Created: 24/Apr/18  Updated: 29/Oct/23  Resolved: 23/May/18

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 3.4.16, 3.6.6, 4.0.0-rc1, 4.1.1

Type: Bug Priority: Major - P3
Reporter: Siyuan Zhou Assignee: Suganthi Mani
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
related to SERVER-34682 Old primary should vote yes and store... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.0
Sprint: Repl 2018-05-21, Repl 2018-06-04
Participants:

 Description   

We should return early when the response to a vote request is not ok, rather than ignoring its status and assuming the term field is present.



 Comments   
Comment by Githook User [ 23/May/18 ]

Author: Suganthi Mani (smani87, suganthi.mani@mongodb.com)

Message: SERVER-34661 Return early when the vote request response has an error.
Fixed by making VoteRequester::Algorithm::processResponse check the ok field in the vote response message.

(cherry picked from commit e4e2162c489c1faa569463f51058ebc09368a5f9)
Branch: v3.6
https://github.com/mongodb/mongo/commit/8e381a2fe00605764e208d9c6c42a386a78dd6e0

Comment by Githook User [ 23/May/18 ]

Author: Suganthi Mani (smani87, suganthi.mani@mongodb.com)

Message: SERVER-34661 Return early when the vote request response has an error.
Fixed by making VoteRequester::Algorithm::processResponse check the ok field in the vote response message.

(cherry picked from commit e4e2162c489c1faa569463f51058ebc09368a5f9)
Branch: v3.4
https://github.com/mongodb/mongo/commit/7f3e69a63c3ab61bfea2a563ded9d555fb017c8a

Comment by Githook User [ 23/May/18 ]

Author: Suganthi Mani (smani87, suganthi.mani@mongodb.com)

Message: SERVER-34661 Return early when the vote request response has an error.
Fixed by making VoteRequester::Algorithm::processResponse check the ok field in the vote response message.

(cherry picked from commit e4e2162c489c1faa569463f51058ebc09368a5f9)
Branch: v4.0
https://github.com/mongodb/mongo/commit/65cd5579b5bde734eaba0d97a7e285f705d78b92

Comment by Githook User [ 23/May/18 ]

Author: Suganthi Mani (smani87, suganthi.mani@mongodb.com)

Message: SERVER-34661 Return early when the vote request response has an error.
Fixed by making VoteRequester::Algorithm::processResponse check the ok field in the vote response message.
Branch: master
https://github.com/mongodb/mongo/commit/e4e2162c489c1faa569463f51058ebc09368a5f9

Comment by Siyuan Zhou [ 26/Apr/18 ]

A vote request to the old primary will force it to step down and vote yes, but the old primary fails to store the last vote.

d20020| 2018-04-25T23:01:36.439-0400 E REPL     [conn4] replSetRequestVotes failed to store LastVote document; InterruptedDueToReplStateChange: operation was interrupted

The new primary receives an error response (ok: 0) which also has the expected fields, so it's considered a "yes" vote.

d20021| 2018-04-25T23:01:36.441-0400 I REPL     [replexec-2] VoteRequester(term 2) received a yes vote from siyuan-ws:20020; response message: { term: 2, voteGranted: true, reason: "", ok: 0.0, errmsg: "operation was interrupted", code: 11602, codeName: "InterruptedDueToReplStateChange", operationTime: Timestamp(1524711694, 2), $clusterTime: { clusterTime: Timestamp(1524711694, 2), signature: { hash: BinData(0, 0000000000000000000000000000000000000000), keyId: 0 } } }
 
// formatted response
{
    term: 2,
    voteGranted: true,
    reason: "",
    ok: 0.0,
    errmsg: "operation was interrupted",
    code: 11602,
    codeName: "InterruptedDueToReplStateChange",
    operationTime: Timestamp(1524711694, 2),
    $clusterTime: {
        clusterTime: Timestamp(1524711694, 2),
        signature: {
            hash: BinData(0, 0000000000000000000000000000000000000000),
            keyId: 0
        }
    }
}
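
To make the failure mode concrete, here is a minimal, self-contained C++ sketch. It is not the actual VoteRequester code: VoteReply, countsAsYesUnchecked, countsAsYesChecked and loggedReply are hypothetical names introduced only for illustration, and the reply is reduced to the fields that matter here.

```cpp
#include <string>

// Hypothetical, simplified stand-in for a replSetRequestVotes reply.
// Field names mirror the logged response; the struct itself is illustrative.
struct VoteReply {
    double ok;           // non-zero on success, 0.0 on command error
    long long term;
    bool voteGranted;
    std::string errmsg;  // populated when the command failed
};

// Mirrors the behavior described in this ticket: the command status is
// ignored, so an error reply carrying voteGranted: true counts as "yes".
inline bool countsAsYesUnchecked(const VoteReply& reply) {
    return reply.voteGranted;
}

// Returns early on a failed command before looking at any other field.
inline bool countsAsYesChecked(const VoteReply& reply) {
    const bool commandOk = reply.ok != 0.0;
    if (!commandOk) {
        return false;  // an errored reply is not a vote
    }
    return reply.voteGranted;
}

// The reply logged by d20021 above, reduced to the relevant fields.
inline VoteReply loggedReply() {
    return VoteReply{0.0, 2, true, "operation was interrupted"};
}
```

With the logged reply, the unchecked version reports a yes vote while the checked version does not, which is the difference this ticket is about.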

The last vote fails to store on the old primary because acquiring the lock calls checkForInterruptNoAssert(), which returns InterruptedDueToReplStateChange.

Both the voter and the candidate need fixes, but the voter fix is critical. spencer, SERVER-34682 has been filed for the voter, and I suggest putting SERVER-34682 into RC0. A backport of SERVER-34682 may not be necessary because that issue is caused by SERVER-27534. This ticket may need a backport.

Comment by Spencer Brody (Inactive) [ 24/Apr/18 ]

Also, voteResponse.initialize doesn't check the 'ok' field of the response. We should also call getStatusFromCommandResponse(response.data) and make sure that the command didn't error before inspecting the other fields in the reply.
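
The ordering suggested here (status first, fields second) can be sketched roughly as follows. This is an assumption-laden illustration, not the server's implementation: it does not call getStatusFromCommandResponse, and RawReply, ParsedVote, parseVoteReply and loggedRawReply are hypothetical names. The reply is modeled as a flat map from field name to numeric value.

```cpp
#include <map>
#include <string>

// Hypothetical flat view of a command reply: field name -> numeric value.
using RawReply = std::map<std::string, double>;

// Hypothetical parse result; 'valid' is false when the reply must be ignored.
struct ParsedVote {
    bool valid;
    std::string error;
    long long term;
    bool voteGranted;
};

inline ParsedVote parseVoteReply(const RawReply& raw) {
    // Step 1: check the command status before touching any other field.
    const auto okIt = raw.find("ok");
    if (okIt == raw.end() || okIt->second == 0.0) {
        return ParsedVote{false, "command failed or 'ok' missing", 0, false};
    }
    // Step 2: only now look at the vote fields, without assuming they exist.
    const auto termIt = raw.find("term");
    const auto grantedIt = raw.find("voteGranted");
    if (termIt == raw.end() || grantedIt == raw.end()) {
        return ParsedVote{false, "missing 'term' or 'voteGranted'", 0, false};
    }
    return ParsedVote{true, "", static_cast<long long>(termIt->second),
                      grantedIt->second != 0.0};
}

// The reply from this ticket's log, reduced to its numeric fields.
inline RawReply loggedRawReply() {
    return RawReply{{"term", 2.0}, {"voteGranted", 1.0}, {"ok", 0.0}};
}
```

Checking the status first means a reply such as the one logged in this ticket is rejected even though it contains well-formed term and voteGranted fields.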
