[SERVER-46850] Stepup due to election handoff may not have _voteRequester when it's candidate Created: 13/Mar/20 Updated: 18/May/20 Resolved: 18/May/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 3.6.17, 4.2.4, 4.0.16, 4.4.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Siyuan Zhou | Assignee: | Siyuan Zhou |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||
| Operating System: | ALL | ||||||||||||
| Sprint: | Repl 2020-03-23, Repl 2020-04-20, Repl 2020-05-04, Repl 2020-05-18, Repl 2020-06-01 | ||||||||||||
| Participants: | |||||||||||||
| Linked BF Score: | 0 | ||||||||||||
| Description |
|
ReplicationCoordinatorImpl::_cancelElectionIfNeeded_inlock() assumes a candidate always has a _voteRequester, which isn't true for stepup command that skips dry run. Stepup command due to election handoff skips dry run and writes down the vote for itself after changing its role to kCandidate, then releases the mutex. Another thread calling _cancelElectionIfNeeded_inlock() will hit the invariant if it runs in this window until the node finishes the write and starts _voteRequester. |
| Comments |
| Comment by Siyuan Zhou [ 18/May/20 ] |
|
I don't thinkĀ |
| Comment by Siyuan Zhou [ 18/May/20 ] |
|
Even if we skip vote request when it's null, the election isn't cancelled. This issue will be fixed by |