[SERVER-12170] Do not call relinquish() when not vetoing an election Created: 19/Dec/13  Updated: 11/Jul/16  Resolved: 20/Dec/13

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 2.5.4
Fix Version/s: 2.4.10, 2.5.5

Type: Bug Priority: Major - P3
Reporter: Matt Dannenberg Assignee: Matt Dannenberg
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-11059 Elections can be delayed by some locks Closed
is duplicated by SERVER-12218 Can't vote when foreground index buil... Closed
Related
related to SERVER-12098 node with votes:0 can get involved in... Closed
Operating System: ALL
Participants:

 Description   
Issue Status as of March 31, 2014

ISSUE SUMMARY

In the election logic, a node that is not vetoing an election calls relinquish(), which steps down a primary or changes a node's state from STARTUP2 to RECOVERING. This call is unnecessary and can delay the election or cause it to time out, because relinquish() takes a write lock to clear out the write buffer.

USER IMPACT

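This bug can delay elections or cause them to time out, for example while a long-running write operation such as a foreground index build is in progress.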

SOLUTION

The fix was to remove the unnecessary call to relinquish().
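
The effect of removing the call can be sketched with a toy state model. This is an illustration only, not the server's C++ code: the Node class, the fixed flag, and the return strings are hypothetical, and only the STARTUP2 to RECOVERING transition described in this ticket is modeled.

```python
from enum import Enum


class State(Enum):
    STARTUP2 = "STARTUP2"
    RECOVERING = "RECOVERING"


class Node:
    """Hypothetical stand-in for a replica set member."""

    def __init__(self, state):
        self.state = state

    def relinquish(self):
        # Modeled side effect from the ticket: STARTUP2 moves to RECOVERING.
        if self.state is State.STARTUP2:
            self.state = State.RECOVERING


def elect_cmd_received(node, veto, fixed):
    """Toy elect handler; `fixed` selects behavior with or without the call."""
    if veto:
        return "veto"
    if not fixed:
        # Behavior before the fix: relinquish() ran even when not vetoing.
        node.relinquish()
    return "vote"


before = Node(State.STARTUP2)
elect_cmd_received(before, veto=False, fixed=False)

after = Node(State.STARTUP2)
elect_cmd_received(after, veto=False, fixed=True)

print(before.state.value, after.state.value)
```

In this model the node handled without the fix leaves STARTUP2 early, while the node handled with the fix keeps its state and still votes.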

WORKAROUNDS

None

AFFECTED VERSIONS

All recent production release versions up to 2.4.9 are affected.

PATCHES

The fix is included in the 2.4.10 production release and the 2.5.5 development version, which will evolve into the 2.6.0 production release.

Original Description

The call to relinquish() does nothing good, and causes two bugs:
1. It is possible to transition from STARTUP2 to RECOVERING early, which causes incorrect RS logic later.
2. The call to relinquish() attempts to grab a global write lock while holding the rs mutex, which may delay heartbeats and elections if a long-running write operation (such as a foreground index build) is already in progress.
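
The second problem can be sketched as a lock-ordering delay. The following is a minimal model under stated assumptions, not the server implementation: rs_mutex and global_write_lock are plain Python locks standing in for the rs mutex and the global write lock, and the function names and the timeout value are hypothetical.

```python
import threading

rs_mutex = threading.Lock()           # stands in for the rs mutex
global_write_lock = threading.Lock()  # stands in for the global write lock


def relinquish(timeout):
    """Toy relinquish(): needs the global write lock, gives up after timeout."""
    if not global_write_lock.acquire(timeout=timeout):
        return False
    global_write_lock.release()
    return True


def handle_elect_request(call_relinquish, timeout=0.05):
    """Toy elect handler that runs while holding the rs mutex."""
    with rs_mutex:
        # Anything else that needs rs_mutex (heartbeats in this model)
        # would have to wait for as long as this block runs.
        if call_relinquish and not relinquish(timeout):
            return "timed out"
        return "voted"


def simulate(call_relinquish):
    # A long-running write (e.g. a foreground index build) holds the lock.
    global_write_lock.acquire()
    try:
        return handle_elect_request(call_relinquish)
    finally:
        global_write_lock.release()


print(simulate(call_relinquish=True))
print(simulate(call_relinquish=False))
```

With the call in place, the handler waits on the global write lock while still holding the rs mutex; without it, the handler returns immediately even though the write is still in progress.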



 Comments   
Comment by Githook User [ 09/Mar/14 ]

Author: Matt Dannenberg (dannenberg) <matt.dannenberg@10gen.com>

Message: SERVER-12170 stop calling relinquish() when replicaset nodes decide not to veto an election
Branch: v2.4
https://github.com/mongodb/mongo/commit/df5a9d90ebe21f23bd710377b32d5e1f523879b4

Comment by Githook User [ 20/Dec/13 ]

Author: Matt Dannenberg (dannenberg) <matt.dannenberg@10gen.com>

Message: SERVER-12170 stop calling relinquish() when replicaset nodes decide not to veto an election
Branch: master
https://github.com/mongodb/mongo/commit/6f1225ce6ed724e3dcc13a0acecb9d57a2e1dc47

Comment by Eric Milkie [ 19/Dec/13 ]

There is no point in calling relinquish() there. I looked up when it was added and I believe it was simply a mistake. Let's remove it from the elect response command handling.

Comment by Matt Dannenberg [ 19/Dec/13 ]

In Consensus::electCmdReceived()

Comment by Eric Milkie [ 19/Dec/13 ]

How are we hitting relinquish() if we're not in state PRIMARY?
