- Type: Bug
- Resolution: Done
- Priority: Major - P3
- Affects Version/s: 2.5.4
- Component/s: Replication
- None
- ALL
ISSUE SUMMARY
In the election logic, a node that is not vetoing an election calls the relinquish() method, which steps down a primary or changes the state of a node from STARTUP2 to RECOVERING. This call is unnecessary and can delay the election or cause it to time out, because it takes a write lock in order to clear out the write buffer.
USER IMPACT
This bug can delay elections or cause them to time out.
SOLUTION
The fix was to remove the unnecessary call to relinquish().
WORKAROUNDS
None
AFFECTED VERSIONS
All recent production releases up to and including 2.4.9 are affected.
PATCHES
The fix is included in the 2.4.10 production release and the 2.5.5 development version, which will evolve into the 2.6.0 production release.
Original Description
The call to relinquish() does nothing good, and causes two bugs:
1. It is possible to transition from STARTUP2 to RECOVERING early, which causes incorrect RS logic later.
2. The call to relinquish() attempts to grab a global write lock while holding the rs mutex, which may delay heartbeats and elections if a long-running write operation (such as a foreground index build) is already in progress.
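The second problem can be illustrated with a minimal sketch. This is not the server's code: `ToyNode`, `handle_election_request`, the `call_relinquish` flag, and the 0.05-second timeout are hypothetical names and values chosen for illustration, and plain `threading.Lock` objects stand in for the rs mutex and the global write lock. It only shows how waiting for a second lock while holding the first can stall a vote.

```python
import threading


class ToyNode:
    """Toy replica-set member. All names are hypothetical, not MongoDB's API."""

    def __init__(self):
        self.rs_mutex = threading.Lock()           # stands in for the rs mutex
        self.global_write_lock = threading.Lock()  # stands in for the global write lock

    def relinquish(self, timeout):
        # Stand-in for relinquish(): it needs the global write lock.
        # Returns False if the lock could not be taken within `timeout` seconds.
        if not self.global_write_lock.acquire(timeout=timeout):
            return False
        self.global_write_lock.release()
        return True

    def handle_election_request(self, call_relinquish, timeout=0.05):
        # The rs mutex is held for the whole handler, so anything else that
        # needs it (heartbeats, for example) also waits while relinquish()
        # is blocked on the write lock.
        with self.rs_mutex:
            if call_relinquish and not self.relinquish(timeout):
                return "timed out"
            return "voted"


node = ToyNode()
node.global_write_lock.acquire()  # simulate a long-running write, e.g. an index build

before_fix = node.handle_election_request(call_relinquish=True)
after_fix = node.handle_election_request(call_relinquish=False)

node.global_write_lock.release()
print(before_fix, after_fix)
```

In this sketch, the handler that still calls `relinquish()` gives up while the simulated write holds the lock, whereas the handler without the call votes immediately, which mirrors the effect of removing the call.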
is duplicated by
- SERVER-11059 Elections can be delayed by some locks (Closed)
- SERVER-12218 Can't vote when foreground index build in progress (Closed)
related to
- SERVER-12098 node with votes:0 can get involved in election and result in abnormal state (Closed)