Type: Improvement
Resolution: Done
Priority: Major - P3
Fix Version/s: 2.4.10, 2.5.5
Affects Version/s: None
Component/s: Replication
Labels:
None

Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

Issue Status as of March 28, 2014

ISSUE SUMMARY
A replica set should contain at most one primary node at any time. If a primary detected another primary in the replica set via heartbeat messages, the previous behavior forced it to step down only if its _id in the replica set configuration was higher than the other primary's _id. The intent was to step down only one of the primaries and so avoid a new election. However, because the _id is chosen arbitrarily and does not indicate priority, a lower-priority member could remain primary. Separately, a one-way network partition could leave multiple primary nodes in place for prolonged periods.
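The pre-fix rule described above can be modeled in a few lines. This is a minimal sketch, not server code: the function name `shouldStepDownOldRule`, the host names, and the priorities are all hypothetical, and members are represented only by the `_id`, `host`, and `priority` fields of a replica set configuration entry.

```javascript
// Hypothetical model of the pre-fix rule; not taken from the server source.
// A primary that sees another primary steps down only if its own
// configuration _id is the higher of the two. Priority is never consulted.
function shouldStepDownOldRule(self, otherPrimary) {
  return self._id > otherPrimary._id;
}

// Illustrative members: B has the higher priority but also the higher _id.
var memberA = { _id: 0, host: "a.example.net:27017", priority: 1 };
var memberB = { _id: 1, host: "b.example.net:27017", priority: 2 };

// Both members believe they are primary and see each other in heartbeats.
var aStepsDown = shouldStepDownOldRule(memberA, memberB); // false
var bStepsDown = shouldStepDownOldRule(memberB, memberA); // true
```

In this example the priority-2 member steps down and the priority-1 member stays primary, which is the outcome the issue describes.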

USER IMPACT
This bug can leave a node that does not have the highest priority as primary or, in rare cases (e.g. with transient network issues), leave multiple primaries in place for prolonged periods. The latter situation can affect data integrity.

SOLUTION
The fix unconditionally steps down every primary node when more than one primary is detected. This can cause elections in more cases than before, but it is safer than keeping the wrong primary or, potentially, multiple primaries.
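As a rough illustration of the post-fix rule, the sketch below takes one member's heartbeat view of the set and returns the hosts that should step down. The function name `primariesToStepDown` and the `isPrimary` flag are assumptions made for this example; they do not correspond to identifiers in the server.

```javascript
// Hypothetical model of the post-fix rule; not taken from the server source.
// `members` is one node's view of the set, with an illustrative
// `isPrimary` flag per member. If more than one primary is visible,
// every one of them is told to step down, regardless of _id or priority.
function primariesToStepDown(members) {
  var primaries = members.filter(function (m) {
    return m.isPrimary;
  });
  if (primaries.length <= 1) {
    return []; // zero or one primary: nothing to do
  }
  return primaries.map(function (m) {
    return m.host;
  });
}

var viewOfSet = [
  { _id: 0, host: "a.example.net:27017", isPrimary: true },
  { _id: 1, host: "b.example.net:27017", isPrimary: true },
  { _id: 2, host: "c.example.net:27017", isPrimary: false }
];
var stepDownHosts = primariesToStepDown(viewOfSet);
```

Because all detected primaries step down, the set has to hold an election, and the normal election rules then decide which member becomes primary.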

WORKAROUNDS
In situations where a lower-priority node remains the primary, forcing an election by running rs.stepDown() on that node can promote the higher-priority node back to primary.
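Before forcing an election, it may help to confirm that a healthy higher-priority secondary is actually available. The helper below is a sketch under assumptions: the name `lowerPriorityPrimary` is hypothetical, and its inputs are expected to be shaped like the documents returned by rs.conf() and rs.status() (members with `_id` and `priority`, and members with `_id`, `state`, and `health`, respectively).

```javascript
// Hypothetical helper; inputs are shaped like rs.conf() and rs.status().
// Returns true when the current primary has a lower priority than some
// healthy secondary, i.e. when the workaround may be worth applying.
function lowerPriorityPrimary(conf, status) {
  var PRIMARY = 1;   // member state code for PRIMARY
  var SECONDARY = 2; // member state code for SECONDARY

  var priorityById = {};
  conf.members.forEach(function (m) {
    // An omitted priority is treated as the default of 1.
    priorityById[m._id] = m.priority === undefined ? 1 : m.priority;
  });

  var primaries = status.members.filter(function (m) {
    return m.state === PRIMARY;
  });
  if (primaries.length === 0) {
    return false; // no primary visible from this node
  }
  var primary = primaries[0];

  return status.members.some(function (m) {
    return m.state === SECONDARY &&
      m.health === 1 &&
      priorityById[m._id] > priorityById[primary._id];
  });
}
```

In a mongo shell connected to the current primary, one possible use would be `if (lowerPriorityPrimary(rs.conf(), rs.status())) { rs.stepDown(); }`.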

AFFECTED VERSIONS
All versions from 2.2.0 to 2.4.9 are affected.

PATCHES
The fix is included in the 2.4.10 production release and the 2.5.5 development release, which will evolve into the 2.6.0 production release.

Original Description

On every incoming heartbeat, check that the node's view of the replica set shows at most one primary. If more than one is found, start an election.

is related to

SERVER-10768 add proper support for SIGSTOP and SIGCONT (currently, on replica set primary can cause data loss)

Closed

Assignee: Eric Milkie
Reporter: Scott Hernandez (Inactive)
Participants: Eric Milkie, Githook User, Scott Hernandez, Zardosht Kasheff
Votes: 0
Watchers: 6

Created: Sep 16 2013 09:13:04 PM UTC
Updated: Jul 11 2016 05:39:47 PM UTC
Resolved: Jan 15 2014 03:26:26 PM UTC
