Major - P3
Replica sets should contain at most one primary node at any time. If a primary detects another primary in the replica set via heartbeat messages, the current behavior forces it to step down only if its _id in the replica set configuration is higher than the other primary's _id. The intention was to step down only one of the primaries and thus avoid a new election. However, because the _id is chosen arbitrarily and does not indicate priority, a lower-priority member can remain primary. A further issue is a one-way network partition, which could lead to multiple primary nodes for prolonged periods.
This bug can leave a node that does not have the highest priority as primary or, in rare cases (e.g. with transient network issues), leave multiple primaries in place for prolonged periods. The latter situation can affect data integrity.
The fix is to unconditionally step down all primary nodes if multiple primary nodes are detected. While this can cause elections in more cases than before, it is safer than having the wrong primary, or potentially multiple primaries.
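The difference between the two rules can be sketched in a few lines of Python. This is only an illustration of the decision described above, not the server's code: the `Member` type, the function names, and the example priorities are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    member_id: int    # the _id from the replica set configuration
    priority: float   # configured election priority (hypothetical values below)


def should_step_down_old(me: Member, other_primary: Member) -> bool:
    """Pre-fix rule: only the primary with the higher _id steps down."""
    return me.member_id > other_primary.member_id


def should_step_down_new(me: Member, other_primary: Member) -> bool:
    """Post-fix rule: any primary that sees another primary steps down."""
    return True


# Hypothetical two-primary situation in which the lower-priority node
# happens to have the lower _id.
low = Member(member_id=0, priority=1.0)
high = Member(member_id=1, priority=2.0)
pairs = [(low, high), (high, low)]  # (this primary, the primary it sees)

# Primaries that keep their role under each rule.
old_survivors = [me for me, other in pairs if not should_step_down_old(me, other)]
new_survivors = [me for me, other in pairs if not should_step_down_new(me, other)]
```

Under the old rule the lower-priority member is the one that stays primary in this scenario. Under the new rule no primary survives, so an election has to pick the next primary.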
In situations where a lower-priority node remains the primary, a forced election with rs.stepDown() can promote the higher-priority node back to primary.
All versions from 2.2.0 to 2.4.9 are affected.
The fix is included in the 2.4.10 production release and the 2.5.5 development release, which will evolve into the 2.6.0 production release.
On each incoming heartbeat, check that the node's view of the replica set shows at most one primary. If more than one is found, start an election.
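A minimal sketch of such a per-heartbeat check follows, written in Python for illustration. The `ReplicaSetView` class, its method names, and the string state labels are assumptions made for this example and do not mirror the server's internals.

```python
PRIMARY = "PRIMARY"
SECONDARY = "SECONDARY"


class ReplicaSetView:
    """One node's view of member states, updated as heartbeats arrive (hypothetical)."""

    def __init__(self, self_name: str, self_state: str):
        self.self_name = self_name
        self.states = {self_name: self_state}
        self.election_requested = False

    def on_heartbeat(self, sender: str, reported_state: str) -> list:
        # Record what the sender reports about itself.
        self.states[sender] = reported_state
        # Re-check the invariant on every heartbeat: at most one primary.
        primaries = [name for name, state in self.states.items() if state == PRIMARY]
        if len(primaries) > 1:
            # Step down unconditionally if this node is one of the primaries,
            # then ask for an election to choose a single primary.
            if self.states[self.self_name] == PRIMARY:
                self.states[self.self_name] = SECONDARY
            self.election_requested = True
        return primaries


view = ReplicaSetView("a", PRIMARY)
view.on_heartbeat("b", SECONDARY)                 # still a single primary
single_primary_ok = not view.election_requested
view.on_heartbeat("c", PRIMARY)                   # second primary detected
```

After the second heartbeat, node `a` has relinquished its primary role in its own view and flagged that an election is needed.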
- is related to
SERVER-10768 add proper support for SIGSTOP and SIGCONT (currently, on replica set primary can cause data loss)