Details
-
Improvement
-
Resolution: Duplicate
-
Major - P3
-
None
-
3.0.1
-
RPL 7 08/10/15
Description
In the contrived scenario of a two-member replica set where one member has newer data but a lower priority and the other has a higher priority but older data, and the replica set config disables chaining ("chainingAllowed" : false), no election will succeed when the members are started.
Old Description
This is a replica set where all nodes except two have priority 0; node rs05 has priority 1, and node rs02 has priority 10. Node rs02 was rebooted; when it came back up node rs05 stepped down, but rs02 did not become PRIMARY.
The logs on rs05 show rs02 becoming SECONDARY and rs05 then stepping down:
2015-06-24T14:44:35.046+0000 I REPL [ReplicationExecutor] Member rs02:28101 is now in state SECONDARY
2015-06-24T14:44:37.047+0000 I REPL [ReplicationExecutor] Stepping down self (priority 1) because rs02:28101 has higher priority 10 and is only 6 seconds behind me
2015-06-24T14:44:37.047+0000 I REPL [ReplicationExecutor] Stepping down from primary in response to heartbeat
2015-06-24T14:44:37.047+0000 I REPL [replCallbackWithGlobalLock-0] transition to SECONDARY
But the logs from rs02 show that rs02 cannot sync those 6 seconds from rs05, and so refuses to elect itself:
2015-06-24T14:44:33.129+0000 I REPL [ReplicationExecutor] transition to SECONDARY
2015-06-24T14:44:36.072+0000 I REPL [ReplicationExecutor] syncing from primary: rs05:28101
2015-06-24T14:44:36.094+0000 I REPL [SyncSourceFeedback] replset setting syncSourceFeedback to rs05:28101
2015-06-24T14:44:37.044+0000 E REPL [rsBackgroundSync] sync producer problem: 10278 dbclient error communicating with server: rs05:28101
2015-06-24T14:44:37.108+0000 I REPL [ReplicationExecutor] Member rs05:28101 is now in state SECONDARY
2015-06-24T14:44:37.108+0000 I REPL [ReplicationExecutor] Standing for election
2015-06-24T14:44:37.108+0000 I REPL [ReplicationExecutor] not electing self, we are not freshest
2015-06-24T14:44:37.108+0000 I REPL [ReplicationExecutor] not electing self, we are not freshest
2015-06-24T14:44:37.109+0000 I REPL [ReplicationExecutor] Standing for election
2015-06-24T14:44:37.110+0000 I REPL [ReplicationExecutor] not electing self, we are not freshest
2015-06-24T14:44:37.110+0000 I REPL [ReplicationExecutor] not electing self, we are not freshest
This is triggered by the following setting for this replica set:
"settings" : {
    "chainingAllowed" : false
}
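As a rough illustration of how the two log excerpts combine into a stalemate, the C++ sketch below models only the two rules that are visible in the log messages. The `Member` struct, the function names and `kCloseEnoughSecs` are hypothetical and are not the server's API; the 10-second value is assumed from the existing comparison in topology_coordinator_impl.cpp.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified view of a replica set member.
struct Member {
    int priority;
    std::uint32_t lastOpSecs;  // seconds component of the last applied op
};

// Assumed lag window, taken from the unpatched check.
const std::uint32_t kCloseEnoughSecs = 10;

// Rule seen in the rs05 log: a primary steps down when a higher-priority
// member is no more than kCloseEnoughSecs behind it.
bool primaryShouldStepDown(const Member& primary, const Member& other) {
    // Widen before adding so the sum cannot wrap around.
    return other.priority > primary.priority &&
           static_cast<std::uint64_t>(other.lastOpSecs) + kCloseEnoughSecs >=
               primary.lastOpSecs;
}

// Rule seen in the rs02 log: a candidate does not elect itself while
// another member holds a newer op ("we are not freshest").
bool canElectSelf(const Member& candidate, const Member& other) {
    return candidate.lastOpSecs >= other.lastOpSecs;
}
```

With rs05 at priority 1 and rs02 at priority 10 and 6 seconds behind, `primaryShouldStepDown(rs05, rs02)` is true while `canElectSelf(rs02, rs05)` is false. With chaining disabled, rs02 presumably has no non-primary sync source to catch up from once rs05 has stepped down, so under these assumptions neither rule ever changes its answer.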
When chaining is not allowed, a PRIMARY should step down only if the higher-priority node is fully up to date. For example:
--- a/src/mongo/db/repl/topology_coordinator_impl.cpp
+++ b/src/mongo/db/repl/topology_coordinator_impl.cpp
@@ -1208,8 +1208,9 @@ bool TopologyCoordinatorImpl::_aMajoritySeemsToBeUp() const {
 bool TopologyCoordinatorImpl::_isOpTimeCloseEnoughToLatestToElect(
     const OpTime& otherOpTime, const OpTime& ourLastOpApplied) const {
     const OpTime latestKnownOpTime = _latestKnownOpTime(ourLastOpApplied);
+    const int closeEnoughSeconds = _rsConfig.isChainingAllowed() ? 10 : 0;
     // Use addition instead of subtraction to avoid overflow.
-    return otherOpTime.getSecs() + 10 >= (latestKnownOpTime.getSecs());
+    return otherOpTime.getSecs() + closeEnoughSeconds >= (latestKnownOpTime.getSecs());
 }
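To see the effect of the proposed change in isolation, here is a minimal standalone sketch of the patched comparison. `OpTimeSecs` and `isCloseEnoughToElect` are hypothetical stand-ins for the server's `OpTime` and `_isOpTimeCloseEnoughToLatestToElect`, and the chaining flag is passed as a plain parameter rather than read from the replica set config.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for OpTime; only the seconds part is modeled.
struct OpTimeSecs {
    std::uint32_t secs;
};

// Same shape as the patched check: the allowed lag is 10 s when chaining
// is allowed and 0 s when it is not.
bool isCloseEnoughToElect(OpTimeSecs other, OpTimeSecs latestKnown, bool chainingAllowed) {
    const std::uint64_t closeEnoughSeconds = chainingAllowed ? 10 : 0;
    // Add on a wider type instead of subtracting, so nothing wraps around.
    return static_cast<std::uint64_t>(other.secs) + closeEnoughSeconds >= latestKnown.secs;
}
```

For a member that is 6 seconds behind, the check passes when chaining is allowed and fails when it is not, so in the scenario above rs05 would not be asked to step down until rs02 had fully caught up.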
Issue Links
- duplicates SERVER-14885 replica sets that disable chaining may have trouble electing a primary if members have different priorities (Closed)