[SERVER-19118] Always allow replica to replicate from non-primary when necessary Created: 24/Jun/15  Updated: 14/Apr/16  Resolved: 24/Jul/15

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 3.0.1
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Ramon Fernandez Marina Assignee: Eric Milkie
Resolution: Duplicate Votes: 0
Labels: mms-s
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-14885 replica sets that disable chaining ma... Closed
Related
Sprint: RPL 7 08/10/15
Participants:

 Description   

In the contrived scenario of a two-member replica set where one member has newer data but a lower priority, the other has a higher priority but older data, and the replica set config disables chaining ("chainingAllowed" : false), no election succeeds when the members are started.

Old Description
This is a replica set in which all nodes except two have priority 0: node rs05 has priority 1 and node rs02 has priority 10. Node rs02 was rebooted; when it came back up, node rs05 stepped down, but rs02 did not become PRIMARY.

The logs on rs05 show rs02 becoming SECONDARY and rs05 stepping down:

2015-06-24T14:44:35.046+0000 I REPL     [ReplicationExecutor] Member rs02:28101 is now in state SECONDARY
2015-06-24T14:44:37.047+0000 I REPL     [ReplicationExecutor] Stepping down self (priority 1) because rs02:28101 has higher priority 10 and is only 6 seconds behind me
2015-06-24T14:44:37.047+0000 I REPL     [ReplicationExecutor] Stepping down from primary in response to heartbeat
2015-06-24T14:44:37.047+0000 I REPL     [replCallbackWithGlobalLock-0] transition to SECONDARY

But the logs on rs02 show that rs02 cannot sync the missing 6 seconds of operations from rs05:

2015-06-24T14:44:33.129+0000 I REPL     [ReplicationExecutor] transition to SECONDARY
2015-06-24T14:44:36.072+0000 I REPL     [ReplicationExecutor] syncing from primary: rs05:28101
2015-06-24T14:44:36.094+0000 I REPL     [SyncSourceFeedback] replset setting syncSourceFeedback to rs05:28101
2015-06-24T14:44:37.044+0000 E REPL     [rsBackgroundSync] sync producer problem: 10278 dbclient error communicating with server: rs05:28101
2015-06-24T14:44:37.108+0000 I REPL     [ReplicationExecutor] Member rs05:28101 is now in state SECONDARY
2015-06-24T14:44:37.108+0000 I REPL     [ReplicationExecutor] Standing for election
2015-06-24T14:44:37.108+0000 I REPL     [ReplicationExecutor] not electing self, we are not freshest
2015-06-24T14:44:37.108+0000 I REPL     [ReplicationExecutor] not electing self, we are not freshest
2015-06-24T14:44:37.109+0000 I REPL     [ReplicationExecutor] Standing for election
2015-06-24T14:44:37.110+0000 I REPL     [ReplicationExecutor] not electing self, we are not freshest
2015-06-24T14:44:37.110+0000 I REPL     [ReplicationExecutor] not electing self, we are not freshest
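The "not electing self, we are not freshest" lines come from a freshness check. A minimal sketch, assuming a candidate simply declines to stand whenever any other member has applied a newer operation; the real election protocol exchanges messages between members and has other reasons to refuse, and Member and isFreshest are hypothetical names:

```cpp
#include <vector>

// Hypothetical, simplified view of a member: only what the freshness check needs.
struct Member {
    int id;
    long long lastAppliedSecs;  // seconds component of the member's last applied optime
};

// Simplified freshness check: the candidate declines to stand ("we are not freshest")
// if any other member has applied a newer operation than the candidate.
bool isFreshest(const Member& candidate, const std::vector<Member>& others) {
    for (const Member& m : others) {
        if (m.id != candidate.id && m.lastAppliedSecs > candidate.lastAppliedSecs) {
            return false;
        }
    }
    return true;
}
```

With rs05 ahead of rs02, isFreshest(rs02, {rs05}) is false. With chaining disabled, rs02 only syncs from a primary; once rs05 is SECONDARY, rs02 cannot close the gap, so every retry fails the same way.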

The failure is triggered by the following setting in this replica set's configuration:

    "settings" : {
        "chainingAllowed" : false
    }

When chaining is not allowed, a PRIMARY should step down only if the higher-priority node is fully up to date. For example:

--- a/src/mongo/db/repl/topology_coordinator_impl.cpp
+++ b/src/mongo/db/repl/topology_coordinator_impl.cpp
@@ -1208,8 +1208,9 @@ bool TopologyCoordinatorImpl::_aMajoritySeemsToBeUp() const {
 bool TopologyCoordinatorImpl::_isOpTimeCloseEnoughToLatestToElect(
     const OpTime& otherOpTime, const OpTime& ourLastOpApplied) const {
     const OpTime latestKnownOpTime = _latestKnownOpTime(ourLastOpApplied);
+    const int closeEnoughSeconds = _rsConfig.isChainingAllowed() ? 10 : 0;
     // Use addition instead of subtraction to avoid overflow.
-    return otherOpTime.getSecs() + 10 >= (latestKnownOpTime.getSecs());
+    return otherOpTime.getSecs() + closeEnoughSeconds >= (latestKnownOpTime.getSecs());
 }
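To make the effect of this patch concrete, here is a standalone sketch of the patched comparison and a step-down decision built on it. The OpTime struct (seconds only), the shouldStepDown helper, and passing the latest known optime directly instead of calling _latestKnownOpTime are simplifying assumptions for illustration, not the server's actual code.

```cpp
// Hypothetical stand-in for mongo::OpTime: only the seconds component is modeled.
struct OpTime {
    long long secs;
    long long getSecs() const { return secs; }
};

// Mirrors the patched predicate: a 10-second window when chaining is allowed,
// no window at all when it is not.
bool isOpTimeCloseEnoughToLatestToElect(const OpTime& otherOpTime,
                                        const OpTime& latestKnownOpTime,
                                        bool chainingAllowed) {
    const long long closeEnoughSeconds = chainingAllowed ? 10 : 0;
    // Use addition instead of subtraction to avoid overflow, as in the original.
    return otherOpTime.getSecs() + closeEnoughSeconds >= latestKnownOpTime.getSecs();
}

// Hypothetical step-down check: the primary yields only if a higher-priority
// member is close enough to the primary's own latest optime.
bool shouldStepDown(int selfPriority, int otherPriority,
                    const OpTime& otherOpTime, const OpTime& selfOpTime,
                    bool chainingAllowed) {
    return otherPriority > selfPriority &&
           isOpTimeCloseEnoughToLatestToElect(otherOpTime, selfOpTime, chainingAllowed);
}
```

With the 6-second lag from the logs (illustrative values: rs05 at 1000, rs02 at 994), shouldStepDown(1, 10, rs02, rs05, true) is true and shouldStepDown(1, 10, rs02, rs05, false) is false. The comparison uses whole seconds only, so with a window of 0 a member that lags by a few operations within the same second still counts as close enough.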



 Comments   
Comment by Eric Milkie [ 24/Jul/15 ]

Follow SERVER-14885 for updates.

Comment by Andy Schwerin [ 24/Jul/15 ]

milkie, should we close this as a dupe of SERVER-14885 and move implementation and discussion work there?

Comment by Scott Hernandez (Inactive) [ 24/Jun/15 ]

After some discussion, a few more points were raised about how this change could affect other scenarios; in some of those cases it would roll back data.

This is really about "noChaining" meaning "preferPrimary" (which allows replication from a non-primary when there is no known primary), and about letting a replica catch up and be elected when there is no primary.

The underlying issue isn't the primary stepping down but the election that ensues. The step-down logic can also be addressed, but a larger issue exists.
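The "preferPrimary" reading can be sketched as a sync-source rule. This is a hypothetical illustration, not MongoDB's sync-source selection code: Node, State, and chooseSyncSource are invented names, and allowNonPrimaryFallback toggles between syncing only from a primary (false, roughly the behavior seen in the rs02 logs) and falling back to the freshest member ahead of us when no primary is known (true).

```cpp
#include <optional>
#include <vector>

enum class State { Primary, Secondary };

// Hypothetical member view for sync-source selection (not MongoDB's actual types).
struct Node {
    int id;
    State state;
    long long lastAppliedSecs;  // seconds component of the last applied optime
};

// Returns the id of the member to sync from, or nullopt if there is none.
std::optional<int> chooseSyncSource(const Node& self, const std::vector<Node>& others,
                                    bool allowNonPrimaryFallback) {
    const Node* best = nullptr;
    for (const Node& n : others) {
        if (n.id == self.id) continue;
        // A known primary is always preferred.
        if (n.state == State::Primary) return n.id;
        // Otherwise remember the freshest non-primary that is ahead of us.
        if (allowNonPrimaryFallback && n.lastAppliedSecs > self.lastAppliedSecs &&
            (best == nullptr || n.lastAppliedSecs > best->lastAppliedSecs)) {
            best = &n;
        }
    }
    if (best != nullptr) return best->id;
    return std::nullopt;  // no primary and no eligible member ahead of us
}
```

In the two-member scenario, rs02 gets no sync source with the fallback disabled; with it enabled, rs02 syncs from the secondary rs05 and can catch up before standing for election.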

Comment by Scott Hernandez (Inactive) [ 24/Jun/15 ]

I'm not sure this is the correct solution, or that it affects only this scenario. Is there a reason to make the change in _isOpTimeCloseEnoughToLatestToElect, as this patch does?

It might be better to back up, look at the end state we want (the higher-priority node elected), and not choose a solution yet.

Generated at Thu Feb 08 03:49:54 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.