[SERVER-19118] Always allow replica to replicate from non-primary when necessary Created: 24/Jun/15 Updated: 14/Apr/16 Resolved: 24/Jul/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 3.0.1 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Ramon Fernandez Marina | Assignee: | Eric Milkie |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | mms-s |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Sprint: | RPL 7 08/10/15 |
| Participants: | |
| Description |
|
In the contrived scenario of a two-member replica set where one member has newer data but a lower priority, the other has a higher priority but older data, and the replica set config disables chaining (settings.chainingAllowed: false), no election will succeed when the members are started.

Old Description

The logs from rs05 show that rs02 becomes secondary, and how rs05 steps down:
But the logs from rs02 show that rs02 can't sync those 6 seconds from rs05
This is triggered by the following setting for this replica set
When chaining is not allowed, a PRIMARY should step down only if the new higher-priority node is already up to date. For example:
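The step-down rule stated above can be sketched as a simple predicate. This is a minimal illustration, not the server's actual logic: the `Member` type, optimes expressed as seconds, and the `freshness_window_secs` parameter (a hypothetical stand-in for whatever "close enough" threshold applies when chaining is allowed) are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Member:
    # Hypothetical, simplified view of a replica set member.
    name: str
    priority: float
    optime_secs: float  # time of the member's last applied op, in seconds


def should_step_down(primary: Member, candidate: Member,
                     chaining_allowed: bool,
                     freshness_window_secs: float = 10.0) -> bool:
    """Decide whether `primary` should yield to a higher-priority `candidate`.

    Assumption: with chaining allowed, a slightly stale candidate can sync the
    missing ops from the stepped-down primary (now a secondary). With chaining
    disallowed, it has no sync source once there is no primary, so it must
    already be fully caught up before the primary steps down.
    """
    if candidate.priority <= primary.priority:
        return False  # only a higher-priority member triggers a step-down
    lag = primary.optime_secs - candidate.optime_secs
    if chaining_allowed:
        return lag <= freshness_window_secs
    return lag <= 0
```

With the 6-second lag seen in the logs (priority values here are illustrative), the sketch steps down when chaining is allowed but refuses to when it is not, which avoids leaving the set with a stale high-priority node that cannot catch up.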
|
| Comments |
| Comment by Eric Milkie [ 24/Jul/15 ] |
|
Follow |
| Comment by Andy Schwerin [ 24/Jul/15 ] |
|
milkie, should we close this as a dupe of |
| Comment by Scott Hernandez (Inactive) [ 24/Jun/15 ] |
|
After some discussion, a few more points were raised about how this change could affect other scenarios, in some of which it would roll back data. This is really about "noChaining" meaning "preferPrimary" (which allows replication from a non-primary when there is no known primary), so that when there is no primary, a replica can catch up and be elected. The underlying issue is not the primary stepping down but the election that follows. The step-down logic can also be addressed, but a larger issue exists. |
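To make the distinction concrete, here is a minimal sketch of sync-source selection under a strict "primary only" reading of noChaining versus the "preferPrimary" reading described above. The `Peer` type, the policy names, and the rule of falling back to the freshest peer ahead of us are simplifying assumptions for illustration; the server's real sync-source selection is not modeled here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Peer:
    # Hypothetical, simplified view of another member as seen by this node.
    name: str
    optime_secs: float
    is_primary: bool = False


def choose_sync_source(my_optime_secs: float, peers: List[Peer],
                       policy: str) -> Optional[Peer]:
    """Pick a sync source under a simplified no-chaining policy.

    policy (hypothetical names):
      "primaryOnly"   - strict noChaining: sync only from the primary
      "preferPrimary" - the primary if one is known, otherwise the
                        freshest peer that is ahead of us
    """
    primary = next((p for p in peers if p.is_primary), None)
    ahead = [p for p in peers if p.optime_secs > my_optime_secs]
    freshest = max(ahead, key=lambda p: p.optime_secs, default=None)
    if policy == "primaryOnly":
        return primary
    if policy == "preferPrimary":
        return primary if primary is not None else freshest
    raise ValueError(f"unknown policy: {policy!r}")
```

In the two-member scenario from the description there is no primary, so under the strict policy the higher-priority member gets no sync source and can never catch up, while under preferPrimary it would sync from the lower-priority member holding the newer data. Whenever a primary is known, both policies return it, so the normal case is unchanged.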
| Comment by Scott Hernandez (Inactive) [ 24/Jun/15 ] |
|
I'm not sure this is the correct solution, or that it affects only this scenario. Is there a reason to make the change in _isOpTimeCloseEnoughToLatestToElect, like this? It might be better to step back, look at the end state we want (the higher-priority node elected), and not choose a solution yet. |