[SERVER-15783] Consider not bailing when Manager::msgCheckNewState notices two other primaries transiently Created: 22/Oct/14 Updated: 23/Feb/15 Resolved: 23/Feb/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 2.6.5 |
| Fix Version/s: | None |
| Type: | Question | Priority: | Minor - P4 |
| Reporter: | Zardosht Kasheff | Assignee: | Andy Schwerin |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Participants: |
| Description |
|
In Manager::msgCheckNewState, if findOtherPrimary finds that two other members think they are primary, the method returns without having done anything. In the (unlikely) case where there are more than two primaries, this may be problematic: each primary will notice that at least two other primaries exist (bool two is set to true by findOtherPrimary) and bail, so no primary steps down. Instead, perhaps the function should continue, and noteARemoteIsPrimary should loop through all members, checking whether any other primary has a more recent election time, and step down if so. msgCheckNewState would then return anyway, given that a primary exists. I realize that with the 30-second election timer, ending up with more than two primaries is incredibly unlikely, but if the timer is removed this issue may become more likely. |
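The failure mode and the proposed fix can be sketched with a toy model. This is not the actual server code; the struct, the field names, and both helper functions are hypothetical simplifications of the logic described above (the "bail when two other primaries are seen" early return versus "step down if any other primary was elected more recently").

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of a replica-set member's view of a peer.
struct Member {
    bool primary;
    std::uint64_t electionTime;  // larger == elected more recently
};

// Sketch of the current behavior: bail as soon as two other primaries are
// seen (the early return in msgCheckNewState), so with >2 primaries this
// member never reaches the step-down check.
bool shouldStepDown_bailOnTwo(const Member& self, const std::vector<Member>& others) {
    int otherPrimaries = 0;
    for (const Member& m : others)
        if (m.primary) ++otherPrimaries;
    if (otherPrimaries >= 2)
        return false;  // bail: two other primaries noticed, do nothing
    for (const Member& m : others)
        if (m.primary && m.electionTime > self.electionTime)
            return true;
    return false;
}

// Sketch of the proposed behavior: scan all members and step down if any
// other primary has a more recent election time, regardless of count.
bool shouldStepDown_proposed(const Member& self, const std::vector<Member>& others) {
    for (const Member& m : others)
        if (m.primary && m.electionTime > self.electionTime)
            return true;
    return false;
}
```

With three primaries elected at times 1, 2, and 3, the bail-on-two version returns false on every node (no one steps down), while the proposed version steps down the two older primaries and leaves only the most recently elected one.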
| Comments |
| Comment by Andy Schwerin [ 29/Oct/14 ] |
|
msgCheckNewState still exists on master, but is dead code. It'll probably get yanked after the 2.8 branch. I'll move this ticket to "Planned but not scheduled" until you have time to send a pull request. Thanks. |
| Comment by Zardosht Kasheff [ 29/Oct/14 ] |
|
Andy, thanks for replying. When I filed this ticket, msgCheckNewState still existed on master (I think), and I knew there were plans to remove the 30 second timer. I think those two things together make this problem more relevant, but it sounds like that will not be true on 2.6 or 2.8. I'll submit a pull request when I get to this in TokuMX. |
| Comment by Andy Schwerin [ 29/Oct/14 ] |
|
zardosht, if you successfully got 3 or more nodes to believe they were primary, the problem you describe could occur in 2.6 and before, but is not possible as of 2.7.8. The difference is that as of 2.7.8, the code that replaces Manager::msgCheckNewState (TopologyCoordinatorImpl::_updateHeartbeatDataImpl) executes synchronously whenever a new heartbeat response comes in. This means that the processing logic never sees 2 remote primaries that it hasn't seen before in a single execution. There will always be some first remote primary to appear, leading to the execution of the step down logic that runs whenever there is exactly one other primary. Since this is true on every node, eventually all but the most recently elected node will step down of their own volition. If you want to produce a pull request for the 2.6 branch, I'll take a look. |
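The argument in this comment can be illustrated with a toy sketch: when each heartbeat response is processed synchronously, one at a time, a primary reacts to the *first* more-recently-elected remote primary it sees instead of counting how many exist. The struct, field names, and onHeartbeat handler below are hypothetical simplifications, loosely modeled on the step-down check described for TopologyCoordinatorImpl::_updateHeartbeatDataImpl, not the real server API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical heartbeat response as seen by the processing logic.
struct Heartbeat {
    bool remoteIsPrimary;
    std::uint64_t remoteElectionTime;  // larger == elected more recently
};

// Simplified node: processes each heartbeat response synchronously as it
// arrives, so it can only ever observe one new remote primary per call.
struct Node {
    bool primary;
    std::uint64_t electionTime;

    void onHeartbeat(const Heartbeat& hb) {
        // Step down on the first remote primary with a more recent election
        // time; there is no "two other primaries" early return to hit.
        if (primary && hb.remoteIsPrimary && hb.remoteElectionTime > electionTime)
            primary = false;
    }
};
```

Driving three simultaneous primaries through pairwise heartbeats shows that each older primary steps down on the first heartbeat from a newer one, so all but the most recently elected node step down of their own volition.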