[SERVER-36387] Allow heartbeat responses to wake ready waiters even when they do not advance optimes Created: 01/Aug/18 Updated: 25/Nov/18 Resolved: 02/Aug/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Vesselina Ratcheva (Inactive) | Assignee: | Vesselina Ratcheva (Inactive) |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Backport Requested: | v4.0, v3.6 |
| Description |
|
Heartbeats and replSetUpdatePosition commands can only wake up replication waiters if they carry optime changes. If no further progress can be made (e.g. when the node in question has fully caught up), those waiters will not be signaled unless new writes come in. This is not necessarily an issue for functionality like awaitReplication, but it can be a problem for stepdown. For example, during a stepdown attempt, secondaries can catch up while they are frozen; when the freeze is later lifted, nothing signals the waiters (since everyone is already up to date), and the attempt times out. This can be fixed by allowing heartbeat responses that do not advance optimes to still wake up replication waiters (doing only the minimal amount of work required for that). This bug was introduced by the changes in |
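The scenario in the description can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the server's implementation: `ToyCoordinator`, `Waiter`, `process_heartbeat`, and the `wake_on_every_heartbeat` flag are hypothetical names, optimes are reduced to plain integers, and the "frozen" state is treated as something each heartbeat reports.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Waiter:
    """Stepdown-style waiter: needs a secondary that is caught up AND unfrozen."""
    target_optime: int
    satisfied: bool = False


@dataclass
class ToyCoordinator:
    # Hypothetical flag: True models the change proposed in the ticket.
    wake_on_every_heartbeat: bool
    secondary_optime: int = 0
    secondary_frozen: bool = True
    waiters: List[Waiter] = field(default_factory=list)

    def _wake_waiters(self) -> None:
        # Re-evaluate every waiter's condition against the current state.
        for w in self.waiters:
            if self.secondary_optime >= w.target_optime and not self.secondary_frozen:
                w.satisfied = True

    def process_heartbeat(self, optime: int, frozen: bool) -> None:
        advanced = optime > self.secondary_optime
        self.secondary_optime = max(self.secondary_optime, optime)
        self.secondary_frozen = frozen
        # Behavior described in the ticket: waiters are only signaled when
        # an optime advances, unless the proposed flag is enabled.
        if advanced or self.wake_on_every_heartbeat:
            self._wake_waiters()


def stepdown_waiter_satisfied(wake_on_every_heartbeat: bool) -> bool:
    coord = ToyCoordinator(wake_on_every_heartbeat)
    waiter = Waiter(target_optime=5)
    coord.waiters.append(waiter)
    coord.process_heartbeat(optime=5, frozen=True)   # catches up while frozen
    coord.process_heartbeat(optime=5, frozen=False)  # freeze lifted, optime unchanged
    return waiter.satisfied
```

In this model, `stepdown_waiter_satisfied(False)` leaves the waiter unsatisfied because the second heartbeat does not advance the optime and so never re-checks the waiters, while `stepdown_waiter_satisfied(True)` re-checks on every heartbeat and satisfies it.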
| Comments |
| Comment by Vesselina Ratcheva (Inactive) [ 02/Aug/18 ] |
|
A little bit of both, actually. Before What I described in the ticket would be a big problem in PV0, since it has "VotedTooRecently" as a reason to not be electable. With this, you could easily run into the situation where the vote lease expires after everyone has caught up, leaving you with nothing to signal the waiters. We decided the solution is to simply skip Closing this as "Won't Fix". |
| Comment by Judah Schvimer [ 01/Aug/18 ] |
|
Will this introduce any performance regressions around waking up waiters unnecessarily? Or is this returning to an old behavior? |