Details
Improvement
Resolution: Won't Fix
Major - P3
v4.0, v3.6
Description
Heartbeats and replSetUpdatePosition commands wake up replication waiters only when they carry an optime change. Once no further progress is possible (e.g. when the node in question has fully caught up), those waiters are not signaled again unless new writes come in. This is not necessarily an issue for functionality like awaitReplication, but it can be a problem for stepdown. For example, during a stepdown attempt the secondaries can catch up while they are frozen; when the freeze is then lifted, nothing signals the waiters (since everyone is already up to date), and the attempt times out. This can be fixed by allowing heartbeat responses that do not advance optimes to still wake up replication waiters (by doing the minimal amount of work required for that).
This bug was introduced by the changes in SERVER-35058 (specifically here).
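The stepdown scenario described above can be illustrated with a small toy model. This is only a sketch and not the server's code: `ToyCoordinator`, `Member`, `on_heartbeat`, `stepdown_ready` and the `wake_without_progress` flag are all hypothetical names, and carrying the frozen state in the heartbeat is a simplifying assumption made for the example. Waiters are modelled as predicates that are re-evaluated only when they are woken.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Member:
    optime: int = 0
    frozen: bool = False


@dataclass
class ToyCoordinator:
    """Toy model only; not MongoDB's replication coordinator."""
    primary_optime: int
    # False models the behaviour described in this ticket,
    # True models the proposed fix.
    wake_without_progress: bool
    members: Dict[str, Member] = field(default_factory=dict)
    waiters: List[Callable[[], bool]] = field(default_factory=list)
    satisfied: int = 0

    def add_waiter(self, predicate: Callable[[], bool]) -> None:
        self.waiters.append(predicate)

    def _wake_waiters(self) -> None:
        # Re-evaluate every waiter; keep only those still unsatisfied.
        remaining = []
        for predicate in self.waiters:
            if predicate():
                self.satisfied += 1
            else:
                remaining.append(predicate)
        self.waiters = remaining

    def on_heartbeat(self, name: str, optime: int, frozen: bool) -> None:
        member = self.members.setdefault(name, Member())
        advanced = optime > member.optime
        member.optime = max(member.optime, optime)
        member.frozen = frozen
        # Waiters are signaled on optime progress, or always with the fix.
        if advanced or self.wake_without_progress:
            self._wake_waiters()

    def stepdown_ready(self) -> bool:
        # Assumed condition: some caught-up member that is not frozen.
        return any(
            m.optime >= self.primary_optime and not m.frozen
            for m in self.members.values()
        )


def run_scenario(wake_without_progress: bool) -> bool:
    """Return True if the stepdown waiter was eventually satisfied."""
    coord = ToyCoordinator(primary_optime=10,
                           wake_without_progress=wake_without_progress)
    coord.add_waiter(coord.stepdown_ready)
    # Secondary catches up while frozen: waiter wakes, predicate is false.
    coord.on_heartbeat("s1", optime=10, frozen=True)
    # Freeze is lifted, but the optime does not advance.
    coord.on_heartbeat("s1", optime=10, frozen=False)
    return coord.satisfied == 1
```

In this model, `run_scenario(False)` leaves the waiter pending after the freeze is lifted, which corresponds to the stepdown attempt timing out, while `run_scenario(True)` satisfies it on the second heartbeat.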