Details
Task
Resolution: Won't Fix
Major - P3
Description
Currently, in the replication rewrite, we generate responses to heartbeat messages by scheduling a callback in the TopologyCoordinator. Replying to a heartbeat therefore requires two context switches to specific threads. I am concerned that in overloaded systems this could cause heartbeats to time out more readily. Ideally, the coordinator would already know everything it needs to build the response. It could then schedule a callback on the TopologyCoordinator to update whatever state is needed to track that a heartbeat was received, without blocking the thread that received the heartbeat. This would allow heartbeats to be answered as soon as possible.
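
To make the proposal concrete, here is a rough sketch of the idea in Python rather than the server's own code. Every name in it (Snapshot, TopologyWorker, HeartbeatResponder, and the response fields) is hypothetical and chosen only for illustration. It assumes the state needed for a reply can be kept in an immutable snapshot that the receiving thread reads directly, while the bookkeeping is queued for a single worker thread standing in for the TopologyCoordinator.

```python
import queue
import threading
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical read-only view of the state needed to answer a heartbeat."""
    set_name: str
    term: int
    member_state: str


class TopologyWorker:
    """Hypothetical stand-in for the TopologyCoordinator's dedicated thread."""

    def __init__(self):
        self._tasks = queue.Queue()
        self.last_heard = {}  # sender -> monotonic time the heartbeat arrived
        threading.Thread(target=self._run, daemon=True).start()

    def schedule(self, fn):
        # Enqueue and return immediately; the caller never waits on this thread.
        self._tasks.put(fn)

    def record(self, sender, received_at):
        # Runs only on the worker thread, so last_heard needs no lock.
        self.last_heard[sender] = received_at

    def drain(self):
        # Helper for tests: block until every scheduled callback has run.
        self._tasks.join()

    def _run(self):
        while True:
            fn = self._tasks.get()
            try:
                fn()
            finally:
                self._tasks.task_done()


class HeartbeatResponder:
    """Builds replies on the receiving thread and defers the bookkeeping."""

    def __init__(self, worker, snapshot):
        self._worker = worker
        self._snapshot = snapshot

    def publish(self, snapshot):
        # The snapshot is replaced wholesale, so a reader sees old or new state,
        # never a partially updated one.
        self._snapshot = snapshot

    def handle_heartbeat(self, sender):
        snap = self._snapshot
        response = {
            "ok": 1,
            "set": snap.set_name,
            "term": snap.term,
            "state": snap.member_state,
        }
        received_at = time.monotonic()
        # Fire-and-forget: the state update happens later on the worker thread.
        self._worker.schedule(lambda: self._worker.record(sender, received_at))
        return response


worker = TopologyWorker()
responder = HeartbeatResponder(worker, Snapshot("rs0", 3, "SECONDARY"))
reply = responder.handle_heartbeat("node2:27017")
```

In this sketch the reply is built without any thread hop, and the only cost on the receiving thread is one queue insertion. The trade-off is that the reply reflects the most recently published snapshot, which may lag slightly behind the worker's own state.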