[SERVER-70297] Do not respond to heartbeat from removed node if it has the same version-term pair Created: 06/Oct/22  Updated: 07/Oct/22  Resolved: 07/Oct/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Ali Mir Assignee: Huayu Ouyang
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
related to SERVER-36417 Drop pooled connections to nodes no l... Blocked
Sprint: Repl 2022-10-17
Participants:
Linked BF Score: 0

 Description   

We have some BFs caused by heartbeats that cannot be "fully" cancelled: any outstanding heartbeat that has already been sent across the network cannot be cancelled. SERVER-36417 attempted to drop the pooled connections to removed nodes after a reconfig, but the heartbeat issue prevented that ticket from being fixed.

As a workaround within the replication code, if a node receives a heartbeat from a removed node with the same version-term pair, it should not respond, and should continue to heartbeat as usual. See linked tickets for context.
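
A minimal sketch of the proposed check, for illustration only. All type and function names here (ConfigVersionAndTerm, ReplSetConfig, HeartbeatRequest, shouldRespondToHeartbeat) are hypothetical and are not the server's actual classes. The sketch also assumes that "removed" means the sender is absent from the receiver's current config, and that "same version-term pair" means the pair in the heartbeat equals the receiver's own.

```cpp
#include <set>
#include <string>
#include <tuple>

// Hypothetical types for illustration; not the real replication classes.
struct ConfigVersionAndTerm {
    long long version;
    long long term;
    bool operator==(const ConfigVersionAndTerm& other) const {
        return std::tie(version, term) == std::tie(other.version, other.term);
    }
};

struct ReplSetConfig {
    ConfigVersionAndTerm versionAndTerm;
    std::set<std::string> members;  // host:port of each current member
};

struct HeartbeatRequest {
    std::string senderHost;
    ConfigVersionAndTerm senderConfig;
};

// Returns false only in the case this ticket describes: the sender is not in
// the receiver's config (assumed meaning of "removed") and it reports the same
// version-term pair as the receiver. Every other case is left unchanged.
bool shouldRespondToHeartbeat(const ReplSetConfig& local, const HeartbeatRequest& hb) {
    const bool senderRemoved = local.members.count(hb.senderHost) == 0;
    const bool samePair = hb.senderConfig == local.versionAndTerm;
    return !(senderRemoved && samePair);
}
```

The ticket does not say how a removed node with a different version-term pair should be handled, so the sketch leaves that case as a normal response.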



 Comments   
Comment by Ali Mir [ 07/Oct/22 ]

This fix ended up not being related to the linked BF.

Generated at Thu Feb 08 06:15:48 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.