[SERVER-70297] Do not respond to heartbeat from removed node if it has the same version-term pair Created: 06/Oct/22 Updated: 07/Oct/22 Resolved: 07/Oct/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Ali Mir | Assignee: | Huayu Ouyang |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Sprint: | Repl 2022-10-17 |
| Participants: | |
| Linked BF Score: | 0 |
| Description |
|
Some BFs occur because heartbeats cannot be cancelled "fully": any outstanding heartbeat that has already been sent across the network cannot be cancelled. TSERVER-36417 attempted to drop the pooled connection to removed nodes after a reconfig, but this heartbeat issue prevented that ticket from being fixed. As a workaround within the replication code, if a node sees a heartbeat from a removed node with the same version-term pair, we should not respond and continue to heartbeat. See linked tickets for context. |
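The proposed check could look roughly like the sketch below. This is not the server's replication code (which is C++), only a minimal Python illustration under stated assumptions: the names `ReplSetConfig`, `Heartbeat`, and `should_respond` are hypothetical, and a "removed node" is assumed to mean a sender that is absent from the receiver's current config member list.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class ReplSetConfig:
    """Hypothetical stand-in for the receiver's current replica set config."""
    version: int
    term: int
    members: FrozenSet[str]

    @property
    def version_term(self) -> Tuple[int, int]:
        return (self.version, self.term)


@dataclass(frozen=True)
class Heartbeat:
    """Hypothetical incoming heartbeat: the sender plus the config it reports."""
    sender: str
    config_version: int
    config_term: int


def should_respond(local: ReplSetConfig, hb: Heartbeat) -> bool:
    """Return False only for the case the ticket proposes to ignore.

    Assumption: the sender counts as removed when it is not a member
    of the receiver's current config.
    """
    sender_removed = hb.sender not in local.members
    same_pair = (hb.config_version, hb.config_term) == local.version_term
    return not (sender_removed and same_pair)
```

With a local config of version 5, term 2 and members `a` and `b`, a heartbeat from `c` reporting the pair (5, 2) would be ignored, while a heartbeat from `c` reporting (4, 2), or any heartbeat from `a`, would still be answered.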
| Comments |
| Comment by Ali Mir [ 07/Oct/22 ] |
|
This fix ended up not being related to the linked BF. |