[SERVER-51513] Restart heartbeats for catchup should mark all nodes restarted rather than just the scheduled ones
Created: 13/Oct/20  Updated: 06/Dec/22  Resolved: 13/Oct/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Siyuan Zhou
Assignee: Backlog - Replication Team
Resolution: Duplicate
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Duplicate
duplicates SERVER-50318 Only restart scheduled heartbeats Closed
Problem/Incident
is caused by SERVER-50318 Only restart scheduled heartbeats Closed
Assigned Teams:
Replication
Operating System: ALL
Participants:
Linked BF Score: 33

 Description   

Since SERVER-50318, we only restart heartbeats that are currently scheduled. However, catchup expects the restart to reset _updatedSinceRestart to false for every node. As a result, catchup may not wait for the latest heartbeat from every node and could exit too early. This is a regression.

When restarting heartbeats for catchup, we should call hb.restart for all nodes rather than only the scheduled ones. Alternatively, we could blindly restart the heartbeats for all nodes on winning an election.
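The regression can be illustrated with a small Python model. This is only a sketch under stated assumptions, not the server's C++ implementation: `MemberData`, `heartbeat_scheduled`, `restart_scheduled_only`, `restart_all`, and `catchup_may_proceed` are hypothetical names, and `updated_since_restart` stands in for `_updatedSinceRestart`. It assumes catchup proceeds only once every node has reported a heartbeat since the restart.

```python
from dataclasses import dataclass


@dataclass
class MemberData:
    """Hypothetical per-node heartbeat bookkeeping (not the real server class)."""
    heartbeat_scheduled: bool
    updated_since_restart: bool = True  # stands in for _updatedSinceRestart

    def restart(self) -> None:
        # Mark the node as not yet heard from since the restart.
        self.updated_since_restart = False

    def on_heartbeat_response(self) -> None:
        # A fresh heartbeat response arrived after the restart.
        self.updated_since_restart = True


def restart_scheduled_only(members: dict) -> None:
    """Behavior described as the regression: skip nodes with no scheduled heartbeat."""
    for member in members.values():
        if member.heartbeat_scheduled:
            member.restart()


def restart_all(members: dict) -> None:
    """Proposed behavior: mark every node restarted, scheduled or not."""
    for member in members.values():
        member.restart()


def catchup_may_proceed(members: dict) -> bool:
    """Assumed catchup condition: every node has reported since the restart."""
    return all(m.updated_since_restart for m in members.values())


def make_members() -> dict:
    # n1 has a scheduled heartbeat; n2 does not (hypothetical starting state).
    return {
        "n1": MemberData(heartbeat_scheduled=True),
        "n2": MemberData(heartbeat_scheduled=False),
    }


if __name__ == "__main__":
    # Regression: n2 is never marked restarted, so its stale state counts as fresh.
    members = make_members()
    restart_scheduled_only(members)
    members["n1"].on_heartbeat_response()
    print("scheduled only:", catchup_may_proceed(members))

    # Proposed fix: catchup keeps waiting until n2 also responds.
    members = make_members()
    restart_all(members)
    members["n1"].on_heartbeat_response()
    print("restart all:", catchup_may_proceed(members))
```

In this model, restarting only the scheduled heartbeats lets the catchup check pass as soon as n1 responds, even though n2 has not reported anything new. Restarting all nodes keeps the check failing until n2 responds as well.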



 Comments   
Comment by Siyuan Zhou [ 13/Oct/20 ]

Closing as a duplicate of SERVER-50318.

Comment by Xuerui Fa [ 13/Oct/20 ]

The revert for SERVER-50318 has been merged in, so this should be safe to close.

Comment by Tess Avitabile (Inactive) [ 13/Oct/20 ]

We can close this if we instead revert SERVER-50318 and include the fix in the new commit.
