[SERVER-63418] Oplog fetcher should abort if node goes down Created: 08/Feb/22 Updated: 06/Dec/22 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Matthew Russotto | Assignee: | Backlog - Replication Team |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: | Replication |
| Description |
|
Because the oplog fetcher uses a separate connection from heartbeats, a sync source that goes down can be detected through missed heartbeats while the oplog fetcher keeps waiting on its exhaust connection for some time before that connection times out. To mitigate this, when heartbeats show the sync source to be down and we are not actively receiving oplog data, we should break the exhaust connection and retry sync source selection. |
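
The proposed condition can be sketched as a small decision function. This is only an illustration in Python, not the server's implementation. The names `FetcherState`, `should_abort_fetcher`, and `idle_threshold_secs` are hypothetical. The ticket does not define what "actively receiving" means, so the idle threshold used here is an assumption.

```python
from dataclasses import dataclass


@dataclass
class FetcherState:
    """Hypothetical snapshot of what a node knows about its sync source."""
    sync_source_down: bool         # heartbeats have marked the sync source as down
    last_batch_received_at: float  # time (in seconds) the last oplog batch arrived


def should_abort_fetcher(state: FetcherState, now: float,
                         idle_threshold_secs: float = 2.0) -> bool:
    """Return True when the exhaust connection should be broken.

    Both conditions from the description must hold:
      1. heartbeats report the sync source as down, and
      2. no oplog data has arrived recently (assumed here to mean
         "idle for at least idle_threshold_secs").
    """
    if not state.sync_source_down:
        # The sync source still answers heartbeats; keep waiting on the stream.
        return False
    idle_for = now - state.last_batch_received_at
    return idle_for >= idle_threshold_secs
```

For example, with the assumed 2 second threshold, a fetcher whose sync source is marked down and which last received a batch 5 seconds ago would abort and trigger sync source selection again. One that received a batch half a second ago would keep its connection, since data is evidently still flowing.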