[SERVER-63417] Oplog fetcher should not retry when a node is known to be down Created: 08/Feb/22  Updated: 29/Oct/23  Resolved: 16/Mar/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 6.0.0-rc0, 5.0.7

Type: Bug Priority: Major - P3
Reporter: Matthew Russotto Assignee: Matthew Russotto
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Problem/Incident
Related
related to SERVER-63792 Improve coverage of blackholing netwo... Open
related to SERVER-63418 Oplog fetcher should abort if node go... Backlog
is related to SERVER-64635 Coverity analysis defect 121898: Pars... Closed
Backwards Compatibility: Fully Compatible
Backport Requested:
v5.0
Sprint: Repl 2022-03-07, Repl 2022-03-21
Participants:
Linked BF Score: 176

 Description   

In steady-state oplog fetching, the oplog fetcher retries (once by default) against the same node it is already syncing from. If we already know that node is down (because of missed heartbeats), the fetcher should instead fail and re-run sync source selection. This shortens the time to majority write availability in the case where the retry would otherwise take a long time.
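
The requested behavior can be sketched roughly as the decision below. This is only an illustration of the logic described in this ticket, not the server's code: the names `FetchAction`, `next_action_after_fetch_error`, `source_known_down`, `retries_used` and `max_retries` are hypothetical, and the default of one retry is taken from the "once by default" wording above.

```python
from enum import Enum


class FetchAction(Enum):
    """Hypothetical outcomes after a steady-state oplog fetch error."""
    RETRY_SAME_SOURCE = "retry_same_source"
    RESELECT_SYNC_SOURCE = "reselect_sync_source"


def next_action_after_fetch_error(source_known_down: bool,
                                  retries_used: int,
                                  max_retries: int = 1) -> FetchAction:
    """Decide what to do after a fetch from the current sync source fails.

    source_known_down: assumed to be derived from missed heartbeats.
    retries_used:      retries already attempted against this source.
    max_retries:       assumed default of 1 ("once by default").
    """
    # If heartbeats already tell us the node is down, retrying only
    # delays picking a healthy sync source, so fail immediately.
    if source_known_down:
        return FetchAction.RESELECT_SYNC_SOURCE
    # Otherwise keep the existing behavior: retry while budget remains.
    if retries_used < max_retries:
        return FetchAction.RETRY_SAME_SOURCE
    return FetchAction.RESELECT_SYNC_SOURCE
```

In this sketch the heartbeat check comes first so that a node known to be down never consumes the retry budget; a node that is still believed healthy keeps the existing single retry.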



 Comments   
Comment by Githook User [ 17/Mar/22 ]

Author:

{'name': 'Matthew Russotto', 'email': 'matthew.russotto@mongodb.com', 'username': 'mtrussotto'}

Message: SERVER-63417 Oplog fetcher should not retry when a node is known to be down

(cherry picked from commit d173689149305c83f6c5e45878e54698694f4106)
Branch: v5.0
https://github.com/mongodb/mongo/commit/0a9102423f7199472f99197539680375df467524

Comment by Githook User [ 17/Mar/22 ]

Author:

{'name': 'Matthew Russotto', 'email': 'matthew.russotto@mongodb.com', 'username': 'mtrussotto'}

Message: SERVER-63417 Refactor shouldChangeSyncSource and improve tests for it in toplogy coordinator

(cherry picked from commit bff37c1e83f474ad68a396951c862b290b6f5fa5)
Branch: v5.0
https://github.com/mongodb/mongo/commit/285fe9de60cce759d745f67b55d828bc19e4c482

Comment by Githook User [ 16/Mar/22 ]

Author:

{'name': 'Matthew Russotto', 'email': 'matthew.russotto@mongodb.com', 'username': 'mtrussotto'}

Message: SERVER-63417 Oplog fetcher should not retry when a node is known to be down
Branch: master
https://github.com/mongodb/mongo/commit/d173689149305c83f6c5e45878e54698694f4106

Comment by Githook User [ 16/Mar/22 ]

Author:

{'name': 'Matthew Russotto', 'email': 'matthew.russotto@mongodb.com', 'username': 'mtrussotto'}

Message: SERVER-63417 Oplog fetcher should not retry when a node is known to be down
Branch: master
https://github.com/10gen/mongo-enterprise-modules/commit/4f384e99a5f32cb0ad8909cdd1265d0785e405cb

Comment by Githook User [ 08/Mar/22 ]

Author:

{'name': 'Matthew Russotto', 'email': 'matthew.russotto@mongodb.com', 'username': 'mtrussotto'}

Message: SERVER-63417 Refactor shouldChangeSyncSource and improve tests for it in toplogy coordinator
Branch: master
https://github.com/mongodb/mongo/commit/bff37c1e83f474ad68a396951c862b290b6f5fa5

Comment by Githook User [ 08/Mar/22 ]

Author:

{'name': 'Matthew Russotto', 'email': 'matthew.russotto@mongodb.com', 'username': 'mtrussotto'}

Message: Revert "SERVER-63417 Refactor shouldChangeSyncSource and improve tests for it in toplogy coordinator"

This reverts commit 0389572a04fb20d6ebc319c704efd2a8daf92068.
Branch: master
https://github.com/mongodb/mongo/commit/3924f84894a1d6ff9d5088fa70f0009877528eda

Comment by Githook User [ 08/Mar/22 ]

Author:

{'name': 'Matthew Russotto', 'email': 'matthew.russotto@mongodb.com', 'username': 'mtrussotto'}

Message: SERVER-63417 Refactor shouldChangeSyncSource and improve tests for it in toplogy coordinator
Branch: master
https://github.com/mongodb/mongo/commit/f70e73ccc422499080cfb4163efe42a64a4e59e1
