[SERVER-40159] Add retry logic for name resolution failure in isSelf Created: 15/Mar/19  Updated: 13/Jun/22  Resolved: 16/May/19

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Jason Chan Assignee: Mira Carey
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
related to SERVER-62699 Replica set fails to restart after sh... Backlog
related to SERVER-35649 Nodes removed due to isSelf failure s... Closed
is related to SERVER-35649 Nodes removed due to isSelf failure s... Closed
Sprint: Repl 2019-03-25
Participants:
Linked BF Score: 135

 Description   

Currently, isSelf does not contain any retry logic when attempting name resolution. This causes build failures for tests that experience transient network failures.

We would like to add retry loops to getAddrInfo to reduce these build failures, but it is important that we don't retry forever in case the network failure is not actually transient (e.g., DNS misconfiguration).

This requires threading the opCtx to getAddrInfo and may require some refactoring of unit tests to implement cleanly.
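
As a rough illustration of the bounded retry described above, here is a minimal C++ sketch. It is not the server's implementation: the Resolver type, RetryPolicy, and resolveWithRetry are hypothetical names invented for this example, and the resolver is passed in as a callable rather than calling getAddrInfo directly. A real version would presumably also check the opCtx for interruption between attempts, which this sketch omits.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <optional>
#include <string>
#include <thread>
#include <vector>

// Hypothetical resolver signature: returns the resolved addresses, or
// std::nullopt on a (possibly transient) resolution failure.
using Resolver =
    std::function<std::optional<std::vector<std::string>>(const std::string&)>;

// Hypothetical policy: a hard cap on attempts plus a fixed pause between them.
struct RetryPolicy {
    int maxAttempts;                    // upper bound, so we never retry forever
    std::chrono::milliseconds backoff;  // sleep between consecutive attempts
};

// Calls `resolve` until it succeeds or the attempt budget is exhausted.
std::optional<std::vector<std::string>> resolveWithRetry(const std::string& host,
                                                         const Resolver& resolve,
                                                         const RetryPolicy& policy) {
    assert(policy.maxAttempts > 0);
    for (int attempt = 1; attempt <= policy.maxAttempts; ++attempt) {
        if (auto addrs = resolve(host)) {
            return addrs;  // success: stop retrying
        }
        if (attempt < policy.maxAttempts) {
            std::this_thread::sleep_for(policy.backoff);
        }
    }
    // Budget exhausted: report failure and let the caller decide what to do.
    return std::nullopt;
}
```

Capping the number of attempts (rather than looping until success) is what keeps a persistent failure such as a DNS misconfiguration from blocking the caller indefinitely. The cap and backoff values are left to the caller because the ticket does not specify them.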



 Comments   
Comment by Mira Carey [ 19/Apr/19 ]

We may still want to do this ticket, but without constraints on what we want to do with transient DNS failures (should we retry? at what level? for how long?), I think there's no obvious way forward.

For cleaning up BFs, I'd prefer we do BUILD-8351. If we want to test transient DNS failures, it would be more useful to do so with a fail point (which we could use to better check edge cases).
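
To make the fail point idea concrete, here is a small C++ sketch of injecting DNS failures on demand in a test. It is only an illustration under stated assumptions: DnsFailPoint and lookup are hypothetical names, this is not the server's actual fail point facility, and the real name resolution is replaced by a fixed answer so the example stays self-contained.

```cpp
#include <atomic>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a fail point: while the counter is positive,
// each lookup is forced to fail and the counter is decremented.
class DnsFailPoint {
public:
    // Arms the fail point so that the next `n` lookups fail.
    void failNextLookups(int n) { remainingFailures_.store(n); }

    // Returns true if the current lookup should be forced to fail.
    bool shouldFail() {
        int cur = remainingFailures_.load();
        while (cur > 0) {
            // On failure, compare_exchange_weak reloads `cur`, so the loop
            // re-checks the latest value before trying again.
            if (remainingFailures_.compare_exchange_weak(cur, cur - 1)) {
                return true;
            }
        }
        return false;
    }

private:
    std::atomic<int> remainingFailures_{0};
};

// Hypothetical lookup wrapper: consults the fail point before "resolving".
// The host name is ignored and a fixed address is returned on success.
std::optional<std::vector<std::string>> lookup(const std::string& /*host*/,
                                               DnsFailPoint& fp) {
    if (fp.shouldFail()) {
        return std::nullopt;  // simulated transient DNS failure
    }
    return std::vector<std::string>{"127.0.0.1"};
}
```

A test could arm the fail point for a chosen number of lookups and then assert how the caller behaves, which makes edge cases (for example, failing exactly as many times as a retry budget allows) reproducible instead of dependent on real network flakiness.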
