[SERVER-40159] Add retry logic for name resolution failure in isSelf Created: 15/Mar/19 Updated: 13/Jun/22 Resolved: 16/May/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Jason Chan | Assignee: | Mira Carey |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Sprint: | Repl 2019-03-25 |
| Participants: |
| Linked BF Score: | 135 |
| Description |
|
Currently, isSelf has no retry logic for name resolution, so tests that hit transient network failures cause build failures. We would like to add a retry loop to getAddrInfo to reduce these build failures, but the retries must be bounded so that we don't retry forever when the network failure is not actually transient (e.g. DNS misconfiguration). This requires threading the opCtx through to getAddrInfo and may require some refactoring of unit tests to implement cleanly. |
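As a rough illustration of the bounded retry described above, here is a minimal sketch in Python rather than the server's C++. It is not the server's implementation: `resolve_with_retry` and all of its parameters are hypothetical names, and `deadline_secs` only stands in for whatever deadline the opCtx would supply. The resolver, clock, and sleep function are injectable so that unit tests need no real network.

```python
import socket
import time


def resolve_with_retry(host, port, resolver=socket.getaddrinfo,
                       deadline_secs=5.0, backoff_secs=0.1,
                       clock=time.monotonic, sleep=time.sleep):
    """Resolve host:port, retrying failed lookups until a deadline.

    All names and defaults here are illustrative assumptions; in the
    server, the deadline would presumably come from the opCtx.
    """
    give_up_at = clock() + deadline_secs
    while True:
        try:
            return resolver(host, port)
        except socket.gaierror:
            # Give up once the next wait would run past the deadline, so a
            # persistent failure (e.g. DNS misconfiguration) still surfaces
            # to the caller instead of being retried forever.
            if clock() + backoff_secs > give_up_at:
                raise
            sleep(backoff_secs)
```

A caller would use `resolve_with_retry("localhost", 27017)` and handle `socket.gaierror` exactly as it would for a single lookup; the only change in behavior is the bounded delay before the error is reported.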
| Comments |
| Comment by Mira Carey [ 19/Apr/19 ] |
|
We may still want to do this ticket, but without constraints on what we want to do with transient DNS failures (should we retry? at what level? for how long?), I think there's no obvious way forward. For cleaning up BFs, I'd prefer we do BUILD-8351. If we want to test transient DNS failures, it would be more useful to do so with a fail point (which we could use to better check edge cases).
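
To make the fail-point idea concrete, here is a minimal Python sketch of injecting a fixed number of name-resolution failures. The `FailPoint` class is a toy stand-in, not the server's fail point API, and `dns_lookup_fails` and `get_addr_info` are hypothetical names chosen for illustration.

```python
import socket


class FailPoint:
    """Toy stand-in for a server fail point; not MongoDB's actual API."""

    def __init__(self):
        self._remaining = 0

    def fail_times(self, n):
        """Arm the fail point so that the next n checks report failure."""
        self._remaining = n

    def should_fail(self):
        """Return True and consume one failure while the point is armed."""
        if self._remaining > 0:
            self._remaining -= 1
            return True
        return False


# Hypothetical fail point guarding name resolution.
dns_lookup_fails = FailPoint()


def get_addr_info(host, port, resolver=socket.getaddrinfo):
    """Resolve host:port unless the fail point injects a failure."""
    if dns_lookup_fails.should_fail():
        raise socket.gaierror(-3, "injected temporary failure in name resolution")
    return resolver(host, port)
```

A test can call `dns_lookup_fails.fail_times(2)` and then check how the caller behaves across the two injected failures and the lookup that follows, which covers edge cases that a real network failure cannot reproduce on demand.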