[CDRIVER-4028] Wrong error message printed when DNS resolution fails Created: 18/Jun/21 Updated: 28/Oct/23 Resolved: 01/Jul/21 |
|
| Status: | Closed |
| Project: | C Driver |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 1.17.7 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Andreas Braun | Assignee: | Andreas Braun |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Description |
|
While investigating HELP-25377, I noticed that _mongoc_get_rr_search passes h_errno to strerror to build its error message. h_errno is set when the earlier res_nsearch or res_search call fails. However, h_errno values are not errno values and are not designed to be run through strerror, so the actual error is different from what we see in the error message. In HELP-25377 in particular, the error message seen was "Interrupted system call". That is strerror's text for EINTR, which is 4 on Linux, the same numeric value as the h_errno code NO_DATA.
Looking at the error section for h_errno in the manual, which lists HOST_NOT_FOUND, NO_DATA, NO_RECOVERY and TRY_AGAIN, this is not at all what's happening: no system call was interrupted.
netdb.h also declares hstrerror to retrieve the error string for a given h_errno code, but this has been marked obsolete. With that in mind, I'd suggest adding _mongoc_hstrerror to return an error string for each of the codes listed in the manual.
I'll note that, whether on purpose or by oversight, _mongoc_get_rr_search also ignores the TRY_AGAIN error. One could argue that "Try again later" does not suggest retrying the lookup right away, and a retry also wouldn't have fixed the problem in HELP-25377, as h_errno is set to NO_DATA there. However, it might be beneficial to try again to protect against transient failures. |
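To illustrate the suggestion, here is a minimal sketch of what such a helper could look like. It is not the driver's actual code: the names hstrerror_sketch, lookup_is_retryable and strerror_is_misleading are hypothetical, the message wording is illustrative, and the feature macro assumes glibc on Linux.

```c
/* Assumption: glibc, which only exposes the h_errno constants when a
 * feature macro such as _DEFAULT_SOURCE is defined. */
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif

#include <netdb.h>   /* HOST_NOT_FOUND, TRY_AGAIN, NO_RECOVERY, NO_DATA */
#include <stdbool.h>
#include <string.h>  /* strerror, strcmp */

/* Hypothetical helper: translate an h_errno code into a message without
 * going through strerror or the obsolete hstrerror. */
static const char *
hstrerror_sketch (int code)
{
   switch (code) {
   case HOST_NOT_FOUND:
      return "The specified host is unknown";
   case TRY_AGAIN:
      return "A temporary error occurred on an authoritative name server";
   case NO_RECOVERY:
      return "A nonrecoverable name server error occurred";
   case NO_DATA:
      return "The requested name is valid but has no record of the requested type";
   default:
      return "An unknown resolver error occurred";
   }
}

/* Hypothetical policy: only TRY_AGAIN describes a transient failure, so it
 * is the only code for which repeating the lookup could help. */
static bool
lookup_is_retryable (int code)
{
   return code == TRY_AGAIN;
}

/* Returns true when strerror, which interprets the value as an errno,
 * produces a different message than the h_errno-aware helper. */
static bool
strerror_is_misleading (int code)
{
   return strcmp (strerror (code), hstrerror_sketch (code)) != 0;
}
```

With this in place, the error path would call hstrerror_sketch (h_errno) instead of strerror (h_errno), and a caller could consult lookup_is_retryable before deciding whether to repeat the lookup. How many retries to allow, and how long to wait between them, is left open here.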
| Comments |
| Comment by Githook User [ 01/Jul/21 ] |
|
Author: {'name': 'Andreas Braun', 'email': 'alcaeus@users.noreply.github.com', 'username': 'alcaeus'}Message:
|
| Comment by Githook User [ 01/Jul/21 ] |
|
Author: {'name': 'Andreas Braun', 'email': 'alcaeus@users.noreply.github.com', 'username': 'alcaeus'}Message:
|
| Comment by Jeremy Mikola [ 29/Jun/21 ] |
Quoting this Stack Overflow discussion:
I realize the context here pertains to SRV resolution and we aren't using gethostbyname directly. Of the various APIs we use for DNS here, h_errno is only mentioned in resolver(3). I believe that corresponds to MONGOC_HAVE_RES_NSEARCH. I didn't find any reference to h_errno in the documentation for the APIs used for MONGOC_HAVE_RES_SEARCH (e.g. res_search(3)); however, those do refer to gethostbyname(3), so I presume they also set h_errno. More generally, I wonder if there's a newer API for DNS resolution (perhaps related to getaddrinfo) that we should consider using. |
| Comment by Andreas Braun [ 29/Jun/21 ] |