[CDRIVER-4028] Wrong error message printed when DNS resolution fails Created: 18/Jun/21  Updated: 28/Oct/23  Resolved: 01/Jul/21

Status: Closed
Project: C Driver
Component/s: None
Affects Version/s: None
Fix Version/s: 1.17.7

Type: Bug Priority: Major - P3
Reporter: Andreas Braun Assignee: Andreas Braun
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to CDRIVER-4249 Undeclared DNS constants and symbols ... Closed

 Description   

While investigating HELP-25377, I noticed that _mongoc_get_rr_search uses strerror to print an error message from h_errno. The latter is set when an error occurs in the res_nsearch or res_search calls earlier. However, h_errno has its own set of error codes, separate from errno, and is not designed to be run through strerror, so the actual error differs from what the error message shows. In HELP-25377 in particular, the error message seen was "Interrupted system call". We can see its mapping:

#define	EINTR		 4	/* Interrupted system call */

Looking at the error section for h_errno in the manual, this is not at all what's happening:

Errors

The variable h_errno can have the following values:

HOST_NOT_FOUND
The specified host is unknown.
NO_ADDRESS or NO_DATA
The requested name is valid but does not have an IP address.
NO_RECOVERY
A nonrecoverable name server error occurred.
TRY_AGAIN
A temporary error occurred on an authoritative name server. Try again later.
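
To make the mismatch concrete, here is a minimal sketch of the misuse. It assumes a platform where the h_errno constants are visible through <netdb.h> (on glibc this appears to need the default feature macros, hence the guard at the top). Both function names are hypothetical and not part of the driver. On systems where NO_DATA and EINTR share the same number, as the report suggests, the misread string would be the EINTR text.

```c
/* Assumption: glibc only exposes the h_errno constants with the default
 * feature macros enabled, so request them if nothing has done so yet. */
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1
#endif

#include <errno.h>  /* EINTR */
#include <netdb.h>  /* NO_DATA and the other h_errno values */
#include <string.h> /* strerror */

/* Hypothetical helper: returns 1 if the resolver code NO_DATA has the same
 * numeric value as errno's EINTR on this platform, 0 otherwise. */
static int
example_no_data_collides_with_eintr (void)
{
   return NO_DATA == EINTR;
}

/* Reproduces the misuse described above: an h_errno value is interpreted
 * as if it were an errno value. The returned text describes whatever errno
 * happens to share that number, not the resolver failure. */
static const char *
example_misread_h_errno (int h_errno_value)
{
   return strerror (h_errno_value);
}
```

Calling example_misread_h_errno (NO_DATA) shows what the faulty code path would print for that failure. The exact text is platform dependent.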

h_errno.h also defines hstrerror to retrieve the error string for a given error code, but this has been marked obsolete. With that in mind, I'd suggest adding _mongoc_hstrerror to return an error string for a given h_errno value, taken from the list above.
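
A helper along those lines could look roughly like the sketch below. This is not the code merged for this ticket. The name example_hstrerror, the fallback message, and the exact wording of each string are assumptions; the strings paraphrase the manual excerpt above.

```c
/* Assumption: glibc only exposes the h_errno constants with the default
 * feature macros enabled, so request them if nothing has done so yet. */
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1
#endif

#include <netdb.h> /* HOST_NOT_FOUND, NO_DATA, NO_RECOVERY, TRY_AGAIN */

/* Hypothetical stand-in for the proposed _mongoc_hstrerror: maps an
 * h_errno value to a static, human-readable string. */
static const char *
example_hstrerror (int code)
{
   switch (code) {
   case HOST_NOT_FOUND:
      return "The specified host is unknown";
   case NO_DATA: /* the manual lists NO_ADDRESS alongside NO_DATA */
      return "The requested name is valid but does not have an IP address";
   case NO_RECOVERY:
      return "A nonrecoverable name server error occurred";
   case TRY_AGAIN:
      return "A temporary error occurred on an authoritative name server";
   default:
      /* Assumed fallback for values outside the documented list. */
      return "An unknown error occurred";
   }
}
```

Returning string literals keeps the helper free of shared buffers, so it can be called from any thread without coordination.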

I'll note that whether on purpose or by oversight, the function also ignores the TRY_AGAIN error. One could argue that "Try again later" does not suggest retrying the lookup right away, and it also wouldn't have fixed the problem in HELP-25377 as h_errno is set to NO_DATA. However, it might be beneficial to try again to protect against transient failures.
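
If retrying on TRY_AGAIN were considered, the control flow might be sketched as follows. Everything here is hypothetical: the callback type, the function names, and the attempt limit are illustrative, and the lookup is abstracted behind a callback so the sketch does not depend on a live resolver. It retries immediately, which the "Try again later" wording may argue against; a real change would likely need a delay or backoff.

```c
/* Assumption: glibc only exposes the h_errno constants with the default
 * feature macros enabled, so request them if nothing has done so yet. */
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1
#endif

#include <netdb.h> /* TRY_AGAIN, NO_RECOVERY */

/* Hypothetical callback: performs one lookup and returns 0 on success,
 * otherwise an h_errno-style code. */
typedef int (*example_lookup_fn) (void *ctx);

/* Calls lookup up to max_attempts times, repeating only while the result
 * is TRY_AGAIN. Returns the last result, or NO_RECOVERY if max_attempts
 * is not positive and no lookup was made. */
static int
example_lookup_with_retry (example_lookup_fn lookup, void *ctx, int max_attempts)
{
   int code = NO_RECOVERY;
   int attempt;

   for (attempt = 0; attempt < max_attempts; attempt++) {
      code = lookup (ctx);
      if (code != TRY_AGAIN) {
         break; /* success or a non-transient failure: stop retrying */
      }
   }
   return code;
}

/* Stub lookup for demonstration: ctx points to an int counting how many
 * transient failures remain before the lookup succeeds. */
static int
example_flaky_lookup (void *ctx)
{
   int *failures_left = ctx;

   if (*failures_left > 0) {
      (*failures_left)--;
      return TRY_AGAIN;
   }
   return 0;
}
```

As the description notes, this would not have helped in HELP-25377, where h_errno was NO_DATA; the loop exits on the first non-TRY_AGAIN result.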



 Comments   
Comment by Githook User [ 01/Jul/21 ]

Author:

{'name': 'Andreas Braun', 'email': 'alcaeus@users.noreply.github.com', 'username': 'alcaeus'}

Message: CDRIVER-4028 Print correct error message when DNS resolution fails (#811)

  • Print correct error message when DNS resolution fails
Comment by Jeremy Mikola [ 29/Jun/21 ]

"h_errno.h also defines hstrerror to retrieve the error string for a given error code, but this has been marked obsolete"

Quoting this Stack Overflow discussion:

They are obsolete because gethostbyname* is obsolete. Use getaddrinfo instead, and use gai_strerror for errors. From the gethostbyname(3) man page:

The gethostbyname*() and gethostbyaddr*() functions are obsolete. Applications should use getaddrinfo(3) and getnameinfo(3) instead.

I realize the context here pertains to SRV resolution and we aren't using gethostbyname directly.

Of the various APIs we use for DNS here, h_errno is only mentioned in resolver(3). I believe that corresponds to MONGOC_HAVE_RES_NSEARCH. I didn't find any reference to h_errno from the APIs used for MONGOC_HAVE_RES_SEARCH (e.g. res_search(3)); however, those do refer to gethostbyname(3), so I presume they also set h_errno.

More generally, I wonder if there's a newer API for DNS resolution (perhaps related to getaddrinfo) that we should consider using.

Comment by Andreas Braun [ 29/Jun/21 ]

https://github.com/mongodb/mongo-c-driver/pull/811

Generated at Wed Feb 07 21:19:43 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.