[SERVER-41353] Fix late calls to errnoWithDescription() Created: 29/May/19 Updated: 29/Oct/23 Resolved: 28/Apr/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Diagnostics, Internal Code |
| Affects Version/s: | None |
| Fix Version/s: | 6.1.0-rc0 |
| Type: | Bug | Priority: | Minor - P4 |
| Reporter: | Kevin Pulo | Assignee: | Billy Donahue |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | logging, move-sec, sa-groomed | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||||||||||
| Backport Requested: |
v5.0, v4.4
|
||||||||||||||||||||||||
| Sprint: | Service Arch 2022-03-21, Service Arch 2022-04-04, Service Arch 2022-04-18, Service Arch 2022-05-02, Service Arch 2022-05-16 | ||||||||||||||||||||||||
| Participants: | |||||||||||||||||||||||||
| Story Points: | 4 | ||||||||||||||||||||||||
| Description |
|
Calling errnoWithDescription() without any arguments causes it to refer to the errno global. However, unless this is done immediately after returning from a function call that might set errno, the value of errno might get trampled by other code which calls errno-setting functions. This can cause the wrong error string to be reported by the failure message, including situations where an error is reported, but the cause of the error is confusingly "Success". There are currently 100 call sites of errnoWithDescription() with no args (and 1 in enterprise). 62 of these are inside the streaming operator, which is too late (at least potentially, because the constructor of the object being streamed into might do something that resets errno). This ticket is to audit all usages of errnoWithDescription() with no args, and clean up those where it's called too late — or come up with some sane way to automatically capture the relevant value of errno before there is any possibility of it being overwritten.
Acceptance criteria: Remove the errnoWithDescription overload that takes no arguments and fix the affected call sites. |
| Comments |
| Comment by Jason Chan [ 08/May/23 ] |
|
Requesting backport of only the library additions to errno_util.h/cpp and the accompanying unit tests to help with backports of other features that make use of these helpers. |
| Comment by Githook User [ 28/Apr/22 ] |
|
Author: {'name': 'Billy Donahue', 'email': 'billy.donahue@mongodb.com', 'username': 'BillyDonahue'}Message: |
| Comment by Billy Donahue [ 25/Mar/22 ] |
| Comment by Billy Donahue [ 20/Oct/20 ] |
|
We have a problem with errnoWithDescription's overload set and usage, as described here. When looking into fixing it, I found we have another problem, that even if we capture the errors at the right time, Splitting this ticket out into another one dedicated to the <system_error> work. We should be using <system_error> to propagate error categories and lift errno and GetLastError and other separate and distinct kinds of int-based error codes into C++ with type safety. Then something like errnoWithDescription (which would be renamed) can be done safely. Split the <system_error> work into ticket |
| Comment by Billy Donahue [ 11/Jul/20 ] |
|
After replacing all the errnoWithDescription with a mongo::Errno handler, I discovered there's sloppy conflation of errno with GetLastError codes. Win32 has both an errno int (signed 32-bit) and a GetLastError DWORD (unsigned 32-bit). They aren't in the same number space, and a DWORD can't be portably stored into a signed int without special care (which we aren't taking). I correctly get compiler errors from the narrowing conversions of DWORD to int after textually replacing errnoWithDescription(e) expressions to Errno{e}.desc() expressions. Braced-init doesn't tolerate narrowing! Logically, a function called errnoWithDescription should only be looking at errno codes, but we have it so that it does a GetLastError on Win32, and there is no helper function analyze the `errno` codes (as set by _read, etc) on Win32. We need to keep GetLastError and errno regimes separated to do this right. We'll have clearer and more correct error handling. |
| Comment by Billy Donahue [ 10/Jul/20 ] |
| Comment by Billy Donahue [ 10/Jul/20 ] |
|
I looked at more cases. There are a lot of bugs of this type in the code. Some cases are using errnoWithDescription with no argument, but other bug sites are using it with a stale argument. We can sweep the code once, but how do we keep the barn door closed? I'm going to make a {mongo::Errno} class, and get rid of all the errnoWith... nonmember functions. I think that will help us create an idiomatic push so that more call sites will handle errno (and Win32's GetLastError) safely going forward. Unless we ban and wrap all the libc errno-manipulating functions, I don't think we can enforce correct errno discipline, but we can get closer than we are now.
|