[SERVER-82459] Fall back to default signal handler when a thread receives two signals Created: 26/Oct/23 Updated: 16/Nov/23 Resolved: 15/Nov/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 7.3.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | George Wangensteen | Assignee: | Ryan Berryhill |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Assigned Teams: | Service Arch |
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Sprint: | Service Arch Prioritized List |
| Participants: |
| Description |
|
In https://github.com/mongodb/mongo/blob/8e4b5670df9b9fe814e57cb5f3f8ee9407237b5a/src/mongo/util/signal_handlers_synchronous.cpp , the server defines signal handlers for a variety of signals that can be generated synchronously, such as SIGSEGV and SIGABRT. For these signals, the handler does some variant of the following: log a fatal error, collect and log a backtrace, and then exit. When a thread receives a second such signal (e.g., the handler for an abort itself segfaults), the second handler calls quickExit, which may call into logging (risky when we are two signal handlers deep) and never invokes the default signal handler. As a result, the process does not dump core in this case. We should call `endProcessWithSignal` instead. |
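A minimal sketch of the proposed behaviour, assuming POSIX signals on Linux. It is not the server's code: `endProcessWithSignalSketch`, `fatalSignalHandlerSketch`, the per-thread flag `tlsInFatalSignalHandler`, the hook `gFirstSignalWork`, and the helpers are all hypothetical. On a nested fatal signal the handler skips logging and quickExit, restores the default action, unblocks the signal, and re-raises it so the kernel kills the process (and can write a core). Whether the real guard is per-thread or process-wide is an assumption here.

```cpp
#include <cassert>    // used by the usage checks
#include <csignal>
#include <cstdlib>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical per-thread "already handling a fatal signal" flag.
thread_local volatile std::sig_atomic_t tlsInFatalSignalHandler = 0;

// Hypothetical hook standing in for the logging/backtrace work done on the
// first signal; the demo uses it to simulate that work faulting.
void (*gFirstSignalWork)() = nullptr;

// Hypothetical stand-in for endProcessWithSignal(): reset to the default
// action, unblock the signal, and re-raise it so the default action runs.
[[noreturn]] void endProcessWithSignalSketch(int sig) {
    struct sigaction sa {};
    sa.sa_handler = SIG_DFL;
    sigemptyset(&sa.sa_mask);
    sigaction(sig, &sa, nullptr);

    sigset_t unblock;
    sigemptyset(&unblock);
    sigaddset(&unblock, sig);
    pthread_sigmask(SIG_UNBLOCK, &unblock, nullptr);  // it is blocked inside its own handler

    raise(sig);
    _exit(EXIT_FAILURE);  // not reached for SIGSEGV/SIGABRT, whose default action terminates
}

void fatalSignalHandlerSketch(int sig) {
    if (tlsInFatalSignalHandler) {
        // Second fatal signal on this thread: no logging, no quickExit;
        // die with the default action instead.
        endProcessWithSignalSketch(sig);
    }
    tlsInFatalSignalHandler = 1;
    if (gFirstSignalWork != nullptr) {
        gFirstSignalWork();  // first signal: log + backtrace would happen here
    }
    endProcessWithSignalSketch(sig);
}

void installSketchHandlers() {
    struct sigaction sa {};
    sa.sa_handler = fatalSignalHandlerSketch;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGABRT, &sa, nullptr);
    sigaction(SIGSEGV, &sa, nullptr);
}

// Runs `body` in a forked child with the sketch handlers installed and
// returns the signal that terminated the child, or -1 otherwise.
int terminatingSignalOf(void (*body)()) {
    pid_t pid = fork();
    if (pid < 0) {
        return -1;
    }
    if (pid == 0) {
        installSketchHandlers();
        body();
        _exit(0);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0) {
        return -1;
    }
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Usage: with no hook set, `terminatingSignalOf([] { std::raise(SIGABRT); })` reports SIGABRT. With `gFirstSignalWork = [] { std::raise(SIGSEGV); };` (the abort handler "segfaults"), the nested handler re-raises SIGSEGV with the default action, so the child dies from SIGSEGV rather than through an exit call.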
| Comments |
| Comment by Githook User [ 15/Nov/23 ] |
|
Author: {'name': 'Ryan Berryhill', 'email': 'ryan.berryhill@mongodb.com', 'username': 'ryanberryhill'}
Message:
| Comment by Billy Donahue [ 08/Nov/23 ] |
|
New plan after the Zoom discussion. We're already doing a check on a global.
I think the problem is that this quickExit call needs to result in a more immediate death. Regarding _lk: by the time this body runs, _lk has been constructed, but with defer_lock, so constructing it does not lock the mutex. Its destructor unlocks the mutex later only if the lock is actually held, and for that you have to reach the _lk.lock() statement. So nothing really happens to the mutex in the _lk initializer; the _lk object just remembers which mutex to unlock later, if necessary. The likely problem with this whole handler is the quickExit call trying to do too much work. So I think the quickExit call needs to go, but the rest of the handler is OK. |
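The defer_lock point above can be checked in isolation. This sketch uses the standard `std::mutex` and `std::unique_lock` (the server code may use its own wrappers); `exampleMutex` and the helper functions are hypothetical. Constructing the lock with `std::defer_lock` leaves the mutex free, and the destructor releases the mutex only once `lock()` has been called.

```cpp
#include <cassert>  // used by the usage checks
#include <mutex>

std::mutex exampleMutex;  // hypothetical stand-in for the handler's mutex

// Only call this when the current thread does NOT hold exampleMutex:
// try_lock on a std::mutex the caller already owns is undefined behaviour.
bool isMutexFree() {
    if (!exampleMutex.try_lock()) {
        return false;
    }
    exampleMutex.unlock();
    return true;
}

// Construction with std::defer_lock records the mutex but does not lock it.
bool deferLockLeavesMutexFree() {
    std::unique_lock<std::mutex> lk(exampleMutex, std::defer_lock);
    return !lk.owns_lock() && isMutexFree();
}  // lk never owned the mutex, so its destructor does nothing

// After an explicit lock(), the destructor is what releases the mutex.
bool destructorUnlocksAfterLock() {
    {
        std::unique_lock<std::mutex> lk(exampleMutex, std::defer_lock);
        lk.lock();  // only now is the mutex acquired
        if (!lk.owns_lock()) {
            return false;
        }
    }  // destructor unlocks because lk owns the lock
    return isMutexFree();
}
```

Both helpers are expected to return `true`, which matches the reading that the `_lk` initializer alone does nothing to the mutex.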