[SERVER-72334] Improve diagnostic logging for SIGSEGV, SIGBUS, SIGILL, and SIGFPE Created: 21/Dec/22  Updated: 29/Oct/23  Resolved: 05/Sep/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.2.0-rc0

Type: Improvement
Priority: Major - P3
Reporter: Vojislav Stojkovic
Assignee: Vojislav Stojkovic
Resolution: Fixed
Votes: 1
Labels: auto-reverted
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Problem/Incident
Assigned Teams:
Service Arch
Backwards Compatibility: Fully Compatible
Sprint: Service Arch 2023-01-23, Service Arch 2023-02-06, Service Arch 2023-05-15, Service Arch 2023-07-24, Service Arch 2023-08-07, Service Arch 2023-08-21, Service Arch 2023-09-04, Service Arch 2023-09-18
Participants:
Linked BF Score: 5

 Description   

Currently, when we receive SIGSEGV, SIGBUS, SIGILL, or SIGFPE, we only log the following information:

  • whether the signal was an invalid access or invalid operation
  • what address triggered the signal (i.e. the si_addr field in siginfo_t we receive)
  • the signal number
  • the backtrace

While this information is often enough, some cases need more diagnostic information than we log right now. That information is available in our signal handler, which currently discards it.

Furthermore, because our signal handler ends up calling raise to terminate the process, the contents of siginfo_t that get written to the core dump don't correspond to the siginfo_t we receive in our signal handler.
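
One way to keep the original information reachable after raise is to copy the received siginfo_t into a global before doing anything else, so it can be logged or inspected from the core dump. The following is a minimal sketch of that idea, not the handler used in the server; the names g_saved_siginfo, g_siginfo_saved, save_siginfo, and install_saving_handler are hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <string.h>

/* Hypothetical globals: a copy of the siginfo_t the handler received. */
static siginfo_t g_saved_siginfo;
static volatile sig_atomic_t g_siginfo_saved = 0;

/* SA_SIGINFO-style handler that only preserves the received siginfo_t.
 * A real fatal-signal handler would go on to log and then terminate
 * (for example by restoring the default disposition and calling raise). */
static void save_siginfo(int signo, siginfo_t *info, void *ctx) {
    (void)signo;
    (void)ctx;
    if (info != NULL) {
        g_saved_siginfo = *info; /* plain struct copy, no library call */
        g_siginfo_saved = 1;
    }
}

/* Installs the handler above for one signal; returns 0 on success. */
static int install_saving_handler(int signo) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = save_siginfo;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}
```

With a copy like this in place, a debugger attached to the core can print the saved structure even though $_siginfo reflects the later raise.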

At the very minimum we should:

  • log the contents of si_code
  • emit an informative message that explains that the contents of $_siginfo in GDB don't represent the information that was received in the signal handler
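
To illustrate what logging si_code could look like, here is a minimal sketch that maps a signal number and si_code pair to a readable name. The helper name describe_si_code is hypothetical, and only a handful of the POSIX-defined codes are covered; it is not taken from the server code.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>

/* Hypothetical helper: map (signal number, si_code) to a short name.
 * Covers only a few POSIX-defined codes; everything else is "unknown".
 * It returns string literals only, so it makes no allocations and no
 * library calls, which keeps it usable from inside a signal handler. */
static const char *describe_si_code(int signo, int si_code) {
    switch (signo) {
    case SIGSEGV:
        if (si_code == SEGV_MAPERR) return "SEGV_MAPERR (address not mapped)";
        if (si_code == SEGV_ACCERR) return "SEGV_ACCERR (invalid permissions)";
        break;
    case SIGBUS:
        if (si_code == BUS_ADRALN) return "BUS_ADRALN (invalid address alignment)";
        if (si_code == BUS_ADRERR) return "BUS_ADRERR (nonexistent physical address)";
        break;
    case SIGILL:
        if (si_code == ILL_ILLOPC) return "ILL_ILLOPC (illegal opcode)";
        if (si_code == ILL_ILLOPN) return "ILL_ILLOPN (illegal operand)";
        break;
    case SIGFPE:
        if (si_code == FPE_INTDIV) return "FPE_INTDIV (integer divide by zero)";
        if (si_code == FPE_FLTDIV) return "FPE_FLTDIV (floating-point divide by zero)";
        break;
    }
    return "unknown";
}
```

A handler would call this with info->si_signo and info->si_code and write the result alongside the raw numeric value, since codes outside the table still carry information.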

Further possible improvements are to log:

  • the entire contents of siginfo_t as a hex blob
  • architecture-specific information accessible via ucontext_t
    • on x86_64, for segfaults, we can emit the contents of uc_mcontext.gregs[REG_ERR] to help distinguish whether the fault was caused by:
      • a read or a write access
      • a kernel-mode or a user-mode access
      • an instruction fetch or not
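
The x86_64 distinctions above come from individual bits of the page-fault error code. The sketch below decodes those bits from a raw value; the struct and function names are hypothetical, and the guarded helper that reads gregs[REG_ERR] assumes Linux with glibc on x86_64 and is compiled out elsewhere.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* glibc exposes REG_ERR only with this defined */
#endif
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of the x86 page-fault error code (hypothetical type). */
struct page_fault_info {
    bool protection_violation; /* bit 0: set = protection fault, clear = page not present */
    bool write;                /* bit 1: set = write access, clear = read */
    bool user_mode;            /* bit 2: set = user-mode access, clear = kernel-mode */
    bool instruction_fetch;    /* bit 4: set = fault on an instruction fetch */
};

static struct page_fault_info decode_page_fault_error(uint64_t err) {
    struct page_fault_info info;
    info.protection_violation = (err & (UINT64_C(1) << 0)) != 0;
    info.write                = (err & (UINT64_C(1) << 1)) != 0;
    info.user_mode            = (err & (UINT64_C(1) << 2)) != 0;
    info.instruction_fetch    = (err & (UINT64_C(1) << 4)) != 0;
    return info;
}

#if defined(__linux__) && defined(__x86_64__)
#include <ucontext.h>
#ifdef REG_ERR
/* ctx is the third argument passed to an SA_SIGINFO handler. */
static uint64_t page_fault_error_from_context(const void *ctx) {
    const ucontext_t *uc = (const ucontext_t *)ctx;
    return (uint64_t)uc->uc_mcontext.gregs[REG_ERR];
}
#endif
#endif
```

For example, an error code of 0x6 decodes as a user-mode write to a page that was not present.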


 Comments   
Comment by Githook User [ 05/Sep/23 ]

Author:

{'name': 'Vojislav Stojkovic', 'email': 'vojislav.stojkovic@mongodb.com', 'username': 'vstojkovic-mongodb'}

Message: SERVER-72334 Improve diagnostic logging for SIGSEGV, SIGBUS, SIGILL, and SIGFPE
Branch: master
https://github.com/mongodb/mongo/commit/67c56e96e5c1203b17b573dd2c9e7e0b9dc68aa0

Comment by Billy Donahue [ 13/Aug/23 ]

Oops, there's no SI_TKILL on macOS

Comment by xgen-buildbaron-user [ 12/Aug/23 ]

Ticket re-opened due to revert. The compile_unittests task began failing consistently.

Comment by Githook User [ 12/Aug/23 ]

Author:

{'name': 'auto-revert-processor', 'email': 'dev-prod-dag@mongodb.com', 'username': ''}

Message: Revert "SERVER-72334 Improve diagnostic logging for SIGSEGV, SIGBUS, SIGILL, and SIGFPE"

This reverts commit 2b7c62ee73f7bea6800cb5bc91e6de51af34df75.
Branch: master
https://github.com/mongodb/mongo/commit/af78b36812c3ef93a29c1c40e6197ad71e71af92

Comment by Githook User [ 11/Aug/23 ]

Author:

{'name': 'Vojislav Stojkovic', 'email': 'vojislav.stojkovic@mongodb.com', 'username': 'vstojkovic-mongodb'}

Message: SERVER-72334 Improve diagnostic logging for SIGSEGV, SIGBUS, SIGILL, and SIGFPE
Branch: master
https://github.com/mongodb/mongo/commit/2b7c62ee73f7bea6800cb5bc91e6de51af34df75

Comment by Vojislav Stojkovic [ 21/Dec/22 ]

More info on extracting and interpreting x86_64-specific information: https://stackoverflow.com/questions/17671869/how-to-identify-read-or-write-operations-of-page-fault-when-using-sigaction-hand

Generated at Thu Feb 08 06:21:29 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.