[SERVER-33125] Segfault caused by apparent error in codegen for exception unwinding on s390x Created: 05/Feb/18 Updated: 08/Jan/24 Resolved: 09/Feb/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 3.7.2 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Justin Seyster | Assignee: | Justin Seyster |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||
| Operating System: | ALL | ||||||||||||||||
| Participants: | |||||||||||||||||
| Linked BF Score: | 0 | ||||||||||||||||
| Description |
|
On s390x, when this call to parseElement results in an exception, exception unwinding triggers a call to the FieldPath destructor, but that destructor gets called on address 0x0. Inspection of the disassembly reveals that the exception unwinding code is incorrectly attempting to read the address of the FieldPath object from a volatile (caller-save) register, which has been clobbered with a 0 value. As a workaround, we can assign the FieldPath to a named variable, instead of constructing it within the argument list to ProjectTypeParser::parseElement(). |
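The workaround described above can be sketched as follows. This is a minimal illustration, not the actual server code: the `FieldPath` class and the free function `parseElement()` here are hypothetical stand-ins for the real `FieldPath` and `ProjectTypeParser::parseElement()`, and the `destroyed` counter exists only so the example can observe destructor calls. On a correct toolchain both shapes behave identically; the ticket's observation is only that the second shape avoided the crash on s390x.

```cpp
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for FieldPath; the real class has a richer interface.
class FieldPath {
public:
    explicit FieldPath(std::string path) : _path(std::move(path)) {}
    ~FieldPath() { ++destroyed; }  // counts destructor calls for illustration
    const std::string& fullPath() const { return _path; }
    static int destroyed;

private:
    std::string _path;
};
int FieldPath::destroyed = 0;

// Hypothetical stand-in for ProjectTypeParser::parseElement(); it always
// throws so that the FieldPath is destroyed during exception unwinding.
void parseElement(const FieldPath& path) {
    throw std::runtime_error("cannot parse " + path.fullPath());
}

// Original shape: the FieldPath is a temporary constructed inside the
// argument list, so its destructor runs from the unwinding cleanup code.
bool parseWithTemporary(const std::string& name) {
    try {
        parseElement(FieldPath(name));
    } catch (const std::runtime_error&) {
        return false;
    }
    return true;
}

// Workaround shape: the FieldPath is bound to a named local variable
// before the call, which changes how the compiler emits the cleanup.
bool parseWithNamedVariable(const std::string& name) {
    try {
        FieldPath path(name);
        parseElement(path);
    } catch (const std::runtime_error&) {
        return false;
    }
    return true;
}
```

In both functions the `FieldPath` destructor runs exactly once while the exception propagates; the difference is only whether the object is an unnamed temporary or a named local.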
| Comments |
| Comment by Githook User [ 17/Nov/18 ] | ||||||||||||||||||||
|
Author: {'name': 'Andrew Morrow', 'email': 'acm@mongodb.com', 'username': 'acmorrow'} Message: This reverts commit 85b39d411987431a8c37b1de267b167a384ea9b3. | ||||||||||||||||||||
| Comment by Githook User [ 09/Feb/18 ] | ||||||||||||||||||||
|
Author: {'email': 'justin.seyster@mongodb.com', 'name': 'Justin Seyster', 'username': 'jseyster'}Message: | ||||||||||||||||||||
| Comment by Justin Seyster [ 07/Feb/18 ] | ||||||||||||||||||||
|
While investigating the workaround for this crasher, I realized I made a mistake in my initial diagnosis. I misread the standard, and the register I thought was volatile (r7) is actually saved (callee-save). That means that the code at the ~FieldPath call site is actually correct in expecting r7 to still hold the address of the FieldPath it wants to destroy. I observed where on the stack ProjectTypeParser::parseElement() saved the value of r7 and set a watchpoint, which allowed me to confirm that the saved value is actually getting clobbered on the stack. That of course opens up the possibility that the clobbering is due to a code error, but whatever is doing the clobbering is part of the exception unwinding machinery. Unfortunately, I don't know nearly enough about exception unwinding to understand what went wrong. I still suspect a codegen error, because the crash only occurs on s390x, ASAN does not find any memory corruption, and seemingly insignificant code changes cause the crash to go away. For reference, here is the top of the backtrace for the write that clobbers the saved value for r7:
|