[SERVER-64243] Fix segmentation fault in gperftools during backtrace Created: 07/Mar/22 Updated: 07/Dec/23 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Jordi Olivares Provencio | Assignee: | Backlog - Service Architecture |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||
| Assigned Teams: |
Service Arch
|
||||
| Operating System: | ALL | ||||
| Sprint: | Dev Platform 2022-05-16, Service Arch 2023-11-13, Service Arch 2023-11-27 | ||||
| Participants: | |||||
| Linked BF Score: | 0 | ||||
| Description |
|
gperftools appears to have a bug in its stack frame walking: when the code being unwound was built without frame pointers, it sometimes attempts to dereference an invalid address. While discussing the issue we identified some possible resolutions for this problem:
|
| Comments |
| Comment by Jordi Olivares Provencio [ 03/Nov/23 ] |
| Comment by Andrew Morrow (Inactive) [ 04/May/22 ] |
|
Well, I tried the happy path here: checking whether the newer libunwind avoids the egregious performance slowdowns we observed in the past when using it as the stack capture mechanism for tcmalloc's debugallocation infrastructure. Unfortunately the results were not good, with lots of task timeouts due to significantly increased test runtime. So I think that approach will not work. |
| Comment by Jordi Olivares Provencio [ 07/Mar/22 ] |
|
One way we could fix this in our codebase is to copy the generic backtrace implementation and modify it to call libunwind's backtrace implementation, at the cost of no longer knowing the stack frame sizes. This would quite possibly not be mergeable upstream, though, since it removes features from the current implementation. |
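
A rough sketch of what that replacement could look like, not the actual patch. It assumes the parameter layout of gperftools' `GetStackFrames(result, sizes, max_depth, skip_count)`; the names `GetStackFramesViaUnwinder` and `BacktraceFn` are hypothetical. To keep the example buildable without libunwind, it defaults to glibc's `backtrace()` as a stand-in; libunwind's `unw_backtrace()` has the same `(void**, int)` shape and would be passed in its place.

```cpp
#include <execinfo.h>  // glibc backtrace(), used here only as a stand-in unwinder
#include <algorithm>

// Hypothetical unwinder signature: writes up to `size` return addresses into
// `buffer` and returns how many were written. Both glibc's backtrace() and
// libunwind's unw_backtrace() have this shape.
using BacktraceFn = int (*)(void** buffer, int size);

// Hypothetical replacement for the generic stack walker. Instead of following
// frame pointers by hand, it delegates to an unwinder and reports every frame
// size as 0, because the unwinder only returns program counters.
int GetStackFramesViaUnwinder(void** result, int* sizes, int max_depth,
                              int skip_count, BacktraceFn unwind = &backtrace) {
  constexpr int kMaxFrames = 128;  // arbitrary cap for this sketch
  if (max_depth <= 0 || skip_count < 0 || skip_count >= kMaxFrames) return 0;

  // Skip one extra frame for this function itself (assumes it is not inlined).
  const int skip = skip_count + 1;

  // Request enough frames to cover the skipped ones, without overflowing.
  const int want =
      (max_depth > kMaxFrames - skip) ? kMaxFrames : max_depth + skip;

  void* raw[kMaxFrames];
  const int got = unwind(raw, want);

  const int depth = std::min(std::max(0, got - skip), max_depth);
  for (int i = 0; i < depth; ++i) {
    result[i] = raw[i + skip];
    sizes[i] = 0;  // frame size is unknown with this approach
  }
  return depth;
}
```

A caller would pass arrays of length `max_depth` for `result` and `sizes`. Any consumer that relies on non-zero frame sizes loses that information, which is the feature regression mentioned above.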