[SERVER-64243] Fix segmentation fault in gperftools during backtrace
Created: 07/Mar/22  Updated: 07/Dec/23

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Jordi Olivares Provencio
Assignee: Backlog - Service Architecture
Resolution: Unresolved
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Assigned Teams:
Service Arch
Operating System: ALL
Sprint: Dev Platform 2022-05-16, Service Arch 2023-11-13, Service Arch 2023-11-27
Participants:
Linked BF Score: 0

 Description   

gperftools appears to have a bug in its stack frame walking: when frames in the call chain lack frame pointers, it will sometimes attempt to dereference an invalid address, which causes a segmentation fault.
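To illustrate the failure mode, here is a minimal sketch of a frame-pointer walk that checks each candidate frame before reading it. It is not taken from gperftools or from any pending patch. `FrameRecord`, `PlausibleFrame`, `WalkFrames` and the explicit stack bounds are hypothetical names introduced for this example. The underlying point is that when code is compiled without frame pointers, the register that normally holds the frame pointer may contain an arbitrary value, so following it blindly can touch unmapped memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical frame record layout when frame pointers are enabled:
// the caller's saved frame pointer followed by the return address.
struct FrameRecord {
    const FrameRecord* caller;  // saved frame pointer of the caller
    void* return_address;       // where execution resumes in the caller
};

// True when `fp` is aligned and a whole record fits inside [stack_lo, stack_hi).
bool PlausibleFrame(const FrameRecord* fp, std::uintptr_t stack_lo,
                    std::uintptr_t stack_hi) {
    const auto addr = reinterpret_cast<std::uintptr_t>(fp);
    if (addr % alignof(FrameRecord) != 0) return false;
    if (addr < stack_lo || addr > stack_hi) return false;
    return stack_hi - addr >= sizeof(FrameRecord);
}

// Collects up to `max_depth` return addresses, stopping at the first
// frame pointer that fails validation instead of dereferencing it.
int WalkFrames(const FrameRecord* fp, std::uintptr_t stack_lo,
               std::uintptr_t stack_hi, void** out, int max_depth) {
    assert(max_depth >= 0);
    int depth = 0;
    while (depth < max_depth && PlausibleFrame(fp, stack_lo, stack_hi)) {
        out[depth++] = fp->return_address;
        const FrameRecord* next = fp->caller;
        // Assumes a downward-growing stack: a caller's frame must sit at a
        // strictly higher address. This also guarantees the loop terminates.
        if (reinterpret_cast<std::uintptr_t>(next) <=
            reinterpret_cast<std::uintptr_t>(fp)) {
            break;
        }
        fp = next;
    }
    return depth;
}
```

A real unwinder would have to obtain the stack bounds from the platform, and a bounds check alone cannot tell a genuine frame record from an in-range garbage value, so this only narrows the problem rather than solving it.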

While discussing the issue, we identified three possible resolutions:

  • We could stop using the gperftools debug allocation mode. However, we appear to rely on it internally.
  • We could work with upstream gperftools to advance a patch that seems to fix this but has been waiting to be merged since 2017.
  • We could work with upstream gperftools and/or libunwind to fix the performance issue, so that debugallocation can use libunwind, which doesn't need frame pointers.


 Comments   
Comment by Jordi Olivares Provencio [ 03/Nov/23 ]

CC blake.oler@mongodb.com

Comment by Andrew Morrow (Inactive) [ 04/May/22 ]

Well, I tried the happy path here: checking whether the newer libunwind avoids the egregious performance slowdowns we observed in the past when using it as the stack capture mechanism for tcmalloc's debugallocation infrastructure. Unfortunately, the results were not good: lots of tasks timed out due to significantly increased test runtime. So I don't think that approach will work.

Comment by Jordi Olivares Provencio [ 07/Mar/22 ]

acm

One way we could fix this in our codebase is to copy the generic backtrace implementation and modify it to call libunwind's backtrace implementation, at the cost of not knowing the stack frame sizes. This probably could not be merged upstream, though, because it removes features from the current implementation.
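As a rough sketch of that idea, the adapter below delegates capture to any function with the `int f(void** buffer, int size)` shape, which is the signature shared by glibc's `backtrace()` and libunwind's `unw_backtrace()`. `CaptureFn`, `GetStackFramesVia` and its parameter list are hypothetical and only modeled on a GetStackFrames-style interface. This is not the actual gperftools code. Because such capture functions return only addresses, every frame size is reported as 0.

```cpp
#include <cassert>
#include <cstring>

// Signature shared by backtrace() and unw_backtrace(): fill `buffer` with up
// to `size` return addresses and return how many were written.
using CaptureFn = int (*)(void** buffer, int size);

// Hypothetical adapter: captures a stack via `capture`, drops the innermost
// `skip_count` frames, and copies at most `max_depth` addresses to `result`.
int GetStackFramesVia(CaptureFn capture, void** result, int* sizes,
                      int max_depth, int skip_count) {
    assert(capture != nullptr && max_depth >= 0 && skip_count >= 0);
    constexpr int kMaxCaptured = 64;
    void* stack[kMaxCaptured];

    int captured = capture(stack, kMaxCaptured);
    if (captured < 0) captured = 0;
    if (captured > kMaxCaptured) captured = kMaxCaptured;

    int count = captured - skip_count;
    if (count < 0) count = 0;
    if (count > max_depth) count = max_depth;

    for (int i = 0; i < count; ++i) {
        result[i] = stack[i + skip_count];
    }
    // The capture function reports addresses only, so the frame sizes are
    // unknown. Report them as 0 rather than guessing.
    if (sizes != nullptr && count > 0) {
        std::memset(sizes, 0, sizeof(*sizes) * count);
    }
    return count;
}
```

Passing `unw_backtrace` as `capture` would additionally require including `<libunwind.h>` and linking against libunwind, which this sketch deliberately leaves out.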

Generated at Thu Feb 08 05:59:52 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.