[SERVER-33259] Include libunwind in src/third_party Created: 11/Feb/18  Updated: 29/Oct/23  Resolved: 15/Jul/19

Status: Closed
Project: Core Server
Component/s: Internal Code
Affects Version/s: None
Fix Version/s: 4.3.1

Type: Improvement Priority: Major - P3
Reporter: David Bartley Assignee: Billy Donahue
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-36178 Evaluate adding -fasynchronous-unwind... Closed
is related to SERVER-33261 jstests/core/views/views_all_commands... Closed
Backwards Compatibility: Fully Compatible
Sprint: Dev Tools 2019-04-08, Dev Tools 2019-05-06, Dev Tools 2019-05-20, Dev Tools 2019-04-22, Dev Tools 2019-06-03, Dev Tools 2019-06-17, Dev Tools 2019-07-01, Dev Tools 2019-07-15
Participants:

 Description   

Per https://bugs.launchpad.net/ubuntu/+source/libunwind/+bug/1748597 and https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=827015, the libunwind that ships with many systems is likely to be broken (the mentioned libunwind bug has not been fixed in any release of Ubuntu). Part of the problem is that upstream libunwind hasn't released a version that includes these fixes, despite the core fix being merged a year ago.

This impacts the ability to enable the CPU profiler in mongod; enabling --use-cpu-profiler causes MongoDB tests to randomly fail in 3.4 and up (this also impacts 3.2 and lower, but 3.4 made its tests more robust by detecting if mongod exits non-zero).

It'd probably be best if MongoDB shipped its own vendored copy of libunwind to mitigate this (with http://git.savannah.nongnu.org/cgit/libunwind.git/commit/?id=29483327bebaf6e0141a9bee8bb99552a63f1583 and http://git.savannah.nongnu.org/cgit/libunwind.git/commit/?id=4dea379ad982e946ee2ec561c7554faf34807b72 included).



 Comments   
Comment by Githook User [ 15/Jul/19 ]

Author:

{'name': 'Billy Donahue', 'username': 'BillyDonahue', 'email': 'billy.donahue@mongodb.com'}

Message: SERVER-33259 add libunwind to third_party

Comment by Billy Donahue [ 15/Jul/19 ]

Again, with 2 patches.

1) Fix clang-format IF_CONSTEXPR problem.
2) Work around scons ninja module ASPP problem.

https://mongodbcr.appspot.com/480630010

https://evergreen.mongodb.com/version/5d2cb7e2e3c33125ddb83d5a

Comment by Githook User [ 15/Jul/19 ]

Author:

{'name': 'Xiangyu Yao', 'email': 'xiangyu.yao@mongodb.com', 'username': 'xy24'}

Message: Revert "SERVER-33259 add libunwind to third_party"

This reverts commit d6bd2c5885215c29d723f02d8607f2c6d662aacc.
Branch: master
https://github.com/mongodb/mongo/commit/f8c69b361381a396f81c443438436e99c5af4970

Comment by Githook User [ 15/Jul/19 ]

Author:

{'name': 'Billy Donahue', 'username': 'BillyDonahue', 'email': 'billy.donahue@mongodb.com'}

Message: SERVER-33259 add libunwind to third_party
Branch: master
https://github.com/mongodb/mongo/commit/d6bd2c5885215c29d723f02d8607f2c6d662aacc

Comment by A. Jesse Jiryu Davis [ 08/Jun/19 ]

Some research findings:

If we have libunwind can/do we get better "deleter stacks" out of gperftools?

No, see above.

Can we make it easier to build with the cpu-profiler?

I've postponed this research.

What is the interaction with and benefits, etc to -fasynchronous-unwind-tables (SERVER-36178)?

It works and I see no downside: "bloaty" shows no change in segment sizes. The upside is supposedly more reliable stacktraces from signal handlers (which may matter for SERVER-33445).

Can we use it directly for our own backtracing and greatly simplify stacktrace_posix.cpp?

We can use it. It's not simpler, about equal. It does have the advantage of symbolizing static and __attribute__((visibility("hidden"))) functions, which our existing backtraces only show as addresses.

Does using libunwind for our own backtracing mean that we can stop building with -rdynamic? Allow us to reclaim the frame pointer?

I think we can stop building with -rdynamic and start using -fomit-frame-pointer already, whether or not we switch to libunwind. I saw no effect on backtrace symbolification either way.

Allow building with -fvisibility-inlines-hidden and later -fvisibility=hidden, but still get correct backtraces?

libunwind permits -fvisibility=hidden: with a statically linked mongod, libunwind still shows function names, but today's backtrace code doesn't. I don't know about -fvisibility-inlines-hidden; I couldn't find a scenario where it makes any difference.
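
For readers unfamiliar with the symbolization limits discussed above, here is a minimal sketch of the generic glibc backtrace()/backtrace_symbols() approach. It is not the code in stacktrace_posix.cpp, and captureFrames is a hypothetical helper name. backtrace_symbols() looks names up in the dynamic symbol table, which is why the glibc documentation recommends -rdynamic with the GNU linker and notes that names of static functions are not available; such frames typically come back as a module path plus a raw address.

```cpp
#include <execinfo.h>  // backtrace, backtrace_symbols (glibc)

#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: returns one human-readable line per stack frame of
// the calling thread, innermost frame first.
std::vector<std::string> captureFrames(int maxFrames = 64) {
    assert(maxFrames > 0);
    std::vector<void*> addrs(maxFrames);

    // Fill addrs with return addresses; n is the number actually captured.
    int n = backtrace(addrs.data(), maxFrames);

    std::vector<std::string> out;
    // backtrace_symbols mallocs a single array that the caller must free.
    char** syms = backtrace_symbols(addrs.data(), n);
    if (!syms) {
        return out;  // allocation failed; report no frames
    }
    for (int i = 0; i < n; ++i) {
        out.emplace_back(syms[i]);
    }
    std::free(syms);
    return out;
}
```

Calling captureFrames() from a static function and printing the result is a quick way to see which frames lose their names in a given build configuration.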

 

Comment by A. Jesse Jiryu Davis [ 06/Jun/19 ]

I've opened https://github.com/gperftools/gperftools/issues/1119 to ask to use libunwind for deleter stacks instead of pprof.

Comment by A. Jesse Jiryu Davis [ 31/May/19 ]

I think we cannot easily use libunwind to get symbolicated deleter stacks from gperftools. Although gperftools uses libunwind for some profiling features, it does not use libunwind for memory debugging.

Details: In the BF-12615 scenario, gperftools logged from CheckForCorruptedBuffer, which calls SymbolTable::Symbolize(), which tries to spawn the "pprof" executable. That failed and logged "Cannot find 'pprof'". There's no option to make gperftools use libunwind for this particular feature.

Comment by A. Jesse Jiryu Davis [ 30/May/19 ]

Incidentally, libunwind doesn't support Windows, which is OK because the Windows-provided unwinding functions are already convenient.

Comment by Billy Donahue [ 24/Apr/19 ]

Questions for libunwind project (from email thread w/acm).

  • If yes to the above item, does using libunwind for our own backtracing mean that we can:

Those last three all connect to https://jira.mongodb.org/browse/PM-1328

  • Finishing the work to generate a stack dump of all threads on a signal without invoking the debugger: https://jira.mongodb.org/browse/PM-1323. On the other hand, perhaps eBPF and other things give us this a different way?

Regarding the exceptions thing, I say we speculatively go forward with it turned on, but reach out on the mailing lists for clarification.

Comment by Billy Donahue [ 04/Apr/19 ]

I think the unit test failure is due to Ubuntu's installation of the "apport" crash report uploader.
The apport tool puts the core dump somewhere unexpected, causing unwind's unit test to fail.
They'll have to work around it.
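
A rough way to check for this situation, assuming a Linux host: the kernel's /proc/sys/kernel/core_pattern starts with '|' when core dumps are piped to a user-space program (as apport arranges) instead of being written to a file. The helper names below are hypothetical and this is only a sketch, not something the libunwind test suite does.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical helper: true when the pattern tells the kernel to pipe core
// dumps to a program rather than write a core file.
bool isPipedCorePattern(const std::string& pattern) {
    return !pattern.empty() && pattern.front() == '|';
}

// Hypothetical helper: reads the first line of the core_pattern file.
// Returns an empty string if the file cannot be read (e.g. not on Linux).
std::string readCorePattern(
    const std::string& path = "/proc/sys/kernel/core_pattern") {
    std::ifstream in(path);
    std::string line;
    std::getline(in, line);
    return line;
}
```

If isPipedCorePattern(readCorePattern()) returns true, a test that expects a core file in its working directory will probably not find one.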

Comment by Billy Donahue [ 03/Apr/19 ]

Sent. For reference:
http://lists.nongnu.org/archive/html/libunwind-devel/2019-04/msg00000.html

Comment by Andrew Morrow (Inactive) [ 03/Apr/19 ]

You should let them know, since it looks like they are still in RC.

Comment by Billy Donahue [ 03/Apr/19 ]

ooh, libunwind at head as of this writing fails its "make check" on my ubuntu18.04 machine, using toolchain v3 gcc.

The Segmentation fault is not the error. That's a normal part of the test, generating a core to unwind. But it fails to unwind that core's stack.

=============================================
   libunwind 1.4-rc1: tests/test-suite.log
=============================================
 
# TOTAL: 37
# PASS:  36
# SKIP:  0
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0
 
.. contents:: :depth: 2
 
FAIL: run-coredump-unwind-mdi
=============================
 
Segmentation fault (core dumped)
FAILURE: procedure names are missing/incorrect
FAIL run-coredump-unwind-mdi (exit status: 255)
 

Comment by Billy Donahue [ 02/Apr/19 ]

libunwind can be used for more than cpu-profiler.
Tcmalloc can use it for heap profiling. Maybe that tips the scale toward bringing it in.

Comment by Gabriel Russell (Inactive) [ 13/Feb/18 ]

bartle

Thank you for this suggestion.

It is assumed that if you want to build with --use-cpu-profiler you'll install a working version yourself. Since building with the cpu profiler is a developer-only build mode, and most developers will generally figure out pretty quickly that they need to use a working libunwind, we really don't want to bloat the repository with an only sometimes used third-party package that we'd have to keep updated and partially support, when it's easy enough to install it from source.

I think that the best overall solution would be for package maintainers to stop releasing libunwind, and make everyone install it themselves forever.

Again, thanks for the suggestion.

Gabriel

Generated at Thu Feb 08 04:32:50 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.