Type: Bug
Resolution: Fixed
Priority: Critical - P2
Fix Version/s: 8.1.0-rc0, 8.0.5
Affects Version/s: 6.0.16, 7.3.3, 7.0.12, 5.0.28, 8.0.0-rc15
Component/s: None
Labels:
None

Assigned Teams:

Server Programmability
Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Backport Requested:

v8.0, v7.0, v6.0
Sprint:
Programmability 2024-12-23, Programmability 2025-01-06, Programmability 2025-01-20
Case:
Linked BF Score:
200
Confidence Status:
None
Work Order:
0
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

libunwind calls dl_iterate_phdr here. That call actually goes through a wrapper that tries to call the version from libc, but either way dl_iterate_phdr is ultimately called. dl_iterate_phdr takes a lock internally, so if a thread is already inside it when we try to take a stack trace from every thread, we'll deadlock. This can happen when RSTL lock acquisition fails and we try to print stack traces for all threads: because we're killing a bunch of operations at that time, many threads are processing unhandled exceptions, with stack traces like this:

#8 <signal handler called>
#9 0x0000ffff8af338b4 in __lll_lock_wait () from /lib64/libpthread.so.0
#10 0x0000ffff8af2bed4 in pthread_mutex_lock () from /lib64/libpthread.so.0
#11 0x0000ffff8aeb4af4 in dl_iterate_phdr () from /lib64/libc.so.6
#12 0x0000ffff8af67a18 in _Unwind_Find_FDE () from /lib64/libgcc_s.so.1
#13 0x0000ffff8af642f4 in ?? () from /lib64/libgcc_s.so.1
#14 0x0000ffff8af652c0 in ?? () from /lib64/libgcc_s.so.1
#15 0x0000ffff8af65840 in _Unwind_RaiseException () from /lib64/libgcc_s.so.1
#16 0x0000aaaac67cdaf4 in __cxa_throw ()
#17 0x0000aaaac376f718 in mongo::error_details::throwExceptionForStatus(mongo::Status const&) ()
#18 0x0000aaaac3775048 in mongo::iassertFailed(mongo::Status const&, mongo::SourceLocation) ()
#19 0x0000aaaac36dd94c in _ZZZN5mongo13Interruptible32waitForConditionOrInterruptUntilISt11unique_lockISt5mutexEZNS_14future_details15SharedStateBase4waitEPS0_EUlvE_EEbRNS_4stdx18condition_variableERT_NS_6Date_tET0_ENKUlNS_6StatusENS0_9WakeSpeedEE_clESG_SH_ENKUlvE_clEv.isra.268 ()

When libunwind then calls dl_iterate_phdr from the signal handler, we have a deadlock.

To be clear, this issue could happen anywhere we try to print stack traces from a signal handler. It is just particularly likely to occur in this circumstance, due to the combination of killing lots of operations and printing stack traces for all threads.

This commit in libunwind introduced a mechanism to substitute a custom implementation for dl_iterate_phdr. If we upgraded to a libunwind version that includes it, we could potentially cache the results from dl_iterate_phdr (like we do here) and feed them to libunwind via that mechanism.
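
The caching idea can be sketched roughly as follows. This is a minimal illustration, not the server's implementation: PhdrCache, CachedObject, refresh, and iterate are hypothetical names, and the sketch assumes Linux/glibc's <link.h> and that no shared objects are loaded or unloaded after the snapshot is taken. It snapshots the loader's object list from normal (non-signal) context, then replays that snapshot to a callback without calling dl_iterate_phdr again, so the replay never touches the loader's lock. Registering such a replay function with libunwind is left out, since this report does not spell out that mechanism's interface.

```cpp
#include <link.h>  // dl_iterate_phdr, dl_phdr_info (Linux/glibc)

#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Hypothetical snapshot entry: a copy of one loader record plus owned
// storage for its name, so the record stays readable later.
struct CachedObject {
    dl_phdr_info info;
    std::string name;
};

// Hypothetical cache of dl_iterate_phdr results.
class PhdrCache {
public:
    using Callback = int (*)(dl_phdr_info*, std::size_t, void*);

    // Call from normal (non-signal) context, e.g. at startup.
    // This is the only place that takes the loader's internal lock.
    void refresh() {
        std::vector<CachedObject> fresh;
        dl_iterate_phdr(&PhdrCache::collect, &fresh);
        // Point each record at its owned name only after the vector has
        // stopped growing, so the pointers are not invalidated by moves.
        for (auto& obj : fresh)
            obj.info.dlpi_name = obj.name.c_str();
        _objects.swap(fresh);
    }

    // Replays the snapshot with dl_iterate_phdr's callback contract:
    // stop at the first nonzero return and propagate it. This function
    // itself takes no lock and allocates no memory.
    int iterate(Callback cb, void* data) {
        for (auto& obj : _objects) {
            if (int rc = cb(&obj.info, sizeof(obj.info), data))
                return rc;
        }
        return 0;
    }

    std::size_t size() const { return _objects.size(); }

private:
    static int collect(dl_phdr_info* info, std::size_t size, void* data) {
        assert(data != nullptr);
        auto* out = static_cast<std::vector<CachedObject>*>(data);
        CachedObject obj{};
        // Copy only the bytes the running libc actually provides.
        std::memcpy(&obj.info, info, std::min(size, sizeof(obj.info)));
        obj.name = info->dlpi_name ? info->dlpi_name : "";
        out->push_back(std::move(obj));
        return 0;
    }

    std::vector<CachedObject> _objects;
};
```

A caller would run refresh() once at startup (and again after any dlopen or dlclose, which this sketch does not detect), then give the unwinder a function that forwards to iterate(). The program-header pointers in each record still refer to the loaded objects' own memory, so the snapshot is only valid while those objects stay mapped.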

depends on

SERVER-98185 upgrade "nongnu" libunwind to v1.8.1

Closed

is depended on by

SERVER-97887 Enable SIGUSR2 stress testing in config_fuzzer_stress tasks

Blocked

SERVER-91012 Recommit SERVER-71520

Closed

SERVER-92548 Add a command to make mongostream dump thread stack traces.

Closed

is related to

SERVER-83271 Make synchronous signal handlers signal-safe

Open

related to

SERVER-104543 Don't fail stacktrace collection if we're unable to resolve a symbol

Closed

SERVER-90777 Revert SERVER-71520

Closed

SERVER-93337 Avoid dumping thread stacks on timeout in ThroughputProbing

Closed

SERVER-93365 Indicate that printAllThreadsStacksBlocking should not be used until deadlock scenario is fixed

Closed

SERVER-99080 Complete TODO listed in SERVER-90775

Closed

SERVER-95489 Explore if we can use curOp to log all active operations if we fail to acquire the RSTL during step up/step down

Closed


Assignee:: Billy Donahue
Reporter:: Ryan Berryhill
Participants:: Billy Donahue, Githook User, Ryan Berryhill
Votes:: 3
Watchers:: 31

Created:: May 22 2024 05:19:56 PM UTC
Updated:: May 19 2025 07:10:41 PM UTC
Resolved:: Jan 07 2025 10:49:57 PM UTC
Confidence Status Last Update:: 20/Dec/24 10:45 AM
