Type: Bug
Resolution: Fixed
Priority: Major - P3
Affects Version/s: None
Component/s: None
None
Problem Statement/Rationale
I have a 3-node MongoDB 4.4 replica set running in Kubernetes, and a number of PHP CLI pods that connect to it and run queries in response to events received from other sources. Every now and then, a PHP process becomes unresponsive until it is killed manually. The hang seems to be triggered by a MongoDB pod restarting, although it doesn't happen on every restart and it affects seemingly random PHP pods.
Steps to Reproduce
This is difficult to reproduce. It seems like a timing issue between PHP running a simple find() query and a MongoDB pod restarting.
Expected Results
Automatic recovery, retry other node, an exception, a crash, anything but a hang.
Actual Results
PHP hangs indefinitely. I left one hanging PHP process running to see whether it would ever recover; it never did.
Additional Notes
I managed to attach GDB to a hanging process and produce a core dump. Although the official PHP container images are stripped of all debug information, I was able to produce the following backtrace from the core dump:
#0 __lll_lock_wait (futex=futex@entry=0x555c8f7422d8, private=0) at lowlevellock.c:52
#1 0x00007f33e3436843 in __GI___pthread_mutex_lock (mutex=0x555c8f7422d8) at ../nptl/pthread_mutex_lock.c:80
#2 0x00007f33df9138d7 in _collect_key_uuid_from_FLE2InsertUpdatePayload (ctx=0x555c8f7422d8, in=0x80, status=0x2) at /tmp/pear/temp/mongodb/src/libmongocrypt/src/mongocrypt-ctx-decrypt.c:684
#3 0x00007f33df913f5c in _collect_K_KeyID_from_FLE2IndexedEncryptedValue (status=0x7f33df91925c <_finalize+1580>, in=0xffffffc894e7437e, ctx=0x7ffddd363690) at /tmp/pear/temp/mongodb/src/libmongocrypt/src/mongocrypt-ctx-decrypt.c:553
#4 _collect_K_KeyIDs (ctx=0x7ffddd363690, in=0xffffffc894e7437e, status=0x7f33df91925c <_finalize+1580>) at /tmp/pear/temp/mongodb/src/libmongocrypt/src/mongocrypt-ctx-decrypt.c:584
#5 0x00007f33e0816301 in ?? ()
#6 0x0000555c8ea4a101 in ?? ()
#7 0x0000555c8f8a33a0 in ?? ()
#8 0x0000000042217db8 in ?? ()
#9 0x0000555c00000000 in ?? ()
#10 0x0000555c8f94e120 in ?? ()
#11 0x0000555c8f87ea20 in ?? ()
#12 0x0000555c8ea4a118 in ?? ()
#13 0xac517e6462ca8400 in ?? ()
#14 0x0000000000000000 in ?? ()
The backtrace was produced in a PHP container running:
PHP 8.0.28 (cli) (built: May 3 2023 06:25:45) ( NTS )
MongoDB extension version => 1.15.0
libmongoc bundled version => 1.23.1
libmongocrypt bundled version => 1.5.2
The hang has also been observed in a PHP container running:
PHP 8.0.29 (cli) (built: Jul 4 2023 15:49:26) ( NTS )
MongoDB extension version => 1.16.1
libmongoc bundled version => 1.24.1
libmongocrypt bundled version => 1.8.1
All connection string options (timeouts, readPreference, etc.) are left at their defaults; only replicaSet and authSource are specified.
Depends on: CDRIVER-4666 Deadlock due to recursive lock of non-recursive mutex in topology (Closed)