[SERVER-5542] Race during static destruction of CommitJob object Created: 09/Apr/12  Updated: 11/Jul/16  Resolved: 10/Apr/12

Status: Closed
Project: Core Server
Component/s: Stability
Affects Version/s: 2.0.4
Fix Version/s: 2.0.5, 2.1.0

Type: Bug Priority: Major - P3
Reporter: Ben Becker Assignee: Mathias Stearn
Resolution: Done Votes: 0
Labels: journal, shutdown
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu 11.04 x86_64 (dev build)


Issue Links:
Related
is related to SERVER-5566 abruptQuit should use _exit Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Participants:

 Description   

The global CommitJob object is destructed on ::exit() before thread execution is halted. This affects the v2.0 branch; master works around this by using a static reference.

To reproduce, send SIGQUIT to the mongod process while flooding a single connection with fire-and-forget writes:

> for (i=0;i<100000;i++) { db.test.insert({ a:1 }); }

The resulting crash varies. In the one below, the initAndListen thread receives the SIGQUIT. This thread then ::exit()s, starts destructing the mongo::dur::CommitJob static, generates a SIGSEGV when destructing a vector of Writes (from a TaskQueue<Writes>), then receives it's own SIGSEGV. The durability thread then fails an assertion on pthread_mutex_lock() while in mongo::dur::CommitJob::notifyCommitted()/NotifyAll(). Finally, an execCommand thread calls terminate() due to 'pure virtual method called', invokes myterminate, and gets a SIGABRT. At this point some threads continue to run and the process stops handling signals.

^\Fri Apr  6 20:02:16 [conn4] insert test.asdf 0ms
Fri Apr  6 20:02:16 Got signal: 3 (Quit).
Fri Apr  6 20:02:16 [conn4] insert test.asdf 0ms
...
Fri Apr  6 20:02:16 [conn4] insert test.asdf 0ms
Fri Apr  6 20:02:16 [conn3] getmore local.oplog.rs query: { ts: { $gte: new Date(5728488334133562937) } } cursorid:8337457123675887220 nreturned:7 reslen:1091 0ms
Fri Apr  6 20:02:16 [conn4] insert test.asdf 0ms
Fri Apr  6 20:02:16 Backtrace:
0x8a531c 0xc6b139 0x7f142694b420 0x7f14269f1613 0x8ca81b 0x94a844 0xc635e8 0xc64ef3 0xc65246 0xc69c25 0x7f142693630d 0x82d8c9 
 ./mongod(_ZN5mongo15printStackTraceERSo+0x27) [0x8a531c]
 ./mongod(_ZN5mongo10abruptQuitEi+0x110) [0xc6b139]
 /lib/x86_64-linux-gnu/libc.so.6(+0x36420) [0x7f142694b420]
 /lib/x86_64-linux-gnu/libc.so.6(__select+0x33) [0x7f14269f1613]
 ./mongod(_ZN5mongo8Listener13initAndListenEv+0x2f1) [0x8ca81b]
 ./mongod(_ZN5mongo17PortMessageServer3runEv+0x1c) [0x94a844]
 ./mongod(_ZN5mongo6listenEi+0x11d) [0xc635e8]
 ./mongod(_ZN5mongo14_initAndListenEi+0x650) [0xc64ef3]
 ./mongod(_ZN5mongo13initAndListenEi+0x18) [0xc65246]
 ./mongod(main+0x4668) [0xc69c25]
 /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7f142693630d]
 ./mongod() [0x82d8c9]
 
Fri Apr  6 20:02:16 [conn4] insert test.asdf 0ms
Logstream::get called in uninitialized state
Logstream::get called in uninitialized state
Fri Apr  6 20:02:16 [conn3] getmore local.oplog.rs query: { ts: { $gte: new Date(5728488334133562937) } } cursorid:8337457123675887220 nreturned:4 reslen:632 0ms
Fri Apr  6 20:02:16 [conn4] insert test.asdf 0ms
Logstream::get called in uninitialized state
Fri Apr  6 20:02:16 [conn4] Accessing: test for the first time
Fri Apr  6 20:02:16 [initandlisten] ERROR: Client::shutdown not called: initandlisten
Fri Apr  6 20:02:16 Invalid access at address: 0x9
 
Fri Apr  6 20:02:16 Got signal: 11 (Segmentation fault).
Fri Apr  6 20:02:16 Backtrace:
0x8a531c 0xc6b139 0xc6b314 0x7f1427d07060 0x7f142698f9e6 0x7f14269902a9 0x7f14269939cc 0xa78f1c 0xa78d40 0xa7966d 0xa79516 0xa792b1 0xa7933b 0xa79409 0x7f1426950821 0x7f14269508a5 0xc6b180 0x7f142694b420 0x7f14269f1613 0x8ca81b 
 ./mongod(_ZN5mongo15printStackTraceERSo+0x27) [0x8a531c]
 ./mongod(_ZN5mongo10abruptQuitEi+0x110) [0xc6b139]
 ./mongod(_ZN5mongo24abruptQuitWithAddrSignalEiP7siginfoPv+0x103) [0xc6b314]
 /lib/x86_64-linux-gnu/libpthread.so.0(+0x10060) [0x7f1427d07060]
 /lib/x86_64-linux-gnu/libc.so.6(+0x7a9e6) [0x7f142698f9e6]
 /lib/x86_64-linux-gnu/libc.so.6(+0x7b2a9) [0x7f14269902a9]
 /lib/x86_64-linux-gnu/libc.so.6(cfree+0x6c) [0x7f14269939cc]
 ./mongod(_ZN9__gnu_cxx13new_allocatorIN5mongo3dur6Writes1DEE10deallocateEPS4_m+0x20) [0xa78f1c]
 ./mongod(_ZNSt12_Vector_baseIN5mongo3dur6Writes1DESaIS3_EE13_M_deallocateEPS3_m+0x32) [0xa78d40]
 ./mongod(_ZNSt12_Vector_baseIN5mongo3dur6Writes1DESaIS3_EED1Ev+0x45) [0xa7966d]
 ./mongod(_ZNSt6vectorIN5mongo3dur6Writes1DESaIS3_EED1Ev+0x42) [0xa79516]
 ./mongod(_ZN5mongo9TaskQueueINS_3dur6Writes1DEED1Ev+0x4b) [0xa792b1]
 ./mongod(_ZN5mongo3dur6WritesD1Ev+0x4d) [0xa7933b]
 ./mongod(_ZN5mongo3dur9CommitJobD2Ev+0x2f) [0xa79409]
 /lib/x86_64-linux-gnu/libc.so.6(+0x3b821) [0x7f1426950821]
 /lib/x86_64-linux-gnu/libc.so.6(+0x3b8a5) [0x7f14269508a5]
 ./mongod(_ZN5mongo10abruptQuitEi+0x157) [0xc6b180]
 /lib/x86_64-linux-gnu/libc.so.6(+0x36420) [0x7f142694b420]
 /lib/x86_64-linux-gnu/libc.so.6(__select+0x33) [0x7f14269f1613]
 ./mongod(_ZN5mongo8Listener13initAndListenEv+0x2f1) [0x8ca81b]
 
mongod: /usr/include/boost/thread/pthread/pthread_mutex_scoped_lock.hpp:26: boost::pthread::pthread_mutex_scoped_lock::pthread_mutex_scoped_lock(pthread_mutex_t*): Assertion `!pthread_mutex_lock(m)' failed.
Fri Apr  6 20:02:16 Got signal: 6 (Aborted).
 
Fri Apr  6 20:02:16 Backtrace:
0x8a531c 0xc6b139 0x7f142694b420 0x7f142694b3a5 0x7f142694eb0b 0x7f1426943d4d 0x850629 0x89e973 0x8c0202 0xa7775e 0xa75804 0xa7592c 0xa760b0 0xa763de 0xa7949f 0x7f14275dfba9 0x7f1427cfeefc 0x7f14269f859d 
 ./mongod(_ZN5mongo15printStackTraceERSo+0x27) [0x8a531c]
 ./mongod(_ZN5mongo10abruptQuitEi+0x110) [0xc6b139]
 /lib/x86_64-linux-gnu/libc.so.6(+0x36420) [0x7f142694b420]
 /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35) [0x7f142694b3a5]
 /lib/x86_64-linux-gnu/libc.so.6(abort+0x17b) [0x7f142694eb0b]
 /lib/x86_64-linux-gnu/libc.so.6(__assert_fail+0xdd) [0x7f1426943d4d]
 ./mongod(_ZN5boost7pthread25pthread_mutex_scoped_lockC2EP15pthread_mutex_t+0x55) [0x850629]
 ./mongod(_ZN5boost22condition_variable_any10notify_allEv+0x1f) [0x89e973]
 ./mongod(_ZN5mongo9NotifyAll9notifyAllEy+0x4e) [0x8c0202]
 ./mongod(_ZN5mongo3dur9CommitJob15notifyCommittedEv+0x2a) [0xa7775e]
 ./mongod(_ZN5mongo3dur28_groupCommitWithLimitedLocksEv+0x1ae) [0xa75804]
 ./mongod(_ZN5mongo3dur27groupCommitWithLimitedLocksEv+0x10) [0xa7592c]
 ./mongod() [0xa760b0]
 ./mongod(_ZN5mongo3dur9durThreadEv+0x156) [0xa763de]
 ./mongod(_ZN5boost6detail11thread_dataIPFvvEE3runEv+0x19) [0xa7949f]
 /usr/lib/libboost_thread.so.1.46.1(thread_proxy+0x69) [0x7f14275dfba9]
 /lib/x86_64-linux-gnu/libpthread.so.0(+0x7efc) [0x7f1427cfeefc]
 /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f14269f859d]
 
Logstream::get called in uninitialized state
Fri Apr  6 20:02:17 [conn2] runQuery called admin.$cmd { replSetHeartbeat: "rs1", v: 1, pv: 1, checkEmpty: false, from: "localhost:27018" }
Logstream::get called in uninitialized state
Fri Apr  6 20:02:17 [conn2] run command admin.$cmd { replSetHeartbeat: "rs1", v: 1, pv: 1, checkEmpty: false, from: "localhost:27018" }
pure virtual method called
Fri Apr  6 20:02:17 terminate() called, printing stack:
0x8a531c 0xc6b496 0x7f1427aa9f26 0x7f1427aa9f53 0x7f1427aaa96f 0xbc87e0 0xbc91d4 0xbaa73a 0xbabb0a 0xb34962 0xb35364 0xc6c0c5 0x949a0e 0x7f1427cfeefc 0x7f14269f859d 
 ./mongod(_ZN5mongo15printStackTraceERSo+0x27) [0x8a531c]
 ./mongod(_ZN5mongo11myterminateEv+0x5f) [0xc6b496]
 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb9f26) [0x7f1427aa9f26]
 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb9f53) [0x7f1427aa9f53]
 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xba96f) [0x7f1427aaa96f]
 ./mongod(_ZN5mongo11execCommandEPNS_7CommandERNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0x630) [0xbc87e0]
 ./mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x367) [0xbc91d4]
 ./mongod(_ZN5mongo11runCommandsEPKcRNS_7BSONObjERNS_5CurOpERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x46) [0xbaa73a]
 ./mongod(_ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0x24e) [0xbabb0a]
 ./mongod() [0xb34962]
 ./mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x335) [0xb35364]
 ./mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0xe1) [0xc6c0c5]
 ./mongod(_ZN5mongo3pms9threadRunEPNS_13MessagingPortE+0x2b2) [0x949a0e]
 /lib/x86_64-linux-gnu/libpthread.so.0(+0x7efc) [0x7f1427cfeefc]
 /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f14269f859d]
Fri Apr  6 20:02:17 Got signal: 6 (Aborted).
 
Fri Apr  6 20:02:17 Backtrace:
0x8a531c 0xc6b139 0x7f142694b420 0x7f142694b3a5 0x7f142694eb0b 0xc6b49b 0x7f1427aa9f26 0x7f1427aa9f53 0x7f1427aaa96f 0xbc87e0 0xbc91d4 0xbaa73a 0xbabb0a 0xb34962 0xb35364 0xc6c0c5 0x949a0e 0x7f1427cfeefc 0x7f14269f859d 
 ./mongod(_ZN5mongo15printStackTraceERSo+0x27) [0x8a531c]
 ./mongod(_ZN5mongo10abruptQuitEi+0x110) [0xc6b139]
 /lib/x86_64-linux-gnu/libc.so.6(+0x36420) [0x7f142694b420]
 /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35) [0x7f142694b3a5]
 /lib/x86_64-linux-gnu/libc.so.6(abort+0x17b) [0x7f142694eb0b]
 ./mongod(_ZN5mongo11myterminateEv+0x64) [0xc6b49b]
 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb9f26) [0x7f1427aa9f26]
 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb9f53) [0x7f1427aa9f53]
 /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xba96f) [0x7f1427aaa96f]
 ./mongod(_ZN5mongo11execCommandEPNS_7CommandERNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0x630) [0xbc87e0]
 ./mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x367) [0xbc91d4]
 ./mongod(_ZN5mongo11runCommandsEPKcRNS_7BSONObjERNS_5CurOpERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x46) [0xbaa73a]
 ./mongod(_ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0x24e) [0xbabb0a]
 ./mongod() [0xb34962]
 ./mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x335) [0xb35364]
 ./mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0xe1) [0xc6c0c5]
 ./mongod(_ZN5mongo3pms9threadRunEPNS_13MessagingPortE+0x2b2) [0x949a0e]
 /lib/x86_64-linux-gnu/libpthread.so.0(+0x7efc) [0x7f1427cfeefc]
 /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f14269f859d]
 
Logstream::get called in uninitialized state
Fri Apr  6 20:02:55 [PeriodicTask::Runner] task: DBConnectionPool-cleaner took: 0ms
...
Fri Apr  6 20:12:55 [PeriodicTask::Runner] task: DBConnectionPool-cleaner took: 0ms
Logstream::get called in uninitialized state
Fri Apr  6 20:13:03 [conn4] couldn't get lazy rwlock
Logstream::get called in uninitialized state



 Comments   
Comment by Mathias Stearn [ 10/Apr/12 ]

The multiple SEGVs not causing an exit are due at least in part to SERVER-5566

Comment by Mathias Stearn [ 10/Apr/12 ]

This was fixed in commit cd335640b633738bae9d1f15ad0e584b110e1cf2.

https://github.com/mongodb/mongo/commit/cd335640b633738bae9d1f15ad0e584b110e1cf2

Generated at Thu Feb 08 03:09:12 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.