Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-5542

Race during static destruction of CommitJob object

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 2.0.4
    • Fix Version/s: 2.0.5, 2.1.0
    • Component/s: Stability
    • Labels:
    • Environment:
      Ubuntu 11.04 x86_64 (dev build)
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL

      Description

      The global CommitJob object is destructed on ::exit() before thread execution is halted. This affects the v2.0 branch; master works around this by using a static reference.

      To reproduce, send SIGQUIT to the mongod process while flooding a single connection with fire-and-forget writes:

      > for (i=0;i<100000;i++) { db.test.insert({ a:1 }); }

      The resulting crash varies. In the one below, the initAndListen thread receives the SIGQUIT. This thread then ::exit()s, starts destructing the mongo::dur::CommitJob static, generates a SIGSEGV when destructing a vector of Writes (from a TaskQueue<Writes>), then receives it's own SIGSEGV. The durability thread then fails an assertion on pthread_mutex_lock() while in mongo::dur::CommitJob::notifyCommitted()/NotifyAll(). Finally, an execCommand thread calls terminate() due to 'pure virtual method called', invokes myterminate, and gets a SIGABRT. At this point some threads continue to run and the process stops handling signals.

      ^\Fri Apr  6 20:02:16 [conn4] insert test.asdf 0ms
      Fri Apr  6 20:02:16 Got signal: 3 (Quit).
      Fri Apr  6 20:02:16 [conn4] insert test.asdf 0ms
      ...
      Fri Apr  6 20:02:16 [conn4] insert test.asdf 0ms
      Fri Apr  6 20:02:16 [conn3] getmore local.oplog.rs query: { ts: { $gte: new Date(5728488334133562937) } } cursorid:8337457123675887220 nreturned:7 reslen:1091 0ms
      Fri Apr  6 20:02:16 [conn4] insert test.asdf 0ms
      Fri Apr  6 20:02:16 Backtrace:
      0x8a531c 0xc6b139 0x7f142694b420 0x7f14269f1613 0x8ca81b 0x94a844 0xc635e8 0xc64ef3 0xc65246 0xc69c25 0x7f142693630d 0x82d8c9 
       ./mongod(_ZN5mongo15printStackTraceERSo+0x27) [0x8a531c]
       ./mongod(_ZN5mongo10abruptQuitEi+0x110) [0xc6b139]
       /lib/x86_64-linux-gnu/libc.so.6(+0x36420) [0x7f142694b420]
       /lib/x86_64-linux-gnu/libc.so.6(__select+0x33) [0x7f14269f1613]
       ./mongod(_ZN5mongo8Listener13initAndListenEv+0x2f1) [0x8ca81b]
       ./mongod(_ZN5mongo17PortMessageServer3runEv+0x1c) [0x94a844]
       ./mongod(_ZN5mongo6listenEi+0x11d) [0xc635e8]
       ./mongod(_ZN5mongo14_initAndListenEi+0x650) [0xc64ef3]
       ./mongod(_ZN5mongo13initAndListenEi+0x18) [0xc65246]
       ./mongod(main+0x4668) [0xc69c25]
       /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7f142693630d]
       ./mongod() [0x82d8c9]
       
      Fri Apr  6 20:02:16 [conn4] insert test.asdf 0ms
      Logstream::get called in uninitialized state
      Logstream::get called in uninitialized state
      Fri Apr  6 20:02:16 [conn3] getmore local.oplog.rs query: { ts: { $gte: new Date(5728488334133562937) } } cursorid:8337457123675887220 nreturned:4 reslen:632 0ms
      Fri Apr  6 20:02:16 [conn4] insert test.asdf 0ms
      Logstream::get called in uninitialized state
      Fri Apr  6 20:02:16 [conn4] Accessing: test for the first time
      Fri Apr  6 20:02:16 [initandlisten] ERROR: Client::shutdown not called: initandlisten
      Fri Apr  6 20:02:16 Invalid access at address: 0x9
       
      Fri Apr  6 20:02:16 Got signal: 11 (Segmentation fault).
      Fri Apr  6 20:02:16 Backtrace:
      0x8a531c 0xc6b139 0xc6b314 0x7f1427d07060 0x7f142698f9e6 0x7f14269902a9 0x7f14269939cc 0xa78f1c 0xa78d40 0xa7966d 0xa79516 0xa792b1 0xa7933b 0xa79409 0x7f1426950821 0x7f14269508a5 0xc6b180 0x7f142694b420 0x7f14269f1613 0x8ca81b 
       ./mongod(_ZN5mongo15printStackTraceERSo+0x27) [0x8a531c]
       ./mongod(_ZN5mongo10abruptQuitEi+0x110) [0xc6b139]
       ./mongod(_ZN5mongo24abruptQuitWithAddrSignalEiP7siginfoPv+0x103) [0xc6b314]
       /lib/x86_64-linux-gnu/libpthread.so.0(+0x10060) [0x7f1427d07060]
       /lib/x86_64-linux-gnu/libc.so.6(+0x7a9e6) [0x7f142698f9e6]
       /lib/x86_64-linux-gnu/libc.so.6(+0x7b2a9) [0x7f14269902a9]
       /lib/x86_64-linux-gnu/libc.so.6(cfree+0x6c) [0x7f14269939cc]
       ./mongod(_ZN9__gnu_cxx13new_allocatorIN5mongo3dur6Writes1DEE10deallocateEPS4_m+0x20) [0xa78f1c]
       ./mongod(_ZNSt12_Vector_baseIN5mongo3dur6Writes1DESaIS3_EE13_M_deallocateEPS3_m+0x32) [0xa78d40]
       ./mongod(_ZNSt12_Vector_baseIN5mongo3dur6Writes1DESaIS3_EED1Ev+0x45) [0xa7966d]
       ./mongod(_ZNSt6vectorIN5mongo3dur6Writes1DESaIS3_EED1Ev+0x42) [0xa79516]
       ./mongod(_ZN5mongo9TaskQueueINS_3dur6Writes1DEED1Ev+0x4b) [0xa792b1]
       ./mongod(_ZN5mongo3dur6WritesD1Ev+0x4d) [0xa7933b]
       ./mongod(_ZN5mongo3dur9CommitJobD2Ev+0x2f) [0xa79409]
       /lib/x86_64-linux-gnu/libc.so.6(+0x3b821) [0x7f1426950821]
       /lib/x86_64-linux-gnu/libc.so.6(+0x3b8a5) [0x7f14269508a5]
       ./mongod(_ZN5mongo10abruptQuitEi+0x157) [0xc6b180]
       /lib/x86_64-linux-gnu/libc.so.6(+0x36420) [0x7f142694b420]
       /lib/x86_64-linux-gnu/libc.so.6(__select+0x33) [0x7f14269f1613]
       ./mongod(_ZN5mongo8Listener13initAndListenEv+0x2f1) [0x8ca81b]
       
      mongod: /usr/include/boost/thread/pthread/pthread_mutex_scoped_lock.hpp:26: boost::pthread::pthread_mutex_scoped_lock::pthread_mutex_scoped_lock(pthread_mutex_t*): Assertion `!pthread_mutex_lock(m)' failed.
      Fri Apr  6 20:02:16 Got signal: 6 (Aborted).
       
      Fri Apr  6 20:02:16 Backtrace:
      0x8a531c 0xc6b139 0x7f142694b420 0x7f142694b3a5 0x7f142694eb0b 0x7f1426943d4d 0x850629 0x89e973 0x8c0202 0xa7775e 0xa75804 0xa7592c 0xa760b0 0xa763de 0xa7949f 0x7f14275dfba9 0x7f1427cfeefc 0x7f14269f859d 
       ./mongod(_ZN5mongo15printStackTraceERSo+0x27) [0x8a531c]
       ./mongod(_ZN5mongo10abruptQuitEi+0x110) [0xc6b139]
       /lib/x86_64-linux-gnu/libc.so.6(+0x36420) [0x7f142694b420]
       /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35) [0x7f142694b3a5]
       /lib/x86_64-linux-gnu/libc.so.6(abort+0x17b) [0x7f142694eb0b]
       /lib/x86_64-linux-gnu/libc.so.6(__assert_fail+0xdd) [0x7f1426943d4d]
       ./mongod(_ZN5boost7pthread25pthread_mutex_scoped_lockC2EP15pthread_mutex_t+0x55) [0x850629]
       ./mongod(_ZN5boost22condition_variable_any10notify_allEv+0x1f) [0x89e973]
       ./mongod(_ZN5mongo9NotifyAll9notifyAllEy+0x4e) [0x8c0202]
       ./mongod(_ZN5mongo3dur9CommitJob15notifyCommittedEv+0x2a) [0xa7775e]
       ./mongod(_ZN5mongo3dur28_groupCommitWithLimitedLocksEv+0x1ae) [0xa75804]
       ./mongod(_ZN5mongo3dur27groupCommitWithLimitedLocksEv+0x10) [0xa7592c]
       ./mongod() [0xa760b0]
       ./mongod(_ZN5mongo3dur9durThreadEv+0x156) [0xa763de]
       ./mongod(_ZN5boost6detail11thread_dataIPFvvEE3runEv+0x19) [0xa7949f]
       /usr/lib/libboost_thread.so.1.46.1(thread_proxy+0x69) [0x7f14275dfba9]
       /lib/x86_64-linux-gnu/libpthread.so.0(+0x7efc) [0x7f1427cfeefc]
       /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f14269f859d]
       
      Logstream::get called in uninitialized state
      Fri Apr  6 20:02:17 [conn2] runQuery called admin.$cmd { replSetHeartbeat: "rs1", v: 1, pv: 1, checkEmpty: false, from: "localhost:27018" }
      Logstream::get called in uninitialized state
      Fri Apr  6 20:02:17 [conn2] run command admin.$cmd { replSetHeartbeat: "rs1", v: 1, pv: 1, checkEmpty: false, from: "localhost:27018" }
      pure virtual method called
      Fri Apr  6 20:02:17 terminate() called, printing stack:
      0x8a531c 0xc6b496 0x7f1427aa9f26 0x7f1427aa9f53 0x7f1427aaa96f 0xbc87e0 0xbc91d4 0xbaa73a 0xbabb0a 0xb34962 0xb35364 0xc6c0c5 0x949a0e 0x7f1427cfeefc 0x7f14269f859d 
       ./mongod(_ZN5mongo15printStackTraceERSo+0x27) [0x8a531c]
       ./mongod(_ZN5mongo11myterminateEv+0x5f) [0xc6b496]
       /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb9f26) [0x7f1427aa9f26]
       /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb9f53) [0x7f1427aa9f53]
       /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xba96f) [0x7f1427aaa96f]
       ./mongod(_ZN5mongo11execCommandEPNS_7CommandERNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0x630) [0xbc87e0]
       ./mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x367) [0xbc91d4]
       ./mongod(_ZN5mongo11runCommandsEPKcRNS_7BSONObjERNS_5CurOpERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x46) [0xbaa73a]
       ./mongod(_ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0x24e) [0xbabb0a]
       ./mongod() [0xb34962]
       ./mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x335) [0xb35364]
       ./mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0xe1) [0xc6c0c5]
       ./mongod(_ZN5mongo3pms9threadRunEPNS_13MessagingPortE+0x2b2) [0x949a0e]
       /lib/x86_64-linux-gnu/libpthread.so.0(+0x7efc) [0x7f1427cfeefc]
       /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f14269f859d]
      Fri Apr  6 20:02:17 Got signal: 6 (Aborted).
       
      Fri Apr  6 20:02:17 Backtrace:
      0x8a531c 0xc6b139 0x7f142694b420 0x7f142694b3a5 0x7f142694eb0b 0xc6b49b 0x7f1427aa9f26 0x7f1427aa9f53 0x7f1427aaa96f 0xbc87e0 0xbc91d4 0xbaa73a 0xbabb0a 0xb34962 0xb35364 0xc6c0c5 0x949a0e 0x7f1427cfeefc 0x7f14269f859d 
       ./mongod(_ZN5mongo15printStackTraceERSo+0x27) [0x8a531c]
       ./mongod(_ZN5mongo10abruptQuitEi+0x110) [0xc6b139]
       /lib/x86_64-linux-gnu/libc.so.6(+0x36420) [0x7f142694b420]
       /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35) [0x7f142694b3a5]
       /lib/x86_64-linux-gnu/libc.so.6(abort+0x17b) [0x7f142694eb0b]
       ./mongod(_ZN5mongo11myterminateEv+0x64) [0xc6b49b]
       /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb9f26) [0x7f1427aa9f26]
       /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb9f53) [0x7f1427aa9f53]
       /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xba96f) [0x7f1427aaa96f]
       ./mongod(_ZN5mongo11execCommandEPNS_7CommandERNS_6ClientEiPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0x630) [0xbc87e0]
       ./mongod(_ZN5mongo12_runCommandsEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x367) [0xbc91d4]
       ./mongod(_ZN5mongo11runCommandsEPKcRNS_7BSONObjERNS_5CurOpERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x46) [0xbaa73a]
       ./mongod(_ZN5mongo8runQueryERNS_7MessageERNS_12QueryMessageERNS_5CurOpES1_+0x24e) [0xbabb0a]
       ./mongod() [0xb34962]
       ./mongod(_ZN5mongo16assembleResponseERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortE+0x335) [0xb35364]
       ./mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0xe1) [0xc6c0c5]
       ./mongod(_ZN5mongo3pms9threadRunEPNS_13MessagingPortE+0x2b2) [0x949a0e]
       /lib/x86_64-linux-gnu/libpthread.so.0(+0x7efc) [0x7f1427cfeefc]
       /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f14269f859d]
       
      Logstream::get called in uninitialized state
      Fri Apr  6 20:02:55 [PeriodicTask::Runner] task: DBConnectionPool-cleaner took: 0ms
      ...
      Fri Apr  6 20:12:55 [PeriodicTask::Runner] task: DBConnectionPool-cleaner took: 0ms
      Logstream::get called in uninitialized state
      Fri Apr  6 20:13:03 [conn4] couldn't get lazy rwlock
      Logstream::get called in uninitialized state

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              redbeard0531 Mathias Stearn
              Reporter:
              benjamin.becker Ben Becker
              Participants:
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved: