Core Server / SERVER-19692

Mongod failed to open connection, remained in hung state, when running WT with LSM

    • Storage Execution
    • ALL

      The powercycle test was run against mongod using WiredTiger with LSM. After several start/crash/start loops, mongod remained running but never made a connection available.

      The attached gdb session shows the following backtrace for all threads:

      (gdb) thread apply all bt
      
      Thread 24 (Thread 0x7f2b4c35b700 (LWP 4575)):
      #0  0x00007f2b4c9fa0d1 in do_sigwait (sig=0x7f2b4c35a8fc, set=<optimized out>)
          at ../nptl/sysdeps/unix/sysv/linux/../../../../../sysdeps/unix/sysv/linux/sigwait.c:60
      #1  __sigwait (set=0x207ef20 <mongo::(anonymous namespace)::asyncSignals>, sig=0x7f2b4c35a8fc)
          at ../nptl/sysdeps/unix/sysv/linux/../../../../../sysdeps/unix/sysv/linux/sigwait.c:97
      #2  0x00000000011844e6 in mongo::(anonymous namespace)::signalProcessingThread() ()
      #3  0x00000000018be510 in execute_native_thread_routine ()
      #4  0x00007f2b4c9f2182 in start_thread (arg=0x7f2b4c35b700) at pthread_create.c:312
      #5  0x00007f2b4c71f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      Thread 23 (Thread 0x7f2b4bb5a700 (LWP 4576)):
      #0  pthread_cond_timedwait@@GLIBC_2.3.2 ()
          at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
      #1  0x000000000181d3e6 in __wt_cond_wait ()
      #2  0x0000000001801ec6 in __evict_server ()
      #3  0x00007f2b4c9f2182 in start_thread (arg=0x7f2b4bb5a700) at pthread_create.c:312
      #4  0x00007f2b4c71f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      Thread 22 (Thread 0x7f2b4b359700 (LWP 4577)):
      #0  pthread_cond_timedwait@@GLIBC_2.3.2 ()
          at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
      #1  0x000000000181d3e6 in __wt_cond_wait ()
      #2  0x00000000017e5eb4 in __sweep_server ()
      #3  0x00007f2b4c9f2182 in start_thread (arg=0x7f2b4b359700) at pthread_create.c:312
      #4  0x00007f2b4c71f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      Thread 21 (Thread 0x7f2b4ab58700 (LWP 4578)):
      #0  pthread_cond_timedwait@@GLIBC_2.3.2 ()
          at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
      #1  0x000000000181d3e6 in __wt_cond_wait ()
      #2  0x00000000017e24e9 in __log_file_server ()
      #3  0x00007f2b4c9f2182 in start_thread (arg=0x7f2b4ab58700) at pthread_create.c:312
      #4  0x00007f2b4c71f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      Thread 20 (Thread 0x7f2b4a357700 (LWP 4579)):
      #0  pthread_cond_timedwait@@GLIBC_2.3.2 ()
          at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
      #1  0x000000000181d3e6 in __wt_cond_wait ()
      #2  0x00000000017e3204 in __log_wrlsn_server ()
      #3  0x00007f2b4c9f2182 in start_thread (arg=0x7f2b4a357700) at pthread_create.c:312
      #4  0x00007f2b4c71f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      Thread 19 (Thread 0x7f2b49b56700 (LWP 4580)):
      #0  pthread_cond_timedwait@@GLIBC_2.3.2 ()
          at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
      #1  0x000000000181d3e6 in __wt_cond_wait ()
      #2  0x00000000017e29b0 in __log_server ()
      #3  0x00007f2b4c9f2182 in start_thread (arg=0x7f2b49b56700) at pthread_create.c:312
      #4  0x00007f2b4c71f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      Thread 18 (Thread 0x7f2b49355700 (LWP 4581)):
      #0  pthread_cond_timedwait@@GLIBC_2.3.2 ()
          at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
      #1  0x000000000181d3e6 in __wt_cond_wait ()
      #2  0x00000000017e0476 in __ckpt_server ()
      #3  0x00007f2b4c9f2182 in start_thread (arg=0x7f2b49355700) at pthread_create.c:312
      #4  0x00007f2b4c71f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      Thread 17 (Thread 0x7f2b48b54700 (LWP 4582)):
      #0  0x00007f2b4c716da3 in select () at ../sysdeps/unix/syscall-template.S:81
      #1  0x000000000181e932 in __wt_sleep ()
      #2  0x0000000001810451 in __lsm_worker_manager ()
      #3  0x00007f2b4c9f2182 in start_thread (arg=0x7f2b48b54700) at pthread_create.c:312
      #4  0x00007f2b4c71f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      Thread 16 (Thread 0x7f2b48353700 (LWP 4583)):
      #0  pthread_cond_timedwait@@GLIBC_2.3.2 ()
          at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
      #1  0x000000000181d3e6 in __wt_cond_wait ()
      #2  0x0000000001817afb in __lsm_worker ()
      #3  0x00007f2b4c9f2182 in start_thread (arg=0x7f2b48353700) at pthread_create.c:312
      #4  0x00007f2b4c71f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      Thread 15 (Thread 0x7f2b47b52700 (LWP 4584)):
      #0  pthread_cond_timedwait@@GLIBC_2.3.2 ()
          at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
      #1  0x000000000181d3e6 in __wt_cond_wait ()
      #2  0x0000000001817afb in __lsm_worker ()
      #3  0x00007f2b4c9f2182 in start_thread (arg=0x7f2b47b52700) at pthread_create.c:312
      #4  0x00007f2b4c71f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      Thread 14 (Thread 0x7f2b47351700 (LWP 4585)):
      #0  pthread_cond_timedwait@@GLIBC_2.3.2 ()
          at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
      #1  0x000000000181d3e6 in __wt_cond_wait ()
      #2  0x0000000001817afb in __lsm_worker ()
      #3  0x00007f2b4c9f2182 in start_thread (arg=0x7f2b47351700) at pthread_create.c:312
      #4  0x00007f2b4c71f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      Thread 13 (Thread 0x7f2b46b50700 (LWP 4586)):
      #0  pthread_cond_timedwait@@GLIBC_2.3.2 ()
          at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
      #1  0x0000000000ab6dd9 in mongo::CondVarLockGrantNotification::wait(unsigned int) ()
      #2  0x0000000000aba7cf in mongo::LockerImpl<false>::lockComplete(mongo::ResourceId, mongo::LockMode, unsigned int, bool) ()
      #3  0x0000000000ab0d44 in mongo::Lock::GlobalLock::_lock(mongo::LockMode, unsigned int) ()
      #4  0x0000000000ab0d88 in mongo::Lock::GlobalLock::GlobalLock(mongo::Locker*, mongo::LockMode, unsigned int) ()
      #5  0x0000000000ab0e06 in mongo::Lock::DBLock::DBLock(mongo::Locker*, mongo::StringData, mongo::LockMode) ()
      #6  0x0000000000ac5e40 in mongo::AutoGetDb::AutoGetDb(mongo::OperationContext*, mongo::StringData, mongo::LockMode) ()
      #7  0x0000000000f3e958 in mongo::(anonymous namespace)::WiredTigerRecordStoreThread::run() ()
      #8  0x0000000001124c47 in mongo::BackgroundJob::jobBody() ()
      #9  0x00000000018be510 in execute_native_thread_routine ()
      #10 0x00007f2b4c9f2182 in start_thread (arg=0x7f2b46b50700) at pthread_create.c:312
      #11 0x00007f2b4c71f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      Thread 12 (Thread 0x7f2b30c04700 (LWP 4587)):
      #0  pthread_cond_wait@@GLIBC_2.3.2 ()
          at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
      #1  0x00000000018bdd6c in std::condition_variable::wait(std::unique_lock<std::mutex>&) ()
      #2  0x00000000010f8218 in mongo::DeadlineMonitor<mongo::mozjs::MozJSImplScope>::deadlineMonitorThread() ()
      #3  0x00000000018be510 in execute_native_thread_routine ()
      #4  0x00007f2b4c9f2182 in start_thread (arg=0x7f2b30c04700) at pthread_create.c:312
      #5  0x00007f2b4c71f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      Thread 11 (Thread 0x7f2b30403700 (LWP 4588)):
      #0  pthread_cond_timedwait@@GLIBC_2.3.2 ()
          at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
      #1  0x0000000000d194b3 in mongo::RangeDeleter::doWork() ()
      #2  0x00000000018be510 in execute_native_thread_routine ()
      #3  0x00007f2b4c9f2182 in start_thread (arg=0x7f2b30403700) at pthread_create.c:312
      #4  0x00007f2b4c71f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      Thread 10 (Thread 0x7f2b2fc02700 (LWP 4589)):
      #0  pthread_cond_wait@@GLIBC_2.3.2 ()
          at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
      #1  0x00000000018bdd6c in std::condition_variable::wait(std::unique_lock<std::mutex>&) ()
      #2  0x0000000001139d0b in mongo::Listener::waitUntilListening() const ()
      #3  0x0000000000d570e8 in mongo::repl::isSelf(mongo::HostAndPort const&) ()
      #4  0x0000000000da8946 in mongo::repl::(anonymous namespace)::findSelfInConfig(mongo::repl::ReplicationCoordinatorExternalState*, mongo::repl::ReplicaSetConfig const&) ()
      #5  0x0000000000da965e in mongo::repl::validateConfigForStartUp(mongo::repl::ReplicationCoordinatorExternalState*, mongo::repl::ReplicaSetConfig const&, mongo::repl::ReplicaSetConfig const&) ()
      #6  0x0000000000dc6b08 in mongo::repl::ReplicationCoordinatorImpl::_finishLoadLocalConfig(mongo::executor::TaskExecutor::CallbackArgs const&, mongo::repl::ReplicaSetConfig const&, mongo::StatusWith<mongo::repl::OpTime> const&) ()
      #7  0x0000000000dd80b9 in mongo::repl::(anonymous namespace)::callNoExcept(std::function<void ()> const&) ()
      #8  0x0000000000ddd230 in mongo::repl::ReplicationExecutor::run() ()
      #9  0x00000000018be510 in execute_native_thread_routine ()
      #10 0x00007f2b4c9f2182 in start_thread (arg=0x7f2b2fc02700) at pthread_create.c:312
      #11 0x00007f2b4c71f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      Thread 9 (Thread 0x7f2b2f401700 (LWP 4590)):
      #0  pthread_cond_wait@@GLIBC_2.3.2 ()
          at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
      #1  0x00000000018bdd6c in std::condition_variable::wait(std::unique_lock<std::mutex>&) ()
      #2  0x000000000112bf3e in mongo::ThreadPool::_consumeTasks() ()
      #3  0x000000000112c700 in mongo::ThreadPool::_workerThreadBody(mongo::ThreadPool*, std::string const&) ()
      #4  0x00000000018be510 in execute_native_thread_routine ()
      #5  0x00007f2b4c9f2182 in start_thread (arg=0x7f2b2f401700) at pthread_create.c:312
      #6  0x00007f2b4c71f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      Thread 8 (Thread 0x7f2b2ec00700 (LWP 4591)):
      #0  pthread_cond_wait@@GLIBC_2.3.2 ()
          at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
      #1  0x00000000018bdd6c in std::condition_variable::wait(std::unique_lock<std::mutex>&) ()
      #2  0x0000000000f693d3 in mongo::executor::NetworkInterfaceImpl::_processAlarms() ()
      #3  0x000000000112af90 in mongo::ThreadPool::_doOneTask(std::unique_lock<std::mutex>*) ()
      #4  0x000000000112bb79 in mongo::ThreadPool::_consumeTasks() ()
      #5  0x000000000112c700 in mongo::ThreadPool::_workerThreadBody(mongo::ThreadPool*, std::string const&) ()
      #6  0x00000000018be510 in execute_native_thread_routine ()
      #7  0x00007f2b4c9f2182 in start_thread (arg=0x7f2b2ec00700) at pthread_create.c:312
      #8  0x00007f2b4c71f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      Thread 7 (Thread 0x7f2b2e3ff700 (LWP 4592)):
      #0  pthread_cond_wait@@GLIBC_2.3.2 ()
          at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
      #1  0x00000000018bdd6c in std::condition_variable::wait(std::unique_lock<std::mutex>&) ()
      #2  0x000000000112bf3e in mongo::ThreadPool::_consumeTasks() ()
      #3  0x000000000112c700 in mongo::ThreadPool::_workerThreadBody(mongo::ThreadPool*, std::string const&) ()
      #4  0x00000000018be510 in execute_native_thread_routine ()
      #5  0x00007f2b4c9f2182 in start_thread (arg=0x7f2b2e3ff700) at pthread_create.c:312
      #6  0x00007f2b4c71f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      Thread 6 (Thread 0x7f2b2dbfe700 (LWP 4593)):
      #0  pthread_cond_wait@@GLIBC_2.3.2 ()
          at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
      #1  0x00000000018bdd6c in std::condition_variable::wait(std::unique_lock<std::mutex>&) ()
      #2  0x000000000112bf3e in mongo::ThreadPool::_consumeTasks() ()
      #3  0x000000000112c700 in mongo::ThreadPool::_workerThreadBody(mongo::ThreadPool*, std::string const&) ()
      #4  0x00000000018be510 in execute_native_thread_routine ()
      #5  0x00007f2b4c9f2182 in start_thread (arg=0x7f2b2dbfe700) at pthread_create.c:312
      #6  0x00007f2b4c71f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      Thread 5 (Thread 0x7f2b2d3fd700 (LWP 4594)):
      #0  pthread_cond_wait@@GLIBC_2.3.2 ()
          at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
      #1  0x00000000018bdd6c in std::condition_variable::wait(std::unique_lock<std::mutex>&) ()
      #2  0x000000000112bf3e in mongo::ThreadPool::_consumeTasks() ()
      #3  0x000000000112c700 in mongo::ThreadPool::_workerThreadBody(mongo::ThreadPool*, std::string const&) ()
      #4  0x00000000018be510 in execute_native_thread_routine ()
      #5  0x00007f2b4c9f2182 in start_thread (arg=0x7f2b2d3fd700) at pthread_create.c:312
      #6  0x00007f2b4c71f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      Thread 4 (Thread 0x7f2b2cbfc700 (LWP 4595)):
      #0  0x00007f2b4c9f9b9d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
      #1  0x000000000118ff15 in mongo::sleepsecs(int) ()
      #2  0x0000000000f52a5b in mongo::TTLMonitor::run() ()
      #3  0x0000000001124c47 in mongo::BackgroundJob::jobBody() ()
      #4  0x00000000018be510 in execute_native_thread_routine ()
      #5  0x00007f2b4c9f2182 in start_thread (arg=0x7f2b2cbfc700) at pthread_create.c:312
      #6  0x00007f2b4c71f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      Thread 3 (Thread 0x7f2b2c3fb700 (LWP 4596)):
      #0  pthread_cond_timedwait@@GLIBC_2.3.2 ()
          at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
      #1  0x0000000000ab6dd9 in mongo::CondVarLockGrantNotification::wait(unsigned int) ()
      #2  0x0000000000aba7cf in mongo::LockerImpl<false>::lockComplete(mongo::ResourceId, mongo::LockMode, unsigned int, bool) ()
      #3  0x0000000000ab0d44 in mongo::Lock::GlobalLock::_lock(mongo::LockMode, unsigned int) ()
      #4  0x0000000000ab0d88 in mongo::Lock::GlobalLock::GlobalLock(mongo::Locker*, mongo::LockMode, unsigned int) ()
      #5  0x0000000000ab0e06 in mongo::Lock::DBLock::DBLock(mongo::Locker*, mongo::StringData, mongo::LockMode) ()
      #6  0x0000000000ac5e40 in mongo::AutoGetDb::AutoGetDb(mongo::OperationContext*, mongo::StringData, mongo::LockMode) ()
      #7  0x0000000000ac619e in mongo::AutoGetCollectionForRead::AutoGetCollectionForRead(mongo::OperationContext*, std::string const&) ()
      #8  0x00000000009f23d8 in mongo::GlobalCursorIdCache::timeoutCursors(mongo::OperationContext*, int) ()
      #9  0x0000000000a1072e in mongo::ClientCursorMonitor::run() ()
      #10 0x0000000001124c47 in mongo::BackgroundJob::jobBody() ()
      #11 0x00000000018be510 in execute_native_thread_routine ()
      #12 0x00007f2b4c9f2182 in start_thread (arg=0x7f2b2c3fb700) at pthread_create.c:312
      #13 0x00007f2b4c71f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      Thread 2 (Thread 0x7f2b2bbfa700 (LWP 4597)):
      #0  pthread_cond_timedwait@@GLIBC_2.3.2 ()
          at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
      #1  0x00000000011256fe in mongo::(anonymous namespace)::PeriodicTaskRunner::run() ()
      #2  0x0000000001124c47 in mongo::BackgroundJob::jobBody() ()
      #3  0x00000000018be510 in execute_native_thread_routine ()
      #4  0x00007f2b4c9f2182 in start_thread (arg=0x7f2b2bbfa700) at pthread_create.c:312
      #5  0x00007f2b4c71f47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
      
      Thread 1 (Thread 0x7f2b4da3ecc0 (LWP 4574)):
      #0  0x00007f2b4c716da3 in select () at ../sysdeps/unix/syscall-template.S:81
      #1  0x000000000181e932 in __wt_sleep ()
      #2  0x000000000180b262 in __wt_clsm_await_switch ()
      #3  0x000000000180b760 in __clsm_enter ()
      #4  0x000000000180d09a in __clsm_insert ()
      #5  0x0000000000f2d70c in mongo::WiredTigerIndexUnique::_insert(__wt_cursor*, mongo::BSONObj const&, mongo::RecordId const&, bool) ()
      #6  0x0000000000f2de14 in mongo::WiredTigerIndex::insert(mongo::OperationContext*, mongo::BSONObj const&, mongo::RecordId const&, bool) ()
      #7  0x0000000000b94924 in mongo::IndexAccessMethod::insert(mongo::OperationContext*, mongo::BSONObj const&, mongo::RecordId const&, mongo::InsertDeleteOptions const&, long*) ()
      #8  0x00000000009ff74d in mongo::IndexCatalog::_indexRecord(mongo::OperationContext*, mongo::IndexCatalogEntry*, mongo::BSONObj const&, mongo::RecordId const&) ()
      #9  0x00000000009ffb46 in mongo::IndexCatalog::indexRecord(mongo::OperationContext*, mongo::BSONObj const&, mongo::RecordId const&) ()
      #10 0x00000000009e38ff in mongo::Collection::_insertDocument(mongo::OperationContext*, mongo::BSONObj const&, bool) ()
      #11 0x00000000009e53ab in mongo::Collection::insertDocument(mongo::OperationContext*, mongo::BSONObj const&, bool, bool) ()
      #12 0x00000000008b8b5e in mongo::logStartup() ()
      #13 0x00000000008baa56 in mongo::initAndListen(int) ()
      #14 0x00000000008be0f4 in main ()
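
In the backtrace above, Thread 1 (the main thread, in initAndListen via logStartup) is sleeping in __wt_clsm_await_switch, Threads 3 and 13 are waiting in Lock::GlobalLock, and Thread 10 is waiting in Listener::waitUntilListening. As a reading aid for dumps like this one, the sketch below strips the gdb pager prompts and reports, per thread, the first frame that is not a generic wait primitive. The names parse_backtrace, waiting_in and WAIT_PRIMITIVES are illustrative and not part of gdb or mongod; the list of wait primitives is an assumption taken from the frames seen in this trace, and the sample is a shortened excerpt of it.

```python
import re

PAGER = "---Type <return> to continue, or q <return> to quit---"
THREAD_RE = re.compile(r"Thread (\d+) \(Thread 0x[0-9a-f]+ \(LWP \d+\)\):")
FRAME_RE = re.compile(r"#\d+\s+(?:0x[0-9a-f]+ in )?(.*)")

# Assumption: frames that only say "this thread is waiting", not on what.
WAIT_PRIMITIVES = (
    "pthread_cond_", "select ", "nanosleep ", "do_sigwait ", "__sigwait ",
    "__wt_cond_wait ", "__wt_sleep ", "std::condition_variable::wait",
    "mongo::sleepsecs",
)


def parse_backtrace(text):
    """Return {thread number: [frame text, innermost first]}."""
    # The pager prompt can land mid-line; delete it together with the
    # whitespace that follows so the two halves of the line are rejoined.
    text = re.sub(re.escape(PAGER) + r"\s*", "", text)
    threads = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        thread = THREAD_RE.match(line)
        if thread:
            current = int(thread.group(1))
            threads[current] = []
            continue
        frame = FRAME_RE.match(line)
        if frame and current is not None:
            threads[current].append(frame.group(1))
        # Anything else (for example "at file:line" continuations) is skipped.
    return threads


def waiting_in(frames):
    """Return the first frame that is not a generic wait primitive."""
    for frame in frames:
        if not frame.startswith(WAIT_PRIMITIVES):
            return frame
    return frames[-1] if frames else None


# Shortened excerpt of the trace above, including a mid-line pager prompt.
SAMPLE = """\
Thread 3 (Thread 0x7f2b2c3fb700 (LWP 4596)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x0000000000ab6dd9 in mongo::CondVarLockGrantNotification::wait(unsigned int) ()

Thread 1 (Thread 0x7f2b4da3ecc0 (LWP 4574)):
#0  0x00007f2b4c716da3 in select () at ../sysdeps/unix/syscall-template.S:81
#1  0x000000000181e932 in __wt_sleep ()
#2  0x000000000180b262 in __wt_clsm_await_switch ()
#3  0x000000000180b760 in __clsm_enter ()
#4  0x000000000180d09a in __clsm_insert ()
#5  0x0000000000f2d70c in mongo::WiredTigerIndexUnique::_insert(__wt_cursor*, mongo::BSONObj cons---Type <return> to continue, or q <return> to quit---
t&, mongo::RecordId const&, bool) ()
"""

if __name__ == "__main__":
    for number, frames in sorted(parse_backtrace(SAMPLE).items()):
        print(f"Thread {number}: {waiting_in(frames)}")
```

Running the same functions over the full gdb session text gives one line per thread, which makes it easier to separate idle server threads from the ones involved in the hang.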
      

        1. mongod-wiredTiger.log
          7.35 MB
        2. mongod-wiredTiger-recovery.log
          2.39 MB
        3. powertest.sh
          35 kB
        4. pttest.log
          20 kB
        5. wiredTiger.tar.1
          50.00 MB
        6. wiredTiger.tar.2
          50.00 MB
        7. wiredTiger.tar.3
          50.00 MB
        8. wiredTiger.tar.4
          50.00 MB
        9. wiredTiger.tar.5
          39.50 MB

            Assignee:
            backlog-server-execution [DO NOT USE] Backlog - Storage Execution Team
            Reporter:
            jonathan.abrahams Jonathan Abrahams
            Votes:
            0
            Watchers:
            8
