[SERVER-27430] mongod hangs on stop Created: 15/Dec/16  Updated: 23/Jan/17  Resolved: 19/Jan/17

Status: Closed
Project: Core Server
Component/s: Stability
Affects Version/s: 3.2.10
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: xihui he Assignee: Geert Bosch
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Sprint: Storage 2017-01-23
Participants:

 Description   

version: mongodb-linux-x86_64-debian81-3.2.10
os: debian 8.5

Below is the backtrace:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Loaded symbols for /lib/x86_64-linux-gnu/libpthread.so.0
Reading symbols from /lib/x86_64-linux-gnu/libc.so.6...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libc-2.19.so...done.
done.
Loaded symbols for /lib/x86_64-linux-gnu/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/ld-2.19.so...done.
done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Reading symbols from /lib/x86_64-linux-gnu/libnss_files.so.2...Reading symbols from /usr/lib/debug//lib/x86_64-linux-gnu/libnss_files-2.19.so...done.
done.
Loaded symbols for /lib/x86_64-linux-gnu/libnss_files.so.2
__lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
135	../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S: No such file or directory.
(gdb) bt
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f74c0c8829a in _L_lock_55 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007f74c0c88211 in __lll_lock_elision (futex=0x20aad40 <mongo::shutdownLock>, adapt_count=<optimized out>, private=0)
    at ../nptl/sysdeps/unix/sysv/linux/x86/elision-lock.c:94
#3  0x0000000000cdced2 in mongo::exitCleanly(mongo::ExitCode) ()
#4  0x0000000000970274 in main ()
(gdb) thread apply bt all
(gdb) thread apply all bt
 
Thread 34 (Thread 0x7f74c0254700 (LWP 42743)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x0000000000bbb789 in mongo::CondVarLockGrantNotification::wait(unsigned int) ()
#2  0x0000000000bbf257 in mongo::LockerImpl<false>::lockComplete(mongo::ResourceId, mongo::LockMode, unsigned int, bool) ()
#3  0x0000000000cdd3c6 in mongo::exitCleanly(mongo::ExitCode) ()
#4  0x00000000013271e1 in mongo::(anonymous namespace)::signalProcessingThread() ()
#5  0x0000000001b50620 in execute_native_thread_routine ()
#6  0x00007f74c0c7f0a4 in start_thread (arg=0x7f74c0254700) at pthread_create.c:309
#7  0x00007f74c09b487d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
 
Thread 33 (Thread 0x7f74bfa53700 (LWP 42745)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x0000000001a60bfe in __wt_cond_wait_signal ()
#2  0x0000000001a13b0b in __log_file_server ()
#3  0x00007f74c0c7f0a4 in start_thread (arg=0x7f74bfa53700) at pthread_create.c:309
#4  0x00007f74c09b487d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
 
Thread 32 (Thread 0x7f74bf252700 (LWP 42746)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x0000000001a60bfe in __wt_cond_wait_signal ()
#2  0x0000000001a95134 in __wt_cond_auto_wait_signal ()
#3  0x0000000001a95194 in __wt_cond_auto_wait ()
#4  0x0000000001a14519 in __log_wrlsn_server ()
#5  0x00007f74c0c7f0a4 in start_thread (arg=0x7f74bf252700) at pthread_create.c:309
#6  0x00007f74c09b487d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
 
Thread 31 (Thread 0x7f74bea51700 (LWP 42747)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x0000000001a60bfe in __wt_cond_wait_signal ()
#2  0x0000000001a95134 in __wt_cond_auto_wait_signal ()
#3  0x0000000001a134dc in __log_server ()
#4  0x00007f74c0c7f0a4 in start_thread (arg=0x7f74bea51700) at pthread_create.c:309
#5  0x00007f74c09b487d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
 
Thread 30 (Thread 0x7f74be250700 (LWP 42748)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x0000000001a60bfe in __wt_cond_wait_signal ()
#2  0x0000000001a3928e in __evict_lru_pages ()
#3  0x0000000001a3a422 in __wt_evict_thread_run ()
#4  0x0000000001a9dd66 in __wt_thread_run ()
#5  0x00007f74c0c7f0a4 in start_thread (arg=0x7f74be250700) at pthread_create.c:309
#6  0x00007f74c09b487d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
 
Thread 29 (Thread 0x7f74bda4f700 (LWP 42749)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x0000000001a60bfe in __wt_cond_wait_signal ()
#2  0x0000000001a1716a in __sweep_server ()
#3  0x00007f74c0c7f0a4 in start_thread (arg=0x7f74bda4f700) at pthread_create.c:309
#4  0x00007f74c09b487d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
 
Thread 28 (Thread 0x7f74bd24e700 (LWP 42750)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x0000000001a60bfe in __wt_cond_wait_signal ()
#2  0x0000000001a10d69 in __ckpt_server ()
#3  0x00007f74c0c7f0a4 in start_thread (arg=0x7f74bd24e700) at pthread_create.c:309
#4  0x00007f74c09b487d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
 
Thread 27 (Thread 0x7f74bca4d700 (LWP 42751)):
#0  0x00007f74c0985f2d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f74c09ae204 in usleep (useconds=<optimized out>) at ../sysdeps/unix/sysv/linux/usleep.c:32
#2  0x0000000001332ceb in mongo::sleepmillis(long long) ()
#3  0x0000000001090803 in mongo::WiredTigerKVEngine::WiredTigerJournalFlusher::run() ()
#4  0x00000000012b2e90 in mongo::BackgroundJob::jobBody() ()
#5  0x0000000001b50620 in execute_native_thread_routine ()
#6  0x00007f74c0c7f0a4 in start_thread (arg=0x7f74bca4d700) at pthread_create.c:309
#7  0x00007f74c09b487d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
 
Thread 26 (Thread 0x7f74bc24c700 (LWP 42752)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x0000000001b4f1bc in std::condition_variable::wait(std::unique_lock<std::mutex>&) ()
#2  0x000000000127b2b8 in mongo::DeadlineMonitor<mongo::mozjs::MozJSImplScope>::deadlineMonitorThread() ()
#3  0x0000000001b50620 in execute_native_thread_routine ()
#4  0x00007f74c0c7f0a4 in start_thread (arg=0x7f74bc24c700) at pthread_create.c:309
#5  0x00007f74c09b487d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
 
Thread 25 (Thread 0x7f74bba4b700 (LWP 42753)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x0000000000e5cd83 in mongo::RangeDeleter::doWork() ()
#2  0x0000000001b50620 in execute_native_thread_routine ()
#3  0x00007f74c0c7f0a4 in start_thread (arg=0x7f74bba4b700) at pthread_create.c:309
#4  0x00007f74c09b487d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
 
Thread 24 (Thread 0x7f74ba248700 (LWP 42756)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x0000000000bbb789 in mongo::CondVarLockGrantNotification::wait(unsigned int) ()
#2  0x0000000000bbf257 in mongo::LockerImpl<false>::lockComplete(mongo::ResourceId, mongo::LockMode, unsigned int, bool) ()
#3  0x0000000000bb56d9 in mongo::Lock::GlobalLock::waitForLock(unsigned int) ()
#4  0x0000000000bb5705 in mongo::Lock::GlobalLock::GlobalLock(mongo::Locker*, mongo::LockMode, unsigned int) ()
#5  0x0000000000bb5786 in mongo::Lock::DBLock::DBLock(mongo::Locker*, mongo::StringData, mongo::LockMode) ()
#6  0x00000000010ae01e in mongo::TTLMonitor::getTTLIndexesForDB(mongo::OperationContext*, std::string const&, std::vector<mongo::BSONObj, std::allocator<mongo::BSONObj> >*) ()
#7  0x00000000010af8f1 in mongo::TTLMonitor::doTTLPass() ()
#8  0x00000000010affb8 in mongo::TTLMonitor::run() ()
#9  0x00000000012b2e90 in mongo::BackgroundJob::jobBody() ()
#10 0x0000000001b50620 in execute_native_thread_routine ()
#11 0x00007f74c0c7f0a4 in start_thread (arg=0x7f74ba248700) at pthread_create.c:309
#12 0x00007f74c09b487d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
 
Thread 23 (Thread 0x7f74b9a47700 (LWP 42757)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x0000000001b4f1bc in std::condition_variable::wait(std::unique_lock<std::mutex>&) ()
#2  0x00000000012b9946 in mongo::ThreadPool::_consumeTasks() ()
#3  0x00000000012ba0f0 in mongo::ThreadPool::_workerThreadBody(mongo::ThreadPool*, std::string const&) ()
#4  0x0000000001b50620 in execute_native_thread_routine ()
#5  0x00007f74c0c7f0a4 in start_thread (arg=0x7f74b9a47700) at pthread_create.c:309
#6  0x00007f74c09b487d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
 
Thread 22 (Thread 0x7f74b9246700 (LWP 42758)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x0000000001b4f1bc in std::condition_variable::wait(std::unique_lock<std::mutex>&) ()
#2  0x00000000012b9946 in mongo::ThreadPool::_consumeTasks() ()
#3  0x00000000012ba0f0 in mongo::ThreadPool::_workerThreadBody(mongo::ThreadPool*, std::string const&) ()
#4  0x0000000001b50620 in execute_native_thread_routine ()
#5  0x00007f74c0c7f0a4 in start_thread (arg=0x7f74b9246700) at pthread_create.c:309
#6  0x00007f74c09b487d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
 
Thread 21 (Thread 0x7f74b8244700 (LWP 42760)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x0000000001b4f1bc in std::condition_variable::wait(std::unique_lock<std::mutex>&) ()
#2  0x00000000012b9946 in mongo::ThreadPool::_consumeTasks() ()
#3  0x00000000012ba0f0 in mongo::ThreadPool::_workerThreadBody(mongo::ThreadPool*, std::string const&) ()
#4  0x0000000001b50620 in execute_native_thread_routine ()
#5  0x00007f74c0c7f0a4 in start_thread (arg=0x7f74b8244700) at pthread_create.c:309
#6  0x00007f74c09b487d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
 
Thread 20 (Thread 0x7f74b7a43700 (LWP 42761)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x00000000012b395e in mongo::(anonymous namespace)::PeriodicTaskRunner::run() ()
#2  0x00000000012b2e90 in mongo::BackgroundJob::jobBody() ()
#3  0x0000000001b50620 in execute_native_thread_routine ()
#4  0x00007f74c0c7f0a4 in start_thread (arg=0x7f74b7a43700) at pthread_create.c:309
#5  0x00007f74c09b487d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
 
Thread 19 (Thread 0x7f74b7242700 (LWP 42762)):
#0  0x00007f74c0985f2d in nanosleep () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f74c0985dc4 in __sleep (seconds=0) at ../sysdeps/unix/sysv/linux/sleep.c:137
#2  0x0000000001b505b9 in std::this_thread::__sleep_for(std::chrono::duration<long, std::ratio<1l, 1l> >, std::chrono::duration<long, std::ratio<1l, 1000000000l> >) ()
#3  0x0000000001332cae in mongo::sleepsecs(int) ()
#4  0x00000000012cabae in mongo::HostnameCanonicalizationWorker::_doWork() ()
#5  0x0000000001b50620 in execute_native_thread_routine ()
#6  0x00007f74c0c7f0a4 in start_thread (arg=0x7f74b7242700) at pthread_create.c:309
#7  0x00007f74c09b487d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
 
Thread 18 (Thread 0x7f74b6034700 (LWP 43325)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x0000000001344cf0 in asio::detail::scheduler::do_run_one(asio::detail::scoped_lock<asio::detail::posix_mutex>&, asio::detail::scheduler_thread_info&, std::error_code const&) ()
#2  0x0000000001344fa1 in asio::detail::scheduler::run(std::error_code&) ()
#3  0x000000000134913f in asio::io_service::run() ()
#4  0x000000000133c8e0 in asio_detail_posix_thread_function ()
#5  0x00007f74c0c7f0a4 in start_thread (arg=0x7f74b6034700) at pthread_create.c:309
#6  0x00007f74c09b487d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
 
Thread 17 (Thread 0x7f74b302e700 (LWP 43334)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x0000000001b4f1bc in std::condition_variable::wait(std::unique_lock<std::mutex>&) ()
#2  0x0000000001097cc1 in mongo::WiredTigerRecordStore::OplogStones::awaitHasExcessStonesOrDead() ()
#3  0x0000000001098c00 in mongo::WiredTigerRecordStore::yieldAndAwaitOplogDeletionRequest(mongo::OperationContext*) ()
#4  0x000000000109eb83 in mongo::(anonymous namespace)::WiredTigerRecordStoreThread::_deleteExcessDocuments() ()
#5  0x000000000109f0f8 in mongo::(anonymous namespace)::WiredTigerRecordStoreThread::run() ()
#6  0x00000000012b2e90 in mongo::BackgroundJob::jobBody() ()
#7  0x0000000001b50620 in execute_native_thread_routine ()
#8  0x00007f74c0c7f0a4 in start_thread (arg=0x7f74b302e700) at pthread_create.c:309
#9  0x00007f74c09b487d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
 
Thread 16 (Thread 0x7f74a0f09700 (LWP 43453)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x0000000001344cf0 in asio::detail::scheduler::do_run_one(asio::detail::scoped_lock<asio::detail::posix_mutex>&, asio::detail::scheduler_thread_info&, std::error_code const&) ()
#2  0x0000000001344fa1 in asio::detail::scheduler::run(std::error_code&) ()
#3  0x000000000134913f in asio::io_service::run() ()
#4  0x000000000133c8e0 in asio_detail_posix_thread_function ()
#5  0x00007f74c0c7f0a4 in start_thread (arg=0x7f74a0f09700) at pthread_create.c:309
#6  0x00007f74c09b487d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
 
Thread 15 (Thread 0x7f74899cf700 (LWP 65507)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x0000000001344cf0 in asio::detail::scheduler::do_run_one(asio::detail::scoped_lock<asio::detail::posix_mutex>&, asio::detail::scheduler_thread_info&, std::error_code const&) ()
#2  0x0000000001344fa1 in asio::detail::scheduler::run(std::error_code&) ()
#3  0x000000000134913f in asio::io_service::run() ()
#4  0x000000000133c8e0 in asio_detail_posix_thread_function ()
#5  0x00007f74c0c7f0a4 in start_thread (arg=0x7f74899cf700) at pthread_create.c:309
#6  0x00007f74c09b487d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
 
Thread 14 (Thread 0x7f74b382f700 (LWP 239626)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x0000000001a60bfe in __wt_cond_wait_signal ()
#2  0x0000000001a3928e in __evict_lru_pages ()
#3  0x0000000001a3a422 in __wt_evict_thread_run ()
#4  0x0000000001a9dd66 in __wt_thread_run ()
#5  0x00007f74c0c7f0a4 in start_thread (arg=0x7f74b382f700) at pthread_create.c:309
#6  0x00007f74c09b487d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
 
Thread 13 (Thread 0x7f74891ce700 (LWP 239627)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x0000000001a60bfe in __wt_cond_wait_signal ()
#2  0x0000000001a3b400 in __wt_evict_thread_run ()
#3  0x0000000001a9dd66 in __wt_thread_run ()
#4  0x00007f74c0c7f0a4 in start_thread (arg=0x7f74891ce700) at pthread_create.c:309
#5  0x00007f74c09b487d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
 
Thread 12 (Thread 0x7f7484185700 (LWP 239628)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
#1  0x0000000001a60bfe in __wt_cond_wait_signal ()
#2  0x0000000001a3928e in __evict_lru_pages ()
#3  0x0000000001a3a422 in __wt_evict_thread_run ()
#4  0x0000000001a9dd66 in __wt_thread_run ()
#5  0x00007f74c0c7f0a4 in start_thread (arg=0x7f7484185700) at pthread_create.c:309
#6  0x00007f74c09b487d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
 
Thread 11 (Thread 0x7f747a3fd700 (LWP 34918)):
#0  0x00007f74c0c85ddb in __libc_recv (fd=95, buf=0x7f747a3fc480, n=16, flags=-1) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33
#1  0x00000000012ded5f in mongo::Socket::_recv(char*, int) ()
#2  0x00000000012ded91 in mongo::Socket::unsafe_recv(char*, int) ()
#3  0x00000000012deded in mongo::Socket::recv(char*, int) ()
#4  0x00000000012d2d11 in mongo::MessagingPort::recv(mongo::Message&) ()
#5  0x00000000012d4df7 in mongo::PortMessageServer::handleIncomingMsg(void*) ()
#6  0x00007f74c0c7f0a4 in start_thread (arg=0x7f747a3fd700) at pthread_create.c:309
#7  0x00007f74c09b487d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
 
Thread 10 (Thread 0x7f7478fe9700 (LWP 141527)):
#0  0x00007f74c0c85ddb in __libc_recv (fd=454, buf=0x7f7478fe8480, n=16, flags=-1) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33
#1  0x00000000012ded5f in mongo::Socket::_recv(char*, int) ()
#2  0x00000000012ded91 in mongo::Socket::unsafe_recv(char*, int) ()
#3  0x00000000012deded in mongo::Socket::recv(char*, int) ()
#4  0x00000000012d2d11 in mongo::MessagingPort::recv(mongo::Message&) ()
#5  0x00000000012d4df7 in mongo::PortMessageServer::handleIncomingMsg(void*) ()
#6  0x00007f74c0c7f0a4 in start_thread (arg=0x7f7478fe9700) at pthread_create.c:309
#7  0x00007f74c09b487d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
 
Thread 9 (Thread 0x7f74887cb700 (LWP 213733)):
#0  0x00007f74c0c85ddb in __libc_recv (fd=52, buf=0x7f74887ca480, n=16, flags=-1) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33
#1  0x00000000012ded5f in mongo::Socket::_recv(char*, int) ()
#2  0x00000000012ded91 in mongo::Socket::unsafe_recv(char*, int) ()
#3  0x00000000012deded in mongo::Socket::recv(char*, int) ()
#4  0x00000000012d2d11 in mongo::MessagingPort::recv(mongo::Message&) ()
#5  0x00000000012d4df7 in mongo::PortMessageServer::handleIncomingMsg(void*) ()
#6  0x00007f74c0c7f0a4 in start_thread (arg=0x7f74887cb700) at pthread_create.c:309
#7  0x00007f74c09b487d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
 
Thread 8 (Thread 0x7f748216c700 (LWP 114349)):
#0  0x00007f74c0c85ddb in __libc_recv (fd=56, buf=0x7f748216b480, n=16, flags=-1) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33
#1  0x00000000012ded5f in mongo::Socket::_recv(char*, int) ()
#2  0x00000000012ded91 in mongo::Socket::unsafe_recv(char*, int) ()
#3  0x00000000012deded in mongo::Socket::recv(char*, int) ()
#4  0x00000000012d2d11 in mongo::MessagingPort::recv(mongo::Message&) ()
#5  0x00000000012d4df7 in mongo::PortMessageServer::handleIncomingMsg(void*) ()
#6  0x00007f74c0c7f0a4 in start_thread (arg=0x7f748216c700) at pthread_create.c:309
#7  0x00007f74c09b487d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
 
Thread 7 (Thread 0x7f747f33e700 (LWP 48361)):
#0  0x00007f74c0c85ddb in __libc_recv (fd=180, buf=0x7f747f33d480, n=16, flags=-1) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33
#1  0x00000000012ded5f in mongo::Socket::_recv(char*, int) ()
#2  0x00000000012ded91 in mongo::Socket::unsafe_recv(char*, int) ()
#3  0x00000000012deded in mongo::Socket::recv(char*, int) ()
#4  0x00000000012d2d11 in mongo::MessagingPort::recv(mongo::Message&) ()
#5  0x00000000012d4df7 in mongo::PortMessageServer::handleIncomingMsg(void*) ()
#6  0x00007f74c0c7f0a4 in start_thread (arg=0x7f747f33e700) at pthread_create.c:309
#7  0x00007f74c09b487d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
 
Thread 6 (Thread 0x7f746fa54700 (LWP 48362)):
#0  0x00007f74c0c85ddb in __libc_recv (fd=308, buf=0x7f746fa53480, n=16, flags=-1) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33
#1  0x00000000012ded5f in mongo::Socket::_recv(char*, int) ()
#2  0x00000000012ded91 in mongo::Socket::unsafe_recv(char*, int) ()
#3  0x00000000012deded in mongo::Socket::recv(char*, int) ()
#4  0x00000000012d2d11 in mongo::MessagingPort::recv(mongo::Message&) ()
#5  0x00000000012d4df7 in mongo::PortMessageServer::handleIncomingMsg(void*) ()
#6  0x00007f74c0c7f0a4 in start_thread (arg=0x7f746fa54700) at pthread_create.c:309
#7  0x00007f74c09b487d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
 
Thread 5 (Thread 0x7f7471e78700 (LWP 48363)):
#0  0x00007f74c0c85ddb in __libc_recv (fd=505, buf=0x7f7471e77480, n=16, flags=-1) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33
#1  0x00000000012ded5f in mongo::Socket::_recv(char*, int) ()
#2  0x00000000012ded91 in mongo::Socket::unsafe_recv(char*, int) ()
#3  0x00000000012deded in mongo::Socket::recv(char*, int) ()
#4  0x00000000012d2d11 in mongo::MessagingPort::recv(mongo::Message&) ()
#5  0x00000000012d4df7 in mongo::PortMessageServer::handleIncomingMsg(void*) ()
#6  0x00007f74c0c7f0a4 in start_thread (arg=0x7f7471e78700) at pthread_create.c:309
#7  0x00007f74c09b487d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
 
Thread 4 (Thread 0x7f7471c76700 (LWP 48369)):
#0  0x00007f74c0c85ddb in __libc_recv (fd=511, buf=0x7f7471c75480, n=16, flags=-1) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33
#1  0x00000000012ded5f in mongo::Socket::_recv(char*, int) ()
#2  0x00000000012ded91 in mongo::Socket::unsafe_recv(char*, int) ()
#3  0x00000000012deded in mongo::Socket::recv(char*, int) ()
#4  0x00000000012d2d11 in mongo::MessagingPort::recv(mongo::Message&) ()
#5  0x00000000012d4df7 in mongo::PortMessageServer::handleIncomingMsg(void*) ()
#6  0x00007f74c0c7f0a4 in start_thread (arg=0x7f7471c76700) at pthread_create.c:309
#7  0x00007f74c09b487d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
 
Thread 3 (Thread 0x7f7471872700 (LWP 48370)):
#0  0x00007f74c0c85ddb in __libc_recv (fd=512, buf=0x7f7471871480, n=16, flags=-1) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33
#1  0x00000000012ded5f in mongo::Socket::_recv(char*, int) ()
#2  0x00000000012ded91 in mongo::Socket::unsafe_recv(char*, int) ()
#3  0x00000000012deded in mongo::Socket::recv(char*, int) ()
#4  0x00000000012d2d11 in mongo::MessagingPort::recv(mongo::Message&) ()
#5  0x00000000012d4df7 in mongo::PortMessageServer::handleIncomingMsg(void*) ()
#6  0x00007f74c0c7f0a4 in start_thread (arg=0x7f7471872700) at pthread_create.c:309
#7  0x00007f74c09b487d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
 
Thread 2 (Thread 0x7f747c00b700 (LWP 48606)):
#0  0x00007f74c0c85ddb in __libc_recv (fd=392, buf=0x7f747c00a480, n=16, flags=-1) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33
#1  0x00000000012ded5f in mongo::Socket::_recv(char*, int) ()
#2  0x00000000012ded91 in mongo::Socket::unsafe_recv(char*, int) ()
#3  0x00000000012deded in mongo::Socket::recv(char*, int) ()
#4  0x00000000012d2d11 in mongo::MessagingPort::recv(mongo::Message&) ()
#5  0x00000000012d4df7 in mongo::PortMessageServer::handleIncomingMsg(void*) ()
#6  0x00007f74c0c7f0a4 in start_thread (arg=0x7f747c00b700) at pthread_create.c:309
#7  0x00007f74c09b487d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
 
Thread 1 (Thread 0x7f74c2024cc0 (LWP 42742)):
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f74c0c8829a in _L_lock_55 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007f74c0c88211 in __lll_lock_elision (futex=0x20aad40 <mongo::shutdownLock>, adapt_count=<optimized out>, private=0)
    at ../nptl/sysdeps/unix/sysv/linux/x86/elision-lock.c:94
#3  0x0000000000cdced2 in mongo::exitCleanly(mongo::ExitCode) ()
#4  0x0000000000970274 in main ()



 Comments   
Comment by Geert Bosch [ 23/Jan/17 ]

Hi Xihui, my apologies for not recognizing that you also reported SERVER-26018. Based on the symptoms, it seems very likely that this is another issue related to the lock elision features enabled in your environment. It is hard for us to know for sure whether the actual issue here is in mongod itself, in system libraries such as glibc that we depend on, or in Intel hardware or firmware. I have checked all uses of the shutdownMutex to ensure there are no calls that unlock a mutex that isn't locked. Additionally, shutdownMutex has static lifetime, so there is no possibility of use-after-free.
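For reference, below is a minimal C++ sketch (not the actual mongod source; the names shutdownLock and exitCleanly simply mirror the symbols in the backtrace) of the invariant described above: a mutex with static storage duration that is only locked and unlocked through RAII, so an unbalanced unlock or a use-after-free cannot occur.

#include <mutex>

namespace sketch {

// Static storage duration: the mutex is never destroyed, so there is no
// possibility of use-after-free during shutdown.
std::mutex shutdownLock;

void exitCleanly(int exitCode) {
    // RAII pairs every unlock with a lock taken on this path, even if the
    // shutdown work below throws.
    std::lock_guard<std::mutex> lk(shutdownLock);
    // ... shutdown work performed while holding the mutex ...
    (void)exitCode;
}

}  // namespace sketch

int main() {
    sketch::exitCleanly(0);
    return 0;
}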

Debian 8.7, which was recently released, has disabled lock elision again, so one way to work around your issue would likely be to upgrade to that version of Debian.

Regards,
-Geert

Comment by xihui he [ 23/Jan/17 ]

Hi geert.bosch,

SERVER-26018 was reported by us. We have upgraded our Intel microcode; our /proc/cpuinfo output is below.

processor	: 39
vendor_id	: GenuineIntel
cpu family	: 6
model		: 79
model name	: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
stepping	: 1
microcode	: 0xb00001d
cpu MHz		: 2025.289
cache size	: 25600 KB
physical id	: 1
siblings	: 20
core id		: 12
cpu cores	: 10
apicid		: 57
initial apicid	: 57
fpu		: yes
fpu_exception	: yes
cpuid level	: 20
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap
bogomips	: 4401.03
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:

Thanks,
Xihui

Comment by Geert Bosch [ 17/Jan/17 ]

Given this thread:

Thread 1 (Thread 0x7f74c2024cc0 (LWP 42742)):
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007f74c0c8829a in _L_lock_55 () from /lib/x86_64-linux-gnu/libpthread.so.0
#2  0x00007f74c0c88211 in __lll_lock_elision (futex=0x20aad40 <mongo::shutdownLock>, adapt_count=<optimized out>, private=0)
    at ../nptl/sysdeps/unix/sysv/linux/x86/elision-lock.c:94
#3  0x0000000000cdced2 in mongo::exitCleanly(mongo::ExitCode) ()
#4  0x0000000000970274 in main ()

and the lack of obvious deadlock cycles, it seems like this could potentially be a TSX hardware-related issue (see SERVER-26018), though that's far from certain. We'd need the /proc/cpuinfo output to confirm or rule this out.
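As a quick self-check, the following is a small stand-alone C++ helper (purely illustrative, not part of mongod) that reads /proc/cpuinfo and reports whether the TSX-related CPU flags are present (hle = hardware lock elision, rtm = restricted transactional memory); if either is listed, a glibc built with lock elision enabled may use it for pthread mutexes.

#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

int main() {
    std::ifstream cpuinfo("/proc/cpuinfo");
    if (!cpuinfo) {
        std::cerr << "could not open /proc/cpuinfo\n";
        return 1;
    }
    bool hasHle = false;
    bool hasRtm = false;
    std::string line;
    while (std::getline(cpuinfo, line)) {
        // Only the "flags" lines list the CPU feature bits.
        if (line.compare(0, 5, "flags") != 0)
            continue;
        std::istringstream flags(line);
        std::string flag;
        while (flags >> flag) {
            if (flag == "hle") hasHle = true;
            if (flag == "rtm") hasRtm = true;
        }
    }
    std::cout << "hle: " << (hasHle ? "yes" : "no")
              << ", rtm: " << (hasRtm ? "yes" : "no") << "\n";
    return 0;
}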

Comment by Mark Benvenuto [ 28/Dec/16 ]

I stared at this for almost an hour, and I cannot seem to find the deadlock.

  1. The signal handler thread called exitCleanly() and is waiting on a global X lock. It holds the exitCleanly mutex.
  2. initAndListen exited and called exitCleanly(), which is waiting on the exitCleanly mutex.
  3. The TTL monitor is trying to take the global lock as part of acquiring a DB lock, but it is not involved in the deadlock.

The log file confirms that this was the result of SIGTERM; I do not know which thread owns the GlobalLock. (A minimal sketch of the resulting wait graph follows this comment.)
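To make the suspected wait graph concrete, here is a minimal, self-contained C++ sketch (assumed names only; std::timed_mutex and std::shared_timed_mutex stand in for the shutdown mutex and the global X lock, and none of this is mongod code). A "connection" thread holds the global lock, the signal-processing thread holds the shutdown mutex while waiting for the global lock, and the main thread times out waiting for the shutdown mutex instead of hanging; this mirrors threads 34 and 1 in the backtrace, with a hypothetical holder of the global lock standing in for whatever operation never released it.

#include <chrono>
#include <iostream>
#include <mutex>
#include <shared_mutex>
#include <thread>

std::timed_mutex shutdownMutex;      // stands in for mongo::shutdownLock
std::shared_timed_mutex globalLock;  // stands in for the MODE_X global lock

int main() {
    using namespace std::chrono_literals;

    // A connection thread that acquired the global lock and, for the purpose
    // of this sketch, holds it for longer than anyone is willing to wait.
    std::thread conn([] {
        std::unique_lock<std::shared_timed_mutex> g(globalLock);
        std::this_thread::sleep_for(5s);
    });
    std::this_thread::sleep_for(100ms);

    // Signal-processing thread: takes the shutdown mutex, then blocks on the
    // global X lock (step 1 in the comment above).
    std::thread signalThread([] {
        std::lock_guard<std::timed_mutex> lk(shutdownMutex);
        std::unique_lock<std::shared_timed_mutex> g(globalLock);  // blocks here
    });
    std::this_thread::sleep_for(100ms);

    // Main thread: initAndListen returned and exitCleanly() now blocks on the
    // shutdown mutex (step 2). A timeout lets the sketch report the stall
    // instead of hanging forever.
    if (!shutdownMutex.try_lock_for(1s)) {
        std::cout << "main thread stuck on the shutdown mutex: the signal "
                     "thread holds it while waiting for the global lock\n";
    } else {
        shutdownMutex.unlock();
    }

    signalThread.join();
    conn.join();
    return 0;
}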

Comment by xihui he [ 18/Dec/16 ]

Hi anonymous.user,

Below is the log:

2016-12-15T19:14:58.827+0800 I NETWORK  [initandlisten] connection accepted from 10.9.16.137:54035 #1294348 (144 connections now open)
2016-12-15T19:14:59.469+0800 I NETWORK  [initandlisten] connection accepted from 10.9.16.137:54036 #1294349 (145 connections now open)
2016-12-15T19:15:02.829+0800 I NETWORK  [initandlisten] connection accepted from 10.9.16.137:54047 #1294350 (146 connections now open)
2016-12-15T19:15:05.471+0800 I NETWORK  [initandlisten] connection accepted from 10.9.16.137:54055 #1294351 (147 connections now open)
2016-12-15T19:15:05.612+0800 I COMMAND  [conn1294216] command media.mediameta command: getMore { getMore: 50842537558, collection: "mediameta" } planSummary: IXSCAN { _id: 1 } cursorid:50842537558 keysExamined:1055 docsExamined:529 cursorExhausted:1 keyUpdates:0 writeConflicts:0 numYields:10 nreturned:529 reslen:244981 locks:{ Global: { acquireCount: { r: 22 }, acquireWaitCount: { r: 9 }, timeAcquiringMicros: { r: 5596 } }, Database: { acquireCount: { r: 11 } }, Collection: { acquireCount: { r: 11 } } } protocol:op_command 168ms
2016-12-15T19:15:06.518+0800 I NETWORK  [initandlisten] connection accepted from 10.9.16.136:48379 #1294352 (148 connections now open)
2016-12-15T19:15:08.523+0800 I CONTROL  [signalProcessingThread] got signal 15 (Terminated), will terminate after current cmd ends
2016-12-15T19:15:08.539+0800 I FTDC     [signalProcessingThread] Shutting down full-time diagnostic data capture
2016-12-15T19:15:08.547+0800 I STORAGE  [conn1294120] got request after shutdown()
2016-12-15T19:15:08.547+0800 I STORAGE  [conn1293428] got request after shutdown()
2016-12-15T19:15:08.547+0800 I STORAGE  [conn1294320] got request after shutdown()
2016-12-15T19:15:08.547+0800 I STORAGE  [conn1294117] got request after shutdown()
2016-12-15T19:15:08.549+0800 I STORAGE  [conn1294323] got request after shutdown()
2016-12-15T19:15:08.624+0800 I REPL     [signalProcessingThread] Stopping replication applier threads
2016-12-15T19:15:09.071+0800 I STORAGE  [conn265047] got request after shutdown()
2016-12-15T19:15:09.099+0800 I STORAGE  [conn264989] got request after shutdown()
2016-12-15T19:15:09.203+0800 I STORAGE  [conn269025] got request after shutdown()
2016-12-15T19:15:09.227+0800 I STORAGE  [conn271088] got request after shutdown()
2016-12-15T19:15:09.454+0800 I STORAGE  [conn394025] got request after shutdown()
2016-12-15T19:15:10.145+0800 I STORAGE  [conn264988] got request after shutdown()
2016-12-15T19:15:10.432+0800 I STORAGE  [conn264977] got request after shutdown()
2016-12-15T19:15:10.494+0800 I STORAGE  [conn268921] got request after shutdown()
2016-12-15T19:15:10.562+0800 W SHARDING [signalProcessingThread] error encountered while cleaning up distributed ping entry for n9-017-096:27317:1477474960:324076582 :: caused by :: ShutdownInProgress: Shutdown in progress
2016-12-15T19:15:12.586+0800 I STORAGE  [conn265150] got request after shutdown()
2016-12-15T19:15:13.046+0800 I STORAGE  [conn265554] got request after shutdown()
2016-12-15T19:15:13.282+0800 I STORAGE  [conn264987] got request after shutdown()
2016-12-15T19:15:13.293+0800 I STORAGE  [conn1275290] got request after shutdown()
2016-12-15T19:15:13.383+0800 I STORAGE  [conn272010] got request after shutdown()
2016-12-15T19:15:14.887+0800 I STORAGE  [conn272468] got request after shutdown()
2016-12-15T19:15:15.091+0800 I STORAGE  [conn384144] got request after shutdown()
2016-12-15T19:15:16.056+0800 I STORAGE  [conn381654] got request after shutdown()
2016-12-15T19:15:16.080+0800 I STORAGE  [conn264986] got request after shutdown()
2016-12-15T19:15:16.886+0800 I NETWORK  [conn1294336] end connection 10.9.16.137:53893 (125 connections now open)
2016-12-15T19:15:17.007+0800 I STORAGE  [conn265503] got request after shutdown()
2016-12-15T19:15:17.141+0800 I STORAGE  [conn394092] got request after shutdown()
2016-12-15T19:15:17.147+0800 I STORAGE  [conn264975] got request after shutdown()
2016-12-15T19:15:18.083+0800 I STORAGE  [conn394388] got request after shutdown()
2016-12-15T19:15:18.493+0800 I STORAGE  [conn1281598] got request after shutdown()
2016-12-15T19:15:18.501+0800 I STORAGE  [conn1281599] got request after shutdown()
2016-12-15T19:15:18.752+0800 I STORAGE  [conn920548] got request after shutdown()
2016-12-15T19:15:18.753+0800 I STORAGE  [conn394195] got request after shutdown()
2016-12-15T19:15:31.167+0800 I NETWORK  [conn1294262] end connection 10.9.16.137:53110 (116 connections now open)
2016-12-15T19:15:31.384+0800 I NETWORK  [conn1294181] end connection 10.9.16.137:52519 (115 connections now open)
2016-12-15T19:15:39.901+0800 I NETWORK  [conn1294304] end connection 10.9.16.137:53464 (114 connections now open)
2016-12-15T19:15:40.768+0800 I NETWORK  [conn1294342] end connection 10.9.16.137:53985 (113 connections now open)
2016-12-15T19:15:51.663+0800 I NETWORK  [conn1294331] end connection 10.9.16.137:53755 (112 connections now open)
2016-12-15T19:15:53.514+0800 I NETWORK  [conn1294250] end connection 10.9.16.137:53026 (111 connections now open)
2016-12-15T19:15:56.290+0800 I NETWORK  [conn1293951] end connection 10.9.16.136:45906 (110 connections now open)
2016-12-15T19:15:56.291+0800 I NETWORK  [conn1293823] end connection 10.9.16.136:45098 (110 connections now open)
2016-12-15T19:15:56.807+0800 I NETWORK  [conn1294259] end connection 10.9.16.137:53107 (108 connections now open)
2016-12-15T19:15:58.828+0800 I NETWORK  [conn1294348] end connection 10.9.16.137:54035 (107 connections now open)
2016-12-15T19:15:59.470+0800 I NETWORK  [conn1294349] end connection 10.9.16.137:54036 (106 connections now open)
2016-12-15T19:16:01.588+0800 I STORAGE  [conn1293957] got request after shutdown()
2016-12-15T19:16:02.832+0800 I NETWORK  [conn1294350] end connection 10.9.16.137:54047 (104 connections now open)
2016-12-15T19:16:03.655+0800 I NETWORK  [conn1294228] end connection 10.9.16.137:52829 (103 connections now open)
2016-12-15T19:16:05.052+0800 I STORAGE  [conn1294268] got request after shutdown()
2016-12-15T19:16:05.200+0800 I STORAGE  [conn1294346] got request after shutdown()
2016-12-15T19:16:05.414+0800 I STORAGE  [conn1293934] got request after shutdown()
2016-12-15T19:16:05.578+0800 I NETWORK  [conn1294351] end connection 10.9.16.137:54055 (99 connections now open)
2016-12-15T19:16:05.618+0800 I STORAGE  [conn1293990] got request after shutdown()
2016-12-15T19:16:05.958+0800 I STORAGE  [conn1294312] got request after shutdown()
2016-12-15T19:16:06.084+0800 I STORAGE  [conn1294156] got request after shutdown()
2016-12-15T19:16:06.113+0800 I STORAGE  [conn1293222] got request after shutdown()
2016-12-15T19:16:06.114+0800 I STORAGE  [conn1294070] got request after shutdown()
2016-12-15T19:16:06.314+0800 I STORAGE  [conn1294060] got request after shutdown()
2016-12-15T19:16:06.526+0800 I STORAGE  [conn1294302] got request after shutdown()
2016-12-15T19:16:06.531+0800 I NETWORK  [conn1294352] end connection 10.9.16.136:48379 (91 connections now open)
2016-12-15T19:16:06.544+0800 I STORAGE  [conn1293936] got request after shutdown()
2016-12-15T19:16:06.783+0800 I STORAGE  [conn1294104] got request after shutdown()
2016-12-15T19:16:06.836+0800 I STORAGE  [conn1293618] got request after shutdown()
2016-12-15T19:16:07.419+0800 I STORAGE  [conn1294280] got request after shutdown()
2016-12-15T19:16:07.449+0800 I STORAGE  [conn1293915] got request after shutdown()
2016-12-15T19:16:07.472+0800 I STORAGE  [conn1294345] got request after shutdown()
2016-12-15T19:16:07.514+0800 I STORAGE  [conn1293943] got request after shutdown()
2016-12-15T19:16:07.518+0800 I STORAGE  [conn1294339] got request after shutdown()
2016-12-15T19:16:07.540+0800 I STORAGE  [conn1294200] got request after shutdown()
2016-12-15T19:16:07.562+0800 I STORAGE  [conn1294143] got request after shutdown()
2016-12-15T19:16:07.606+0800 I STORAGE  [conn1294274] got request after shutdown()
2016-12-15T19:16:07.740+0800 I STORAGE  [conn1294010] got request after shutdown()
2016-12-15T19:16:07.744+0800 I STORAGE  [conn1294288] got request after shutdown()
2016-12-15T19:16:07.807+0800 I STORAGE  [conn1294240] got request after shutdown()
2016-12-15T19:16:07.844+0800 I STORAGE  [conn1294124] got request after shutdown()
2016-12-15T19:16:07.882+0800 I STORAGE  [conn1294285] got request after shutdown()
2016-12-15T19:16:07.890+0800 I STORAGE  [conn1294287] got request after shutdown()
2016-12-15T19:16:07.959+0800 I STORAGE  [conn1294283] got request after shutdown()
2016-12-15T19:16:08.011+0800 I STORAGE  [conn1293586] got request after shutdown()
2016-12-15T19:16:08.045+0800 I STORAGE  [conn1294232] got request after shutdown()
2016-12-15T19:16:08.059+0800 I STORAGE  [conn1294335] got request after shutdown()
2016-12-15T19:16:08.077+0800 I STORAGE  [conn1294341] got request after shutdown()
2016-12-15T19:16:08.089+0800 I STORAGE  [conn1293997] got request after shutdown()
2016-12-15T19:16:08.118+0800 I STORAGE  [conn1294314] got request after shutdown()
2016-12-15T19:16:08.122+0800 I STORAGE  [conn1294075] got request after shutdown()
2016-12-15T19:16:08.127+0800 I STORAGE  [conn1294306] got request after shutdown()
2016-12-15T19:16:08.133+0800 I STORAGE  [conn1294322] got request after shutdown()
2016-12-15T19:16:08.133+0800 I STORAGE  [conn1290930] got request after shutdown()
2016-12-15T19:16:08.139+0800 I STORAGE  [conn1294319] got request after shutdown()
2016-12-15T19:16:08.161+0800 I STORAGE  [conn1293721] got request after shutdown()
2016-12-15T19:16:08.200+0800 I STORAGE  [conn1294213] got request after shutdown()
2016-12-15T19:16:08.221+0800 I STORAGE  [conn1294344] got request after shutdown()
2016-12-15T19:16:08.226+0800 I STORAGE  [conn1293864] got request after shutdown()
2016-12-15T19:16:08.227+0800 I STORAGE  [conn1294233] got request after shutdown()
2016-12-15T19:16:08.256+0800 I STORAGE  [conn1294157] got request after shutdown()
2016-12-15T19:16:08.263+0800 I STORAGE  [conn1293776] got request after shutdown()
2016-12-15T19:16:08.286+0800 I STORAGE  [conn1293257] got request after shutdown()
2016-12-15T19:16:08.293+0800 I STORAGE  [conn1294234] got request after shutdown()
2016-12-15T19:16:08.298+0800 I STORAGE  [conn1293869] got request after shutdown()
2016-12-15T19:16:08.303+0800 I STORAGE  [conn1293400] got request after shutdown()
2016-12-15T19:16:08.306+0800 I STORAGE  [conn1293780] got request after shutdown()
2016-12-15T19:16:08.315+0800 I STORAGE  [conn1294170] got request after shutdown()
2016-12-15T19:16:08.319+0800 I STORAGE  [conn1294338] got request after shutdown()
2016-12-15T19:16:08.319+0800 I STORAGE  [conn1293930] got request after shutdown()
2016-12-15T19:16:08.319+0800 I STORAGE  [conn1294242] got request after shutdown()
2016-12-15T19:16:08.331+0800 I STORAGE  [conn1293877] got request after shutdown()
2016-12-15T19:16:08.339+0800 I STORAGE  [conn1294265] got request after shutdown()
2016-12-15T19:16:08.340+0800 I STORAGE  [conn1293429] got request after shutdown()
2016-12-15T19:16:08.341+0800 I STORAGE  [conn1294313] got request after shutdown()
2016-12-15T19:16:08.343+0800 I STORAGE  [conn1293948] got request after shutdown()
2016-12-15T19:16:08.352+0800 I STORAGE  [conn1294328] got request after shutdown()
2016-12-15T19:16:08.352+0800 I STORAGE  [conn1294347] got request after shutdown()
2016-12-15T19:16:08.371+0800 I NETWORK  [conn1294340] end connection 10.9.16.137:53979 (38 connections now open)
2016-12-15T19:16:08.374+0800 I STORAGE  [conn1294330] got request after shutdown()
2016-12-15T19:16:08.377+0800 I STORAGE  [conn1294315] got request after shutdown()
2016-12-15T19:16:08.386+0800 I STORAGE  [conn1294282] got request after shutdown()
2016-12-15T19:16:08.401+0800 I STORAGE  [conn1294218] got request after shutdown()
2016-12-15T19:16:08.412+0800 I STORAGE  [conn1294215] got request after shutdown()
2016-12-15T19:16:08.422+0800 I STORAGE  [conn1293405] got request after shutdown()
2016-12-15T19:16:08.426+0800 I STORAGE  [conn1294090] got request after shutdown()
2016-12-15T19:16:08.432+0800 I STORAGE  [conn1294321] got request after shutdown()
2016-12-15T19:16:08.433+0800 I STORAGE  [conn1293799] got request after shutdown()
2016-12-15T19:16:08.460+0800 I STORAGE  [conn1294023] got request after shutdown()
2016-12-15T19:16:08.460+0800 I STORAGE  [conn1293462] got request after shutdown()
2016-12-15T19:16:08.461+0800 I STORAGE  [conn1293847] got request after shutdown()
2016-12-15T19:16:08.465+0800 I STORAGE  [conn1294152] got request after shutdown()
2016-12-15T19:16:08.467+0800 I STORAGE  [conn1294019] got request after shutdown()
2016-12-15T19:16:08.480+0800 I STORAGE  [conn1293555] got request after shutdown()
2016-12-15T19:16:08.485+0800 I STORAGE  [conn1294216] got request after shutdown()
2016-12-15T19:16:08.485+0800 I STORAGE  [conn1294199] got request after shutdown()
2016-12-15T19:16:08.485+0800 I STORAGE  [conn1292406] got request after shutdown()
2016-12-15T19:16:08.490+0800 I STORAGE  [conn1294273] got request after shutdown()
2016-12-15T19:16:08.500+0800 I STORAGE  [conn1294239] got request after shutdown()
2016-12-15T19:16:08.500+0800 I STORAGE  [conn1292901] got request after shutdown()
2016-12-15T19:16:08.501+0800 I STORAGE  [conn1294308] got request after shutdown()
2016-12-15T19:16:08.505+0800 I STORAGE  [conn1294171] got request after shutdown()
2016-12-15T19:16:08.509+0800 I STORAGE  [conn1294263] got request after shutdown()
2016-12-15T19:16:08.510+0800 I STORAGE  [conn1294307] got request after shutdown()
2016-12-15T19:16:08.511+0800 I STORAGE  [conn1294279] got request after shutdown()
2016-12-15T19:16:08.519+0800 I STORAGE  [conn1294311] got request after shutdown()
2016-12-15T19:16:08.522+0800 I STORAGE  [conn1293074] got request after shutdown()
2016-12-15T19:50:04.611+0800 I CONTROL  [main] ***** SERVER RESTARTED *****
2016-12-15T19:50:04.660+0800 I CONTROL  [initandlisten] MongoDB starting : pid=157873 port=27317 dbpath=/opt/tiger/mongodb/data 64-bit host=n9-017-096
2016-12-15T19:50:04.660+0800 I CONTROL  [initandlisten] db version v3.2.10
2016-12-15T19:50:04.660+0800 I CONTROL  [initandlisten] git version: 79d9b3ab5ce20f51c272b4411202710a082d0317
2016-12-15T19:50:04.660+0800 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.1t  3 May 2016
2016-12-15T19:50:04.660+0800 I CONTROL  [initandlisten] allocator: tcmalloc
2016-12-15T19:50:04.660+0800 I CONTROL  [initandlisten] modules: none
2016-12-15T19:50:04.660+0800 I CONTROL  [initandlisten] build environment:
2016-12-15T19:50:04.660+0800 I CONTROL  [initandlisten]     distmod: debian81
2016-12-15T19:50:04.660+0800 I CONTROL  [initandlisten]     distarch: x86_64
2016-12-15T19:50:04.660+0800 I CONTROL  [initandlisten]     target_arch: x86_64
2016-12-15T19:50:04.660+0800 I CONTROL  [initandlisten] options: { config: "conf/mongodb.yaml", net: { bindIp: "10.9.17.96,127.0.0.1", maxIncomingConnections: 5000, port: 27317 }, processManagement: { fork: true }, replication: { replSetName: "shard5" }, storage: { dbPath: "/opt/tiger/mongodb/data", directoryPerDB: true, engine: "wiredTiger", journal: { enabled: true }, wiredTiger: { engineConfig: { cacheSizeGB: 70 } } }, systemLog: { destination: "file", logAppend: true, logRotate: "reopen", path: "/opt/tiger/mongodb/logs/mongodb.log" } }
2016-12-15T19:50:04.662+0800 W -        [initandlisten] Detected unclean shutdown - /opt/tiger/mongodb/data/mongod.lock is not empty.
2016-12-15T19:50:04.662+0800 W STORAGE  [initandlisten] Recovering data from the last clean checkpoint.
2016-12-15T19:50:04.662+0800 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=70G,session_max=20000,eviction=(threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2016-12-15T19:50:05.153+0800 I STORAGE  [initandlisten] Starting WiredTigerRecordStoreThread local.oplog.rs
2016-12-15T19:50:05.154+0800 I STORAGE  [initandlisten] The size storer reports that the oplog contains 157244973 records totaling to 42922684729 bytes
2016-12-15T19:50:05.154+0800 I STORAGE  [initandlisten] Sampling from the oplog between Dec  8 22:00:26:107 and Dec 15 19:15:08:12 to determine where to place markers for truncation
2016-12-15T19:50:05.154+0800 I STORAGE  [initandlisten] Taking 1000 samples and assuming that each section of oplog contains approximately 1572222 records totaling to 429164684 bytes
2016-12-15T19:50:07.629+0800 I STORAGE  [initandlisten] Placing a marker at optime Dec  8 22:16:11:4a1
2016-12-15T19:50:07.629+0800 I STORAGE  [initandlisten] Placing a marker at optime Dec  8 22:32:44:2d1
2016-12-15T19:50:07.629+0800 I STORAGE  [initandlisten] Placing a marker at optime Dec  8 22:52:51:2c3

Comment by Kelsey Schubert [ 16/Dec/16 ]

Hi bydsky,

We're continuing to investigate this issue. To help us, would you please provide the complete logs of the affected mongod?

Thank you for your help,
Thomas

Comment by Kelsey Schubert [ 15/Dec/16 ]

Hi bydsky,

Thanks for the report. We're investigating this issue and will update the ticket when we know more.

Kind regards,
Thomas
