Details
Bug
Resolution: Duplicate
Minor - P4
None
2.2.0
None
CentOS release 5.8 (Final), 2.6.18-308.16.1.el5 #1 SMP Tue Oct 2 22:01:43 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
ALL
Description
Installed the 2.2.0 RPM package from the 10gen repo. 'service mongod start' leaves three mongod processes running (PIDs 12645, 12647 and 12648 in the listing below):
root 12603 9583 0 12:59 pts/0 00:00:00 /bin/sh /sbin/service mongod restart
root 12608 12603 0 12:59 pts/0 00:00:00 /bin/bash /etc/init.d/mongod restart
root 12622 12608 0 12:59 pts/0 00:00:00 runuser -s /bin/bash - mongod -c ulimit -S -c 0 >/dev/null 2>&1 ; numactl --interleave=all /usr/bin/mongod -f /etc/mongod.
mongod 12623 12622 0 12:59 ? 00:00:00 -bash -c ulimit -S -c 0 >/dev/null 2>&1 ; numactl --interleave=all /usr/bin/mongod -f /etc/mongod.conf
mongod 12645 12623 1 12:59 ? 00:00:00 /usr/bin/mongod -f /etc/mongod.conf
mongod 12647 12645 0 12:59 ? 00:00:00 /usr/bin/mongod -f /etc/mongod.conf
mongod 12648 12647 0 12:59 ? 00:00:00 /usr/bin/mongod -f /etc/mongod.conf
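The parent/child chain in the listing above can be checked mechanically. The following is a minimal sketch, assuming the standard `ps -ef` column layout (UID, PID, PPID, C, STIME, TTY, TIME, CMD); the helper names `parse_ps`, `ancestry` and `mongod_daemons` are made up for illustration and are not part of any MongoDB tooling.

```python
from typing import Dict, Iterable, List, NamedTuple


class Proc(NamedTuple):
    user: str
    pid: int
    ppid: int
    cmd: str


# The listing from this report, copied as-is (one command line is truncated by ps).
PS_OUTPUT = """
root 12603 9583 0 12:59 pts/0 00:00:00 /bin/sh /sbin/service mongod restart
root 12608 12603 0 12:59 pts/0 00:00:00 /bin/bash /etc/init.d/mongod restart
root 12622 12608 0 12:59 pts/0 00:00:00 runuser -s /bin/bash - mongod -c ulimit -S -c 0 >/dev/null 2>&1 ; numactl --interleave=all /usr/bin/mongod -f /etc/mongod.
mongod 12623 12622 0 12:59 ? 00:00:00 -bash -c ulimit -S -c 0 >/dev/null 2>&1 ; numactl --interleave=all /usr/bin/mongod -f /etc/mongod.conf
mongod 12645 12623 1 12:59 ? 00:00:00 /usr/bin/mongod -f /etc/mongod.conf
mongod 12647 12645 0 12:59 ? 00:00:00 /usr/bin/mongod -f /etc/mongod.conf
mongod 12648 12647 0 12:59 ? 00:00:00 /usr/bin/mongod -f /etc/mongod.conf
"""


def parse_ps(lines: Iterable[str]) -> Dict[int, Proc]:
    """Map PID -> Proc for every well-formed `ps -ef` style line."""
    procs: Dict[int, Proc] = {}
    for line in lines:
        fields = line.split(None, 7)  # the 8th field is the full command line
        if len(fields) < 8 or not fields[1].isdigit() or not fields[2].isdigit():
            continue  # skip blank lines and headers
        procs[int(fields[1])] = Proc(fields[0], int(fields[1]), int(fields[2]), fields[7])
    return procs


def ancestry(procs: Dict[int, Proc], pid: int) -> List[int]:
    """Return the chain of known PIDs from the oldest ancestor down to `pid`."""
    chain: List[int] = []
    while pid in procs and pid not in chain:
        chain.append(pid)
        pid = procs[pid].ppid
    return chain[::-1]


def mongod_daemons(procs: Dict[int, Proc]) -> List[int]:
    """PIDs whose command line starts with the mongod binary itself."""
    return sorted(p.pid for p in procs.values() if p.cmd.startswith("/usr/bin/mongod"))


if __name__ == "__main__":
    table = parse_ps(PS_OUTPUT.splitlines())
    print("mongod processes:", mongod_daemons(table))
    print("ancestry of 12648:", ancestry(table, 12648))
```

Each mongod process is the child of the previous one, so 12648 is the innermost process of the chain.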
strace of PID 12648, the third and apparently hung process, gives:
...
futex(0x13dc400, FUTEX_WAIT_PRIVATE, 2,
) = -1 ETIMEDOUT (Connection timed out)
futex(0x13dc400, FUTEX_WAIT_PRIVATE, 2,
) = -1 ETIMEDOUT (Connection timed out)
futex(0x13dc400, FUTEX_WAIT_PRIVATE, 2,
) = -1 ETIMEDOUT (Connection timed out)
...
gdb:
Thread 2 (Thread 0x40a87940 (LWP 12649)):
#0 0x00000000008d2996 in base::internal::SpinLockDelay(int volatile*, int, int) ()
#1 0x000000000086210c in SpinLock::SlowLock() ()
#2 0x0000000000866056 in tcmalloc::ThreadCache::CreateCacheIfNecessary() ()
#3 0x00000000009b0857 in ?? ()
#4 0x0000000000c22872 in tc_malloc ()
#5 0x00000000009e63aa in boost::detail::get_once_per_thread_epoch() ()
#6 0x00000000007c4ff8 in void boost::call_once<void (*)()>(boost::once_flag&, void (*)()) ()
#7 0x00000000007c1e57 in boost::detail::set_current_thread_data(boost::detail::thread_data_base*) ()
#8 0x00000000007c3646 in ?? ()
#9 0x0000003124e0677d in start_thread () from /lib64/libpthread.so.0
#10 0x00000031246d3c1d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2b5613d478c0 (LWP 12648)):
#0 0x00000000008d2996 in base::internal::SpinLockDelay(int volatile*, int, int) ()
#1 0x000000000086210c in SpinLock::SlowLock() ()
#2 0x00000000008b92db in tcmalloc::CentralFreeList::Populate() ()
#3 0x00000000008b9498 in tcmalloc::CentralFreeList::FetchFromSpansSafe() ()
#4 0x00000000008b9534 in tcmalloc::CentralFreeList::RemoveRange(void*, void*, int) ()
#5 0x0000000000865c0d in tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long) ()
#6 0x00000000009b0f2f in ?? ()
#7 0x0000000000c21c95 in tc_new ()
#8 0x0000000000599e14 in boost::detail::thread_data<boost::_bi::bind_t<void, boost::_mfi::mf1<void, mongo::BackgroundJob, boost::shared_ptr<mongo::BackgroundJob::JobStatus> >, boost::_bi::list2<boost::_bi::value<mongo::BackgroundJob*>, boost::_bi::value<boost::shared_ptr<mongo::BackgroundJob::JobStatus> > > > >* boost::detail::heap_new_impl<boost::detail::thread_data<boost::_bi::bind_t<void, boost::_mfi::mf1<void, mongo::BackgroundJob, boost::shared_ptr<mongo::BackgroundJob::JobStatus> >, boost::_bi::list2<boost::_bi::value<mongo::BackgroundJob*>, boost::_bi::value<boost::shared_ptr<mongo::BackgroundJob::JobStatus> > > > >, boost::_bi::bind_t<void, boost::_mfi::mf1<void, mongo::BackgroundJob, boost::shared_ptr<mongo::BackgroundJob::JobStatus> >, boost::_bi::list2<boost::_bi::value<mongo::BackgroundJob*>, boost::_bi::value<boost::shared_ptr<mongo::BackgroundJob::JobStatus> > > >&>(boost::_bi::bind_t<void, boost::_mfi::mf1<void, mongo::BackgroundJob, boost::shared_ptr<mongo::BackgroundJob::JobStatus> >, boost::_bi::list2<boost::_bi::value<mongo::BackgroundJob*>, boost::_bi::value<boost::shared_ptr<mongo::BackgroundJob::JobStatus> > > >&) ()
#9 0x00000000005954fa in mongo::BackgroundJob::go() ()
#10 0x00000000005630cc in ?? ()
#11 0x0000000000565399 in main ()
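For anyone who wants to collect the same kind of backtrace from a hung process, here is a minimal sketch that attaches gdb in batch mode and dumps all threads. It assumes gdb is installed and that the caller is allowed to attach to the target PID; the function names are hypothetical, and the exact command used for the trace above is not stated in this report.

```python
import subprocess
from typing import List


def gdb_backtrace_command(pid: int) -> List[str]:
    """Build a gdb invocation that attaches to `pid`, prints every thread's
    backtrace, and exits (batch mode detaches from the process on exit)."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def capture_backtraces(pid: int, timeout: float = 30.0) -> str:
    """Run gdb against `pid` and return whatever it printed on stdout."""
    result = subprocess.run(
        gdb_backtrace_command(pid),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,  # gdb may exit non-zero if the process is already gone
    )
    return result.stdout


if __name__ == "__main__":
    import sys

    print(capture_backtraces(int(sys.argv[1])))
```

Run it with the PID of the hung process as the only argument, for example 12648 in the listing above.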
The behaviour is intermittent: sometimes the startup works.
Notes: I rebuilt mongod from source (r2.2.0) and stripped the binary manually; to my surprise, that binary does not show this behaviour. However, another binary installed with 'scons install' always hangs.
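Both threads in the backtrace are waiting on tcmalloc spinlocks inside a forked mongod process. One plausible mechanism, which is an assumption and not confirmed by this report, is the classic fork hazard: if another thread holds a lock at the moment of fork(), the child gets a copy of the lock in the locked state but not the thread that would release it. The sketch below reproduces that general pattern in Python on a POSIX system; it is an analogy only and involves no mongod or tcmalloc code.

```python
import os
import threading
import time


def child_inherits_held_lock(hold_seconds: float = 1.0, wait_seconds: float = 0.2) -> bool:
    """Fork while a second thread holds a lock.

    Returns True if the child process could NOT acquire its copy of the lock
    within `wait_seconds`, i.e. the lock was inherited in the locked state.
    """
    lock = threading.Lock()
    held = threading.Event()

    def holder() -> None:
        with lock:
            held.set()                # tell the main thread the lock is taken
            time.sleep(hold_seconds)  # keep holding it across the fork

    worker = threading.Thread(target=holder)
    worker.start()
    held.wait()

    pid = os.fork()
    if pid == 0:
        # Child: only the forking thread exists here. The holder thread was
        # not copied, so nothing in this process will ever release the lock.
        acquired = lock.acquire(timeout=wait_seconds)
        os._exit(0 if acquired else 1)

    _, status = os.waitpid(pid, 0)
    worker.join()
    return os.WEXITSTATUS(status) == 1


if __name__ == "__main__":
    print("child found lock already held:", child_inherits_held_lock())
```

Whether the child hits this depends on what the other thread is doing at the instant of the fork, which would be consistent with a hang that only shows up on some startups.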
Issue Links
duplicates: SERVER-7434 Startup race with --fork (Closed)
is duplicated by: SERVER-8252 Startup hangs infinitely, DataFileSync background job cannot create new thread (Closed)