Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-8251

Startup hangs infinitely, DataFileSync background job cannot create new thread

    XMLWordPrintableJSON

Details

    • Icon: Bug Bug
    • Resolution: Duplicate
    • Icon: Minor - P4 Minor - P4
    • None
    • 2.2.0
    • Concurrency
    • None
    • CentOS release 5.8 (Final), 2.6.18-308.16.1.el5 #1 SMP Tue Oct 2 22:01:43 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux
    • ALL

    Description

      Installed the 2.2.0 rpm package from 10gen repo. 'service mongod start' creates 3 processes:

      root 12603 9583 0 12:59 pts/0 00:00:00 /bin/sh /sbin/service mongod restart
      root 12608 12603 0 12:59 pts/0 00:00:00 /bin/bash /etc/init.d/mongod restart
      root 12622 12608 0 12:59 pts/0 00:00:00 runuser -s /bin/bash - mongod -c ulimit -S -c 0 >/dev/null 2>&1 ; numactl --interleave=all /usr/bin/mongod -f /etc/mongod.
      mongod 12623 12622 0 12:59 ? 00:00:00 -bash -c ulimit -S -c 0 >/dev/null 2>&1 ; numactl --interleave=all /usr/bin/mongod -f /etc/mongod.conf
      mongod 12645 12623 1 12:59 ? 00:00:00 /usr/bin/mongod -f /etc/mongod.conf
      mongod 12647 12645 0 12:59 ? 00:00:00 /usr/bin/mongod -f /etc/mongod.conf
      mongod 12648 12647 0 12:59 ? 00:00:00 /usr/bin/mongod -f /etc/mongod.conf

      strace of PID 12648, the third - obviously hanging - process gives:
      ...
      futex(0x13dc400, FUTEX_WAIT_PRIVATE, 2,

      {0, 67669616}

      ) = -1 ETIMEDOUT (Connection timed out)
      futex(0x13dc400, FUTEX_WAIT_PRIVATE, 2,

      {0, 60868464}

      ) = -1 ETIMEDOUT (Connection timed out)
      futex(0x13dc400, FUTEX_WAIT_PRIVATE, 2,

      {0, 167533952}

      ) = -1 ETIMEDOUT (Connection timed out)
      ...

      gdb:

      Thread 2 (Thread 0x40a87940 (LWP 12649)):
      #0 0x00000000008d2996 in base::internal::SpinLockDelay(int volatile*, int, int) ()
      #1 0x000000000086210c in SpinLock::SlowLock() ()
      #2 0x0000000000866056 in tcmalloc::ThreadCache::CreateCacheIfNecessary() ()
      #3 0x00000000009b0857 in ?? ()
      #4 0x0000000000c22872 in tc_malloc ()
      #5 0x00000000009e63aa in boost::detail::get_once_per_thread_epoch() ()
      #6 0x00000000007c4ff8 in void boost::call_once<void ()>(boost::once_flag&, void ()) ()
      #7 0x00000000007c1e57 in boost::detail::set_current_thread_data(boost::detail::thread_data_base*) ()
      #8 0x00000000007c3646 in ?? ()
      #9 0x0000003124e0677d in start_thread () from /lib64/libpthread.so.0
      #10 0x00000031246d3c1d in clone () from /lib64/libc.so.6

      Thread 1 (Thread 0x2b5613d478c0 (LWP 12648)):
      #0 0x00000000008d2996 in base::internal::SpinLockDelay(int volatile*, int, int) ()
      #1 0x000000000086210c in SpinLock::SlowLock() ()
      #2 0x00000000008b92db in tcmalloc::CentralFreeList::Populate() ()
      #3 0x00000000008b9498 in tcmalloc::CentralFreeList::FetchFromSpansSafe() ()
      #4 0x00000000008b9534 in tcmalloc::CentralFreeList::RemoveRange(void*, void*, int) ()
      #5 0x0000000000865c0d in tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long) ()
      #6 0x00000000009b0f2f in ?? ()
      #7 0x0000000000c21c95 in tc_new ()
      #8 0x0000000000599e14 in boost::detail::thread_data<boost::_bi::bind_t<void, boost::_mfi::mf1<void, mongo::BackgroundJob, boost::shared_ptr<mongo::BackgroundJob::JobStatus> >, boost::_bi::list2<boost::_bi::value<mongo::BackgroundJob*>, boost::_bi::value<boost::shared_ptr<mongo::BackgroundJob::JobStatus> > > > >* boost::detail::heap_new_impl<boost::detail::thread_data<boost::_bi::bind_t<void, boost::_mfi::mf1<void, mongo::BackgroundJob, boost::shared_ptr<mongo::BackgroundJob::JobStatus> >, boost::_bi::list2<boost::_bi::value<mongo::BackgroundJob*>, boost::_bi::value<boost::shared_ptr<mongo::BackgroundJob::JobStatus> > > > >, boost::_bi::bind_t<void, boost::_mfi::mf1<void, mongo::BackgroundJob, boost::shared_ptr<mongo::BackgroundJob::JobStatus> >, boost::_bi::list2<boost::_bi::value<mongo::BackgroundJob*>, boost::_bi::value<boost::shared_ptr<mongo::BackgroundJob::JobStatus> > > >&>(boost::_bi::bind_t<void, boost::_mfi::mf1<void, mongo::BackgroundJob, boost::shared_ptr<mongo::BackgroundJob::JobStatus> >, boost::_bi::list2<boost::_bi::value<mongo::BackgroundJob*>, boost::_bi::value<boost::shared_ptr<mongo::BackgroundJob::JobStatus> > > >&) ()
      #9 0x00000000005954fa in mongo::BackgroundJob::go() ()
      #10 0x00000000005630cc in ?? ()
      #11 0x0000000000565399 in main ()

      This behaviour is somewhat random, because sometimes the startup works.

      Notes: I rebuilt mongod from source r2.2.0, stripped the binary manually and to my surprise this binary, does not show this behaviour. Alas, another binary installed with 'scons install' always hangs.

      Attachments

        Activity

          People

            schwerin@mongodb.com Andy Schwerin
            wangyuontheway WangYu
            Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: