Core Server / SERVER-7434

Startup race with --fork

    • Type: Bug
    • Resolution: Done
    • Priority: Minor - P4
    • Fix Version/s: 2.2.3, 2.4.0-rc1
    • Affects Version/s: 2.2.0, 2.2.2, 2.3.2
    • Component/s: Concurrency
    • Labels:
      None
    • Environment:
      All supported OSes except Windows.
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL

      I installed the 2.2.0 RPM package from the 10gen repo. 'service mongod start' creates 3 mongod processes (the last three lines below):

      root 12603 9583 0 12:59 pts/0 00:00:00 /bin/sh /sbin/service mongod restart
      root 12608 12603 0 12:59 pts/0 00:00:00 /bin/bash /etc/init.d/mongod restart
      root 12622 12608 0 12:59 pts/0 00:00:00 runuser -s /bin/bash - mongod -c ulimit -S -c 0 >/dev/null 2>&1 ; numactl --interleave=all /usr/bin/mongod -f /etc/mongod.
      mongod 12623 12622 0 12:59 ? 00:00:00 -bash -c ulimit -S -c 0 >/dev/null 2>&1 ; numactl --interleave=all /usr/bin/mongod -f /etc/mongod.conf
      mongod 12645 12623 1 12:59 ? 00:00:00 /usr/bin/mongod -f /etc/mongod.conf
      mongod 12647 12645 0 12:59 ? 00:00:00 /usr/bin/mongod -f /etc/mongod.conf
      mongod 12648 12647 0 12:59 ? 00:00:00 /usr/bin/mongod -f /etc/mongod.conf

      strace of PID 12648, the third (and evidently hung) process, shows:
      ...
      futex(0x13dc400, FUTEX_WAIT_PRIVATE, 2, {0, 67669616}) = -1 ETIMEDOUT (Connection timed out)
      futex(0x13dc400, FUTEX_WAIT_PRIVATE, 2, {0, 60868464}) = -1 ETIMEDOUT (Connection timed out)
      futex(0x13dc400, FUTEX_WAIT_PRIVATE, 2, {0, 167533952}) = -1 ETIMEDOUT (Connection timed out)
      ...

      gdb:

      Thread 2 (Thread 0x40a87940 (LWP 12649)):
      #0 0x00000000008d2996 in base::internal::SpinLockDelay(int volatile*, int, int) ()
      #1 0x000000000086210c in SpinLock::SlowLock() ()
      #2 0x0000000000866056 in tcmalloc::ThreadCache::CreateCacheIfNecessary() ()
      #3 0x00000000009b0857 in ?? ()
      #4 0x0000000000c22872 in tc_malloc ()
      #5 0x00000000009e63aa in boost::detail::get_once_per_thread_epoch() ()
      #6 0x00000000007c4ff8 in void boost::call_once<void ()>(boost::once_flag&, void ()) ()
      #7 0x00000000007c1e57 in boost::detail::set_current_thread_data(boost::detail::thread_data_base*) ()
      #8 0x00000000007c3646 in ?? ()
      #9 0x0000003124e0677d in start_thread () from /lib64/libpthread.so.0
      #10 0x00000031246d3c1d in clone () from /lib64/libc.so.6

      Thread 1 (Thread 0x2b5613d478c0 (LWP 12648)):
      #0 0x00000000008d2996 in base::internal::SpinLockDelay(int volatile*, int, int) ()
      #1 0x000000000086210c in SpinLock::SlowLock() ()
      #2 0x00000000008b92db in tcmalloc::CentralFreeList::Populate() ()
      #3 0x00000000008b9498 in tcmalloc::CentralFreeList::FetchFromSpansSafe() ()
      #4 0x00000000008b9534 in tcmalloc::CentralFreeList::RemoveRange(void*, void*, int) ()
      #5 0x0000000000865c0d in tcmalloc::ThreadCache::FetchFromCentralCache(unsigned long, unsigned long) ()
      #6 0x00000000009b0f2f in ?? ()
      #7 0x0000000000c21c95 in tc_new ()
      #8 0x0000000000599e14 in boost::detail::thread_data<boost::_bi::bind_t<void, boost::_mfi::mf1<void, mongo::BackgroundJob, boost::shared_ptr<mongo::BackgroundJob::JobStatus> >, boost::_bi::list2<boost::_bi::value<mongo::BackgroundJob*>, boost::_bi::value<boost::shared_ptr<mongo::BackgroundJob::JobStatus> > > > >* boost::detail::heap_new_impl<boost::detail::thread_data<boost::_bi::bind_t<void, boost::_mfi::mf1<void, mongo::BackgroundJob, boost::shared_ptr<mongo::BackgroundJob::JobStatus> >, boost::_bi::list2<boost::_bi::value<mongo::BackgroundJob*>, boost::_bi::value<boost::shared_ptr<mongo::BackgroundJob::JobStatus> > > > >, boost::_bi::bind_t<void, boost::_mfi::mf1<void, mongo::BackgroundJob, boost::shared_ptr<mongo::BackgroundJob::JobStatus> >, boost::_bi::list2<boost::_bi::value<mongo::BackgroundJob*>, boost::_bi::value<boost::shared_ptr<mongo::BackgroundJob::JobStatus> > > >&>(boost::_bi::bind_t<void, boost::_mfi::mf1<void, mongo::BackgroundJob, boost::shared_ptr<mongo::BackgroundJob::JobStatus> >, boost::_bi::list2<boost::_bi::value<mongo::BackgroundJob*>, boost::_bi::value<boost::shared_ptr<mongo::BackgroundJob::JobStatus> > > >&) ()
      #9 0x00000000005954fa in mongo::BackgroundJob::go() ()
      #10 0x00000000005630cc in ?? ()
      #11 0x0000000000565399 in main ()

      The hang is intermittent: sometimes startup succeeds.

      Notes: I rebuilt mongod from source (r2.2.0) and stripped the binary manually. To my surprise, that binary does not show this behaviour. However, another binary installed with 'scons install' always hangs.
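
      In the backtraces above, both threads of PID 12648 are waiting on a tcmalloc spinlock that nothing in that process appears to hold. The report does not state a root cause, but one reading consistent with the title is that the lock was held by another thread at the moment of fork(): the child receives a copy of the locked state but not the thread that would release it. The following is a minimal sketch of that general hazard, not mongod or tcmalloc code. All names (g_lock, try_lock, child_inherits_held_lock) are hypothetical, and the child polls a bounded number of times instead of spinning forever so that the demonstration terminates.

```cpp
#include <atomic>
#include <chrono>
#include <thread>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical stand-in for an allocator-internal spinlock (not tcmalloc's type).
static std::atomic<int> g_lock{0};

static bool try_lock() {
    int expected = 0;
    return g_lock.compare_exchange_strong(expected, 1);
}

static void unlock() { g_lock.store(0); }

// Returns true if a forked child finds the lock held and never sees it released.
bool child_inherits_held_lock() {
    std::atomic<bool> holding{false};
    std::atomic<bool> release{false};

    // A background thread takes the lock and keeps it until told to let go.
    std::thread holder([&] {
        while (!try_lock()) {}
        holding.store(true);
        while (!release.load())
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        unlock();
    });
    while (!holding.load()) std::this_thread::yield();

    // Fork while the lock is held by the other thread.
    pid_t pid = fork();
    if (pid == 0) {
        // Child: only the forking thread exists here. The holder thread was
        // not duplicated, so the copied lock word stays at 1.
        for (int i = 0; i < 1000; ++i) {
            if (try_lock()) _exit(0);  // would mean the lock was free
        }
        _exit(1);  // lock stayed held: a real spinlock would wait forever
    }

    // Parent: the holder thread still exists and releases the lock normally.
    release.store(true);
    holder.join();
    if (pid < 0) return false;  // fork failed, nothing to report

    int status = 0;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) && WEXITSTATUS(status) == 1;
}
```

      Built with -pthread on Linux, calling child_inherits_held_lock() from main should return true: the parent's lock is released by its holder thread, while the child's copy never is. Whether the lock is held at the instant of fork() depends on thread timing, which would fit a hang that only shows up on some startups.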

            Assignee:
            schwerin@mongodb.com Andy Schwerin
            Reporter:
            mdm Martin Buechler
            Votes:
            2
            Watchers:
            9
