SERVER-37313 (Core Server)

FTDC collection blocked during foreground index build on secondary

    • Fully Compatible
    • ALL
    • v4.0, v3.6
    • Storage NYC 2018-10-22, Storage NYC 2018-11-05
    • 56

      Repro: create a collection, then build an index on it in the foreground. (Background index builds are not affected.) Once the build finishes on the primary and replicates to the secondaries, FTDC data collection is blocked on each secondary for the duration of its index build. This is a regression in 3.6 and 4.0.
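
      The repro steps can be sketched as a small script run against the primary of a test replica set. This is only a sketch, assuming pymongo is installed. The connection string, database name, collection name, field name, and document count are placeholders and do not come from the original report. It relies on the `background` index option, which in 3.6 and 4.0 defaults to false (a foreground build).

```python
from collections import OrderedDict


def foreground_index_command(coll_name, field):
    """Build a createIndexes command document that requests a foreground build."""
    index_spec = {
        "key": {field: 1},
        "name": field + "_1",
        # background=False is the default in 3.6/4.0; stated explicitly here
        # because only foreground builds show the problem.
        "background": False,
    }
    # The command name has to be the first key of the command document.
    return OrderedDict([("createIndexes", coll_name), ("indexes", [index_spec])])


def run_repro(uri="mongodb://localhost:27017/?replicaSet=rs0", n_docs=1000000):
    """Populate a collection on the primary, then build a foreground index.

    All names below are placeholders. Increase n_docs until the build runs
    long enough to observe the gap in FTDC data on a secondary.
    """
    from pymongo import MongoClient  # imported lazily; assumed to be installed

    client = MongoClient(uri)
    db = client["test"]
    db["ftdc_repro"].drop()
    db["ftdc_repro"].insert_many([{"x": i} for i in range(n_docs)])
    db.command(foreground_index_command("ftdc_repro", "x"))
```

      Calling `run_repro()` only drives the primary. The stall itself has to be observed on a secondary while it applies the replicated index build, for example as a gap in that node's FTDC data.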

      This is problematic because index builds can stress system resources, so FTDC data may be particularly important for diagnosing problems that occur during them. We implemented SERVER-26005 to avoid this in 3.4. I'm not sure why this has regressed in 3.6 and 4.0, though: as far as I can see, SERVER-26005 is still in place in 3.6 and should be irrelevant in 4.0.

      Stack traces show the FTDC thread stalled acquiring a global lock in WiredTigerServerStatusSection::generateSection. This looks like a MODE_IS global lock, so I'm not sure what it's conflicting with.
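
      For reference, the MODE_IS / MODE_IX / MODE_S / MODE_X names follow the standard intent-lock scheme of multi-granularity locking, in which an IS request is incompatible only with X. The sketch below encodes that textbook conflict table. It is an illustration, not the server's lock manager: it ignores queuing and fairness policy, and the report does not establish which operation holds or has requested the conflicting lock.

```python
# Conflict sets for the four modes of standard multi-granularity locking.
# CONFLICTS[m] is the set of modes that cannot be held at the same time as m.
CONFLICTS = {
    "IS": {"X"},
    "IX": {"S", "X"},
    "S": {"IX", "X"},
    "X": {"IS", "IX", "S", "X"},
}


def conflicts(requested, held):
    """Return True if a request in mode `requested` must wait for `held`."""
    return held in CONFLICTS[requested]


def blocking_modes(requested):
    """Return the sorted list of held modes that would block `requested`."""
    return sorted(CONFLICTS[requested])
```

      Under these rules `blocking_modes("IS")` contains only `"X"`, so a stalled MODE_IS request on the global resource points at an exclusive global lock that is held or, depending on the queuing policy, pending.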

      3.6.8:

      #1  0x00005598c6a76528 in mongo::CondVarLockGrantNotification::wait(mongo::Duration<std::ratio<1l, 1000l> >) ()
      #2  0x00005598c6a7b0ee in mongo::LockerImpl<false>::lockComplete(mongo::ResourceId, mongo::LockMode, mongo::Duration<std::ratio<1l, 1000l> >, bool) ()
      #3  0x00005598c6a695cf in mongo::Lock::GlobalLock::waitForLock(unsigned int) ()
      #4  0x00005598c6a69c55 in mongo::Lock::GlobalLock::GlobalLock(mongo::OperationContext*, mongo::LockMode, unsigned int) ()
      #5  0x00005598c5936fcc in mongo::WiredTigerServerStatusSection::generateSection(mongo::OperationContext*, mongo::BSONElement const&) const ()
      #6  0x00005598c59070ea in mongo::ServerStatusSection::appendSection(mongo::OperationContext*, mongo::BSONElement const&, mongo::BSONObjBuilder*) const ()
      #7  0x00005598c6a3760f in mongo::CmdServerStatus::run(mongo::OperationContext*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj const&, mongo::BSONObjBuilder&) ()
      #8  0x00005598c6a86af6 in mongo::BasicCommand::enhancedRun(mongo::OperationContext*, mongo::OpMsgRequest const&, mongo::BSONObjBuilder&) ()
      #9  0x00005598c6a81d4f in mongo::Command::publicRun(mongo::OperationContext*, mongo::OpMsgRequest const&, mongo::BSONObjBuilder&) ()
      #10 0x00005598c6a84bf6 in mongo::Command::runCommandDirectly(mongo::OperationContext*, mongo::OpMsgRequest const&) ()
      #11 0x00005598c6850fdd in mongo::FTDCSimpleInternalCommandCollector::collect(mongo::OperationContext*, mongo::BSONObjBuilder&) ()
      #12 0x00005598c685f713 in mongo::FTDCCollectorCollection::collect(mongo::Client*) ()
      #13 0x00005598c6863a49 in mongo::FTDCController::doLoop() ()
      #14 0x00005598c725bf60 in execute_native_thread_routine ()
      #15 0x00007f6ce0ae86aa in start_thread (arg=0x7f6cd9708700) at pthread_create.c:333
      #16 0x00007f6ce081e13d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
      

      4.0.2:

      #1  0x00005565dcdc1a78 in mongo::ClockSource::waitForConditionUntil(std::condition_variable&, std::unique_lock<std::mutex>&, mongo::Date_t) ()
      #2  0x00005565dcdb9313 in mongo::OperationContext::waitForConditionOrInterruptNoAssertUntil(std::condition_variable&, std::unique_lock<std::mutex>&, mongo::Date_t) ()
      #3  0x00005565dcdb9860 in mongo::OperationContext::waitForConditionOrInterruptUntil(std::condition_variable&, std::unique_lock<std::mutex>&, mongo::Date_t) ()
      #4  0x00005565dc5eff81 in mongo::CondVarLockGrantNotification::wait(mongo::OperationContext*, mongo::Duration<std::ratio<1l, 1000l> >) ()
      #5  0x00005565dc5f6ff2 in mongo::LockerImpl<false>::lockComplete(mongo::OperationContext*, mongo::ResourceId, mongo::LockMode, mongo::Date_t, bool) ()
      #6  0x00005565dc5e31d6 in mongo::Lock::GlobalLock::waitForLockUntil(mongo::Date_t) ()
      #7  0x00005565dc5e38e5 in mongo::Lock::GlobalLock::GlobalLock(mongo::OperationContext*, mongo::LockMode, mongo::Date_t, mongo::Lock::InterruptBehavior) ()
      #8  0x00005565db563278 in mongo::WiredTigerServerStatusSection::generateSection(mongo::OperationContext*, mongo::BSONElement const&) const ()
      #9  0x00005565dc8cacfa in mongo::CmdServerStatus::run(mongo::OperationContext*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj const&, mongo::BSONObjBuilder&) ()
      #10 0x00005565dc8d30dd in mongo::CommandHelpers::runCommandDirectly(mongo::OperationContext*, mongo::OpMsgRequest const&) ()
      #11 0x00005565db981d1d in mongo::FTDCSimpleInternalCommandCollector::collect(mongo::OperationContext*, mongo::BSONObjBuilder&) ()
      #12 0x00005565db9b9b85 in mongo::FTDCCollectorCollection::collect(mongo::Client*) ()
      #13 0x00005565db9be959 in mongo::FTDCController::doLoop() ()
      #14 0x00005565dcf78720 in execute_native_thread_routine ()
      #15 0x00007fec934e86aa in start_thread (arg=0x7fec87cf9700) at pthread_create.c:333
      #16 0x00007fec9321e13d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
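
      Both traces share the same shape: the FTDC loop calls into serverStatus, which waits in the GlobalLock constructor. When many thread dumps have to be checked for this pattern, a small script can do it. The following is a sketch that assumes gdb-style frame lines of the form shown above (`#N 0xADDR in function(args) ()`); the helper names are made up for illustration.

```python
import re

# Matches gdb frame lines such as
#   "#7  0x00005565dc5e38e5 in mongo::Lock::GlobalLock::GlobalLock(...) ()"
FRAME_RE = re.compile(r"^\s*#(\d+)\s+0x[0-9a-fA-F]+ in (.+)$")

FTDC_LOOP = "mongo::FTDCController::doLoop"
GLOBAL_LOCK_CTOR = "mongo::Lock::GlobalLock::GlobalLock"


def frame_functions(trace):
    """Return the function name of each frame, innermost frame first."""
    funcs = []
    for line in trace.splitlines():
        m = FRAME_RE.match(line)
        if m:
            # Drop the argument list: keep everything before the first "(".
            funcs.append(m.group(2).split("(", 1)[0].strip())
    return funcs


def ftdc_blocked_on_global_lock(trace):
    """If the FTDC thread is inside the GlobalLock constructor, return the
    function that requested the lock; otherwise return None."""
    funcs = frame_functions(trace)
    if FTDC_LOOP not in funcs:
        return None
    ftdc_idx = funcs.index(FTDC_LOOP)
    # Only frames above (inner to) the FTDC loop belong to its current call.
    for i, name in enumerate(funcs[:ftdc_idx]):
        if name == GLOBAL_LOCK_CTOR:
            return funcs[i + 1]  # the caller is the next frame down
    return None
```

      Applied to either trace above, `ftdc_blocked_on_global_lock` reports `mongo::WiredTigerServerStatusSection::generateSection` as the caller requesting the lock.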
      

            Assignee:
            xiangyu.yao@mongodb.com Xiangyu Yao (Inactive)
            Reporter:
            bruce.lucas@mongodb.com Bruce Lucas (Inactive)
            Votes:
            0
            Watchers:
            13