[SERVER-37313] FTDC collection blocked during foreground index build on secondary Created: 25/Sep/18  Updated: 29/Oct/23  Resolved: 23/Oct/18

Status: Closed
Project: Core Server
Component/s: Diagnostics, Storage
Affects Version/s: 3.6.8, 4.0.2
Fix Version/s: 3.6.9, 4.0.5, 4.1.5

Type: Bug Priority: Major - P3
Reporter: Bruce Lucas (Inactive) Assignee: Xiangyu Yao (Inactive)
Resolution: Fixed Votes: 0
Labels: SWDI
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Problem/Incident
causes SERVER-38748 Background indexes created through ap... Closed
Related
related to SERVER-37199 Yield locks of transactions in second... Closed
related to SERVER-37930 Add test coverage for createIndexes i... Closed
is related to SERVER-26005 FTDC shouldn't conflict with secondar... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.0, v3.6
Sprint: Storage NYC 2018-10-22, Storage NYC 2018-11-05
Participants:
Linked BF Score: 56

 Description   

Repro: create a collection and do a foreground index build on it. (The issue does not affect background index builds.) When the build finishes on the primary and is replicated to the secondaries, FTDC data collection is blocked on the secondary for the duration of the index build there. This is a regression in 3.6 and 4.0.

This is problematic because index builds can stress system resources, so FTDC data may be particularly important for diagnosing problems during an index build. We implemented SERVER-26005 to avoid this in 3.4. I'm not sure why it has regressed in 3.6 and 4.0, though: as far as I can see, SERVER-26005 is still in place in 3.6 and should be irrelevant in 4.0.

Stack traces show the FTDC thread stalled acquiring a global lock in WiredTigerServerStatusSection::generateSection. This appears to be a MODE_IS global lock, so I'm not sure what it is conflicting with.

3.6.8:

#1  0x00005598c6a76528 in mongo::CondVarLockGrantNotification::wait(mongo::Duration<std::ratio<1l, 1000l> >) ()
#2  0x00005598c6a7b0ee in mongo::LockerImpl<false>::lockComplete(mongo::ResourceId, mongo::LockMode, mongo::Duration<std::ratio<1l, 1000l> >, bool) ()
#3  0x00005598c6a695cf in mongo::Lock::GlobalLock::waitForLock(unsigned int) ()
#4  0x00005598c6a69c55 in mongo::Lock::GlobalLock::GlobalLock(mongo::OperationContext*, mongo::LockMode, unsigned int) ()
#5  0x00005598c5936fcc in mongo::WiredTigerServerStatusSection::generateSection(mongo::OperationContext*, mongo::BSONElement const&) const ()
#6  0x00005598c59070ea in mongo::ServerStatusSection::appendSection(mongo::OperationContext*, mongo::BSONElement const&, mongo::BSONObjBuilder*) const ()
#7  0x00005598c6a3760f in mongo::CmdServerStatus::run(mongo::OperationContext*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj const&, mongo::BSONObjBuilder&) ()
#8  0x00005598c6a86af6 in mongo::BasicCommand::enhancedRun(mongo::OperationContext*, mongo::OpMsgRequest const&, mongo::BSONObjBuilder&) ()
#9  0x00005598c6a81d4f in mongo::Command::publicRun(mongo::OperationContext*, mongo::OpMsgRequest const&, mongo::BSONObjBuilder&) ()
#10 0x00005598c6a84bf6 in mongo::Command::runCommandDirectly(mongo::OperationContext*, mongo::OpMsgRequest const&) ()
#11 0x00005598c6850fdd in mongo::FTDCSimpleInternalCommandCollector::collect(mongo::OperationContext*, mongo::BSONObjBuilder&) ()
#12 0x00005598c685f713 in mongo::FTDCCollectorCollection::collect(mongo::Client*) ()
#13 0x00005598c6863a49 in mongo::FTDCController::doLoop() ()
#14 0x00005598c725bf60 in execute_native_thread_routine ()
#15 0x00007f6ce0ae86aa in start_thread (arg=0x7f6cd9708700) at pthread_create.c:333
#16 0x00007f6ce081e13d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

4.0.2:

#1  0x00005565dcdc1a78 in mongo::ClockSource::waitForConditionUntil(std::condition_variable&, std::unique_lock<std::mutex>&, mongo::Date_t) ()
#2  0x00005565dcdb9313 in mongo::OperationContext::waitForConditionOrInterruptNoAssertUntil(std::condition_variable&, std::unique_lock<std::mutex>&, mongo::Date_t) ()
#3  0x00005565dcdb9860 in mongo::OperationContext::waitForConditionOrInterruptUntil(std::condition_variable&, std::unique_lock<std::mutex>&, mongo::Date_t) ()
#4  0x00005565dc5eff81 in mongo::CondVarLockGrantNotification::wait(mongo::OperationContext*, mongo::Duration<std::ratio<1l, 1000l> >) ()
#5  0x00005565dc5f6ff2 in mongo::LockerImpl<false>::lockComplete(mongo::OperationContext*, mongo::ResourceId, mongo::LockMode, mongo::Date_t, bool) ()
#6  0x00005565dc5e31d6 in mongo::Lock::GlobalLock::waitForLockUntil(mongo::Date_t) ()
#7  0x00005565dc5e38e5 in mongo::Lock::GlobalLock::GlobalLock(mongo::OperationContext*, mongo::LockMode, mongo::Date_t, mongo::Lock::InterruptBehavior) ()
#8  0x00005565db563278 in mongo::WiredTigerServerStatusSection::generateSection(mongo::OperationContext*, mongo::BSONElement const&) const ()
#9  0x00005565dc8cacfa in mongo::CmdServerStatus::run(mongo::OperationContext*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj const&, mongo::BSONObjBuilder&) ()
#10 0x00005565dc8d30dd in mongo::CommandHelpers::runCommandDirectly(mongo::OperationContext*, mongo::OpMsgRequest const&) ()
#11 0x00005565db981d1d in mongo::FTDCSimpleInternalCommandCollector::collect(mongo::OperationContext*, mongo::BSONObjBuilder&) ()
#12 0x00005565db9b9b85 in mongo::FTDCCollectorCollection::collect(mongo::Client*) ()
#13 0x00005565db9be959 in mongo::FTDCController::doLoop() ()
#14 0x00005565dcf78720 in execute_native_thread_routine ()
#15 0x00007fec934e86aa in start_thread (arg=0x7fec87cf9700) at pthread_create.c:333
#16 0x00007fec9321e13d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109



 Comments   
Comment by Githook User [ 10/Nov/18 ]

Author:

{'name': 'Xiangyu Yao', 'email': 'xiangyu.yao@mongodb.com', 'username': 'xy24'}

Message: SERVER-37313 Secondary foreground index build should take Database X rather than Global X lock

(cherry picked from commit 167861a164723168adfaaa866f310cb94010428f)
Branch: v4.0
https://github.com/mongodb/mongo/commit/8b1c73093df76fff074e68ad945ad6070fcc0df6

Comment by Githook User [ 05/Nov/18 ]

Author:

{'name': 'Xiangyu Yao', 'email': 'xiangyu.yao@mongodb.com', 'username': 'xy24'}

Message: SERVER-37313 Secondary foreground index build should take Database X rather than Global X lock
Branch: v3.6
https://github.com/mongodb/mongo/commit/167861a164723168adfaaa866f310cb94010428f

Comment by Githook User [ 25/Oct/18 ]

Author:

{'name': 'Xiangyu Yao', 'email': 'xiangyu.yao@mongodb.com', 'username': 'xy24'}

Message: Revert "SERVER-37313 Secondary foreground index build should take Database X rather than Global X lock"

This reverts commit 3aebbb51ed5fe82601f601916251c5f892a59467.
Branch: v4.0
https://github.com/mongodb/mongo/commit/dd4c4c1043f3e20618e536f96d4237691bfedd2f

Comment by Githook User [ 25/Oct/18 ]

Author:

{'name': 'Xiangyu Yao', 'email': 'xiangyu.yao@mongodb.com', 'username': 'xy24'}

Message: SERVER-37313 Secondary foreground index build should take Database X rather than Global X lock

(cherry picked from commit e7bed9bdcb376d5a06dce6228047309e8481f9cf)
Branch: v4.0
https://github.com/mongodb/mongo/commit/3aebbb51ed5fe82601f601916251c5f892a59467

Comment by Githook User [ 23/Oct/18 ]

Author:

{'name': 'Xiangyu Yao', 'email': 'xiangyu.yao@mongodb.com', 'username': 'xy24'}

Message: SERVER-37313 Secondary foreground index build should take Database X rather than Global X lock
Branch: master
https://github.com/mongodb/mongo/commit/e7bed9bdcb376d5a06dce6228047309e8481f9cf

Generated at Thu Feb 08 04:45:38 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.