[SERVER-59478] Move serverStatus command before taking RSTL in catchup_takeover_with_higher_config.js
Created: 20/Aug/21  Updated: 29/Oct/23  Resolved: 20/Aug/21

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 5.0.3, 4.4.9, 5.1.0-rc0

Type: Bug Priority: Major - P3
Reporter: Wenbin Zhu Assignee: Wenbin Zhu
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v5.0, v4.4
Sprint: Repl 2021-08-23
Participants:
Linked BF Score: 48

 Description   

When the test executes this serverStatus command, the output includes WiredTiger information by default. Generating that section takes a GlobalLock, which involves taking the RSTL. However, the RSTL is already held by the stepup thread, which is hung on a failpoint. Because the serverStatus command's global lock acquisition uses a deadline of Date_t::now(), it should normally fail quickly when it cannot acquire the RSTL. In practice this lock acquisition sometimes blocks instead (possibly because a faulty system clock affects the lock-wait implementation), which hangs the test:

#0  0x00007f56a67047e1 in poll () from /lib64/libc.so.6
#1  0x0000562dd02fc3b3 in mongo::transport::TransportLayerASIO::BatonASIO::run(mongo::ClockSource*) ()
#2  0x0000562dd02e646d in mongo::transport::TransportLayerASIO::BatonASIO::run_until(mongo::ClockSource*, mongo::Date_t) ()
#3  0x0000562dd07dc9e1 in mongo::ClockSource::waitForConditionUntil(mongo::stdx::condition_variable&, mongo::BasicLockableAdapter, mongo::Date_t, mongo::Waitable*) ()
#4  0x0000562dd07d0600 in mongo::OperationContext::waitForConditionOrInterruptNoAssertUntil(mongo::stdx::condition_variable&, mongo::BasicLockableAdapter, mongo::Date_t) ()
#5  0x0000562dd0783ea5 in mongo::Interruptible::waitForConditionOrInterruptUntil<std::unique_lock<mongo::latch_detail::Latch>, mongo::CondVarLockGrantNotification::wait(mongo::OperationContext*, mongo::Duration<std::ratio<1l, 1000l> >)::{lambda()#1}>(mongo::stdx::condition_variable&, std::unique_lock<mongo::latch_detail::Latch>&, mongo::Date_t, mongo::CondVarLockGrantNotification::wait(mongo::OperationContext*, mongo::Duration<std::ratio<1l, 1000l> >)::{lambda()#1}, mongo::AtomicWord<long>*)::{lambda(auto:1&, mongo::Interruptible::WakeSpeed)#3}::operator()(std::unique_lock<mongo::latch_detail::Latch>&, mongo::AtomicWord<long>*) const ()
#6  0x0000562dd078453c in mongo::CondVarLockGrantNotification::wait(mongo::OperationContext*, mongo::Duration<std::ratio<1l, 1000l> >) ()
#7  0x0000562dd07863e6 in mongo::LockerImpl::_lockComplete(mongo::OperationContext*, mongo::ResourceId, mongo::LockMode, mongo::Date_t) ()
#8  0x0000562dd0778978 in mongo::Lock::GlobalLock::GlobalLock(mongo::OperationContext*, mongo::LockMode, mongo::Date_t, mongo::Lock::InterruptBehavior) ()
#9  0x0000562dcecc562b in mongo::WiredTigerServerStatusSection::generateSection(mongo::OperationContext*, mongo::BSONElement const&) const ()
#10 0x0000562dcea63429 in mongo::ServerStatusSection::appendSection(mongo::OperationContext*, mongo::BSONElement const&, mongo::BSONObjBuilder*) const ()
#11 0x0000562dcf66bfca in mongo::CmdServerStatus::run(mongo::OperationContext*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj const&, mongo::BSONObjBuilder&) ()
#12 0x0000562dcf84229a in mongo::BasicCommandWithReplyBuilderInterface::Invocation::run(mongo::OperationContext*, mongo::rpc::ReplyBuilderInterface*) ()
#13 0x0000562dcf83cabf in mongo::CommandHelpers::runCommandInvocation(mongo::OperationContext*, mongo::OpMsgRequest const&, mongo::CommandInvocation*, mongo::rpc::ReplyBuilderInterface*) ()

To mitigate this, we can move the serverStatus command so that it runs before the RSTL is taken, which avoids any such hang.
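The effect of the reordering can be modeled with standard C++. This is a minimal sketch under stated assumptions, not MongoDB code: the actual fix reorders commands in the JavaScript test, and the names below (`rstl`, `collectServerStatus`, `runScenario`) are hypothetical. A `std::timed_mutex` stands in for the RSTL, a helper thread stands in for the stepup thread parked on a failpoint, and a `std::promise` plays the failpoint. `collectServerStatus()` mirrors the deadline-of-now lock request: if the deadline has already passed, `try_lock_until` makes a single non-blocking attempt. In the old order that attempt contends with the held lock and can only give up (or, in the bug, hang). In the fixed order nobody holds the lock yet, so it is acquired immediately.

```cpp
#include <chrono>
#include <future>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the RSTL (not MongoDB's lock manager).
std::timed_mutex rstl;

// Hypothetical model of serverStatus building its WiredTiger section:
// it needs the lock but makes a single attempt with a deadline of "now".
bool collectServerStatus() {
    if (!rstl.try_lock_until(std::chrono::steady_clock::now())) {
        return false;  // lock unavailable: give up instead of waiting
    }
    rstl.unlock();
    return true;
}

// Models the test. statusFirst == true is the fixed order (serverStatus runs
// before the stepup thread takes the RSTL); false is the old order.
bool runScenario(bool statusFirst) {
    bool statusOk = false;
    if (statusFirst) {
        statusOk = collectServerStatus();  // nobody holds the lock yet
    }

    std::promise<void> lockHeld;
    std::promise<void> failpointOff;
    // Obtain both futures before starting the thread to avoid racing on the promises.
    std::future<void> lockHeldFuture = lockHeld.get_future();
    std::future<void> failpointOffFuture = failpointOff.get_future();

    // Stand-in for the stepup thread: takes the lock, then hangs on a "failpoint".
    std::thread stepup([&] {
        std::lock_guard<std::timed_mutex> guard(rstl);
        lockHeld.set_value();
        failpointOffFuture.wait();
    });
    lockHeldFuture.wait();  // the stepup thread now holds the lock

    if (!statusFirst) {
        statusOk = collectServerStatus();  // old order: contends with the stepup thread
    }

    failpointOff.set_value();  // disable the "failpoint" so the thread can release the lock
    stepup.join();
    return statusOk;
}
```

With this model, `runScenario(true)` reports that the status call obtained the lock, while `runScenario(false)` reports that it did not. The model cannot reproduce the actual hang, which the ticket attributes (speculatively) to clock behavior inside the server's lock-wait path. It only shows why running serverStatus first removes the contention altogether.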



 Comments   
Comment by Vivian Ge (Inactive) [ 06/Oct/21 ]

Updating the fixversion since branching activities occurred yesterday. This ticket will be in rc0 when it’s been triggered. For more active release information, please keep an eye on #server-release. Thank you!

Comment by Githook User [ 26/Aug/21 ]

Author: Wenbin Zhu (wenbin.zhu@mongodb.com, username: WenbinZhu)

Message: SERVER-59478 Move serverStatus command before taking RSTL in catchup_takeover_with_higher_config.js.

(cherry picked from commit d9747e5fbfc820ca6ea9167d4361d83cd2f507c6)
Branch: v4.4
https://github.com/mongodb/mongo/commit/08935a289f0a7216cfe9f606e1ec6a4e25ff8e95

Comment by Githook User [ 26/Aug/21 ]

Author: Wenbin Zhu (wenbin.zhu@mongodb.com, username: WenbinZhu)

Message: SERVER-59478 Move serverStatus command before taking RSTL in catchup_takeover_with_higher_config.js.

(cherry picked from commit d9747e5fbfc820ca6ea9167d4361d83cd2f507c6)
Branch: v5.0
https://github.com/mongodb/mongo/commit/fa47f037feca94fd8385a15dd1fc3a23e648d880

Comment by Githook User [ 20/Aug/21 ]

Author: Wenbin Zhu (wenbin.zhu@mongodb.com, username: WenbinZhu)

Message: SERVER-59478 Move serverStatus command before taking RSTL in catchup_takeover_with_higher_config.js.
Branch: master
https://github.com/mongodb/mongo/commit/d9747e5fbfc820ca6ea9167d4361d83cd2f507c6

Generated at Thu Feb 08 05:47:21 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.