[SERVER-56281] Investigate data race between curOp and FTDC in LockStats Created: 22/Apr/21  Updated: 17/Sep/21  Resolved: 30/Jul/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Henrik Edin Assignee: Henrik Edin
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
is related to SERVER-60049 Investigate using AtomicLockStats ove... Closed
Operating System: ALL
Sprint: Execution Team 2021-05-17, Execution Team 2021-05-31, Execution Team 2021-07-12, Execution Team 2021-07-26, Execution Team 2021-08-09
Participants:
Linked BF Score: 0

 Description   

From linked ticket:

[j1] WARNING: ThreadSanitizer: data race (pid=12709)

[j1]   Write of size 8 at 0x7b6c0007fec8 by thread T12:
[j1]     #0 mongo::CounterOps::add(long&, long) /data/mci/2c4620ccdb5f3dab379b45edc08d4983/src/src/mongo/db/concurrency/lock_stats.h:62:17 (mongod+0x3914e64)
[j1]     #1 mongo::LockStats<long>::recordAcquisition(mongo::ResourceId, mongo::LockMode) /data/mci/2c4620ccdb5f3dab379b45edc08d4983/src/src/mongo/db/concurrency/lock_stats.h:128 (mongod+0x3914e64)
[j1]     #2 mongo::LockerImpl::_lockBegin(mongo::OperationContext*, mongo::ResourceId, mongo::LockMode) /data/mci/2c4620ccdb5f3dab379b45edc08d4983/src/src/mongo/db/concurrency/lock_state.cpp:855 (mongod+0x3914e64)
[j1]     #3 mongo::LockerImpl::lock(mongo::OperationContext*, mongo::ResourceId, mongo::LockMode, mongo::Date_t) /data/mci/2c4620ccdb5f3dab379b45edc08d4983/src/src/mongo/db/concurrency/lock_state.cpp:528:31 (mongod+0x391790b)
[j1]     #4 mongo::Lock::GlobalLock::_takeGlobalAndRSTLLocks(mongo::LockMode, mongo::Date_t) /data/mci/2c4620ccdb5f3dab379b45edc08d4983/src/src/mongo/db/concurrency/d_concurrency.cpp:186:26 (mongod+0x390513f)
[j1]     #5 mongo::Lock::GlobalLock::GlobalLock(mongo::OperationContext*, mongo::LockMode, mongo::Date_t, mongo::Lock::InterruptBehavior, bool) /data/mci/2c4620ccdb5f3dab379b45edc08d4983/src/src/mongo/db/concurrency/d_concurrency.cpp:167 (mongod+0x390513f)
[j1]     #6 mongo::ephemeral_for_test::ServerStatusSection::generateSection(mongo::OperationContext*, mongo::BSONElement const&) const /data/mci/2c4620ccdb5f3dab379b45edc08d4983/src/src/mongo/db/storage/ephemeral_for_test/ephemeral_for_test_server_status.cpp:55:22 (mongod+0x12e9aa0)
[j1]     #7 mongo::ServerStatusSection::appendSection(mongo::OperationContext*, mongo::BSONElement const&, mongo::BSONObjBuilder*) const /data/mci/2c4620ccdb5f3dab379b45edc08d4983/src/src/mongo/db/commands/server_status.h:100:26 (mongod+0x12cd83b)
[j1]     #8 mongo::CmdServerStatus::run(mongo::OperationContext*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj const&, mongo::BSONObjBuilder&) /data/mci/2c4620ccdb5f3dab379b45edc08d4983/src/src/mongo/db/commands/server_status.cpp:127:22 (mongod+0x379f6b5)
[j1]     #9 mongo::BasicCommand::runWithReplyBuilder(mongo::OperationContext*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj const&, mongo::rpc::ReplyBuilderInterface*) /data/mci/2c4620ccdb5f3dab379b45edc08d4983/src/src/mongo/db/commands.h:939:16 (mongod+0x136751c)
[j1]     #10 mongo::BasicCommandWithReplyBuilderInterface::Invocation::run(mongo::OperationContext*, mongo::rpc::ReplyBuilderInterface*) /data/mci/2c4620ccdb5f3dab379b45edc08d4983/src/src/mongo/db/commands.cpp:852:29 (mongod+0x37c069d)
[j1]     #11 mongo::CommandHelpers::runCommandDirectly(mongo::OperationContext*, mongo::OpMsgRequest const&) /data/mci/2c4620ccdb5f3dab379b45edc08d4983/src/src/mongo/db/commands.cpp:154:21 (mongod+0x37ac1a7)
[j1]     #12 mongo::FTDCServerStatusCommandCollector::collect(mongo::OperationContext*, mongo::BSONObjBuilder&) /data/mci/2c4620ccdb5f3dab379b45edc08d4983/src/src/mongo/db/ftdc/ftdc_server.cpp:241:23 (mongod+0x196db8b)
[j1]     #13 mongo::FTDCCollectorCollection::collect(mongo::Client*) /data/mci/bff03bc3ef59ce3b0d49638acf862c0c/src/src/mongo/db/ftdc/collector.cpp:92:20 (mongod+0x19b69c9)
[j1]     #14 mongo::FTDCController::doLoop() /data/mci/bff03bc3ef59ce3b0d49638acf862c0c/src/src/mongo/db/ftdc/controller.cpp:249:54 (mongod+0x19bbfb4)

[j1]   Previous read of size 8 at 0x7b6c0007fec8 by thread T9 (mutexes: write M492435077276487480, write M889596492753405736, write M627824626973737032):
[j1]     #0 void mongo::LockStatCounters<long>::append<long>(mongo::LockStatCounters<long> const&) /data/mci/2c4620ccdb5f3dab379b45edc08d4983/src/src/mongo/db/concurrency/lock_stats.h:85:48 (mongod+0x391d261)
[j1]     #1 void mongo::LockStats<long>::append<long>(mongo::LockStats<long> const&) /data/mci/2c4620ccdb5f3dab379b45edc08d4983/src/src/mongo/db/concurrency/lock_stats.h:156 (mongod+0x391d261)
[j1]     #2 mongo::LockerImpl::getLockerInfo(mongo::Locker::LockerInfo*, boost::optional<mongo::LockStats<long> >) const /data/mci/2c4620ccdb5f3dab379b45edc08d4983/src/src/mongo/db/concurrency/lock_state.cpp:710:23 (mongod+0x3918864)
[j1]     #3 mongo::LockerImpl::getLockerInfo(boost::optional<mongo::LockStats<long> >) const /data/mci/2c4620ccdb5f3dab379b45edc08d4983/src/src/mongo/db/concurrency/lock_state.cpp:723:5 (mongod+0x3918967)
[j1]     #4 mongo::CommonMongodProcessInterface::_reportCurrentOpForClient(mongo::OperationContext*, mongo::Client*, mongo::MongoProcessInterface::CurrentOpTruncateMode, mongo::MongoProcessInterface::CurrentOpBacktraceMode) const /data/mci/2c4620ccdb5f3dab379b45edc08d4983/src/src/mongo/db/pipeline/process_interface/common_mongod_process_interface.cpp:479:57 (mongod+0x27fd07e)
[j1]     #5 mongo::CommonProcessInterface::getCurrentOps(boost::intrusive_ptr<mongo::ExpressionContext> const&, mongo::MongoProcessInterface::CurrentOpConnectionsMode, mongo::MongoProcessInterface::CurrentOpSessionsMode, mongo::MongoProcessInterface::CurrentOpUserMode, mongo::MongoProcessInterface::CurrentOpTruncateMode, mongo::MongoProcessInterface::CurrentOpCursorMode, mongo::MongoProcessInterface::CurrentOpBacktraceMode) const /data/mci/2c4620ccdb5f3dab379b45edc08d4983/src/src/mongo/db/pipeline/process_interface/common_process_interface.cpp:96:26 (mongod+0x2829baa)
[j1]     #6 mongo::DocumentSourceCurrentOp::doGetNext() /data/mci/2c4620ccdb5f3dab379b45edc08d4983/src/src/mongo/db/pipeline/document_source_current_op.cpp:114:48 (mongod+0x35654ee)
[j1]     #7 mongo::DocumentSource::getNext() /data/mci/2c4620ccdb5f3dab379b45edc08d4983/src/src/mongo/db/pipeline/document_source.h:266:20 (mongod+0x192b1e3)
[j1]     #8 mongo::DocumentSourceGroup::initialize() /data/mci/2c4620ccdb5f3dab379b45edc08d4983/src/src/mongo/db/pipeline/document_source_group.cpp:546:36 (mongod+0x359f962)
[j1]     #9 mongo::DocumentSourceGroup::doGetNext() /data/mci/2c4620ccdb5f3dab379b45edc08d4983/src/src/mongo/db/pipeline/document_source_group.cpp:173:43 (mongod+0x359f7b1)
[j1]     #10 mongo::DocumentSource::getNext() /data/mci/2c4620ccdb5f3dab379b45edc08d4983/src/src/mongo/db/pipeline/document_source.h:266:20 (mongod+0x192b1e3)
[j1]     #11 mongo::Pipeline::getNext() /data/mci/2c4620ccdb5f3dab379b45edc08d4983/src/src/mongo/db/pipeline/pipeline.cpp:448:40 (mongod+0x367ca60)
[j1]     #12 mongo::PlanExecutorPipeline::_getNext() /data/mci/2c4620ccdb5f3dab379b45edc08d4983/src/src/mongo/db/pipeline/plan_executor_pipeline.cpp:119:31 (mongod+0x2dad7ad)
[j1]     #13 mongo::PlanExecutorPipeline::getNextDocument(mongo::Document*, mongo::RecordId*) /data/mci/2c4620ccdb5f3dab379b45edc08d4983/src/src/mongo/db/pipeline/plan_executor_pipeline.cpp:101:21 (mongod+0x2dad2ba)
[j1]     #14 mongo::PlanExecutorPipeline::getNext(mongo::BSONObj*, mongo::RecordId*) /data/mci/2c4620ccdb5f3dab379b45edc08d4983/src/src/mongo/db/pipeline/plan_executor_pipeline.cpp:81 (mongod+0x2dad2ba)
[j1]     #15 mongo::(anonymous namespace)::handleCursorCommand(mongo::OperationContext*, boost::intrusive_ptr<mongo::ExpressionContext>, mongo::NamespaceString const&, std::vector<mongo::ClientCursor*, std::allocator<mongo::ClientCursor*> >, mongo::AggregateCommand const&, mongo::BSONObj const&, mongo::rpc::ReplyBuilderInterface*) /data/mci/2c4620ccdb5f3dab379b45edc08d4983/src/src/mongo/db/commands/run_aggregate.cpp:185:27 (mongod+0x266b222)
[j1]     #16 mongo::runAggregate(mongo::OperationContext*, mongo::NamespaceString const&, mongo::AggregateCommand const&, mongo::LiteParsedPipeline const&, mongo::BSONObj const&, std::vector<mongo::Privilege, std::allocator<mongo::Privilege> > const&, mongo::rpc::ReplyBuilderInterface*) /data/mci/2c4620ccdb5f3dab379b45edc08d4983/src/src/mongo/db/commands/run_aggregate.cpp:883 (mongod+0x266b222)
[j1]     #17 mongo::runAggregate(mongo::OperationContext*, mongo::NamespaceString const&, mongo::AggregateCommand const&, mongo::BSONObj const&, std::vector<mongo::Privilege, std::allocator<mongo::Privilege> > const&, mongo::rpc::ReplyBuilderInterface*) /data/mci/2c4620ccdb5f3dab379b45edc08d4983/src/src/mongo/db/commands/run_aggregate.cpp:543:12 (mongod+0x266624c)
[j1]     #18 mongo::CurrentOpCommand::runAggregation(mongo::OperationContext*, mongo::AggregateCommand const&) const /data/mci/2c4620ccdb5f3dab379b45edc08d4983/src/src/mongo/db/commands/current_op.cpp:80:23 (mongod+0x25f8ac2)



 Comments   
Comment by Henrik Edin [ 30/Jul/21 ]

After talking to mark.benvenuto FTDC should not hold the client lock. An alternative would be to take in in ServerStatusSection or CommonMongodProcessInterface but we run into the problem of reentrancy and I don't want to make the locks more complicated for this.

I think we should leave this as-is, CurOp would return garbage for this variable if this data race is hit. I don't want to disable TSAN for the locker or everything under FTDC either.

Comment by Connie Chen [ 29/Apr/21 ]

henrik.edin will check with Security Team to see if they own this

Generated at Thu Feb 08 05:38:51 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.