[SERVER-45115] Potential circular resource dependency between ReplicationCoordinatorImpl and InitialSyncer
Created: 12/Dec/19  Updated: 29/Oct/23  Resolved: 15/Jan/20

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 4.3.3

Type: Bug
Priority: Minor - P4
Reporter: Benjamin Caimano (Inactive)
Assignee: Matthew Russotto
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Backwards Compatibility: Fully Compatible
Sprint: Repl 2020-01-13, Repl 2020-01-27
Participants:

 Description   

Analysis from our hierarchical locking project found what appears to be a circular dependency between ReplicationCoordinatorImpl and InitialSyncer. ReplicationCoordinatorImpl can call InitialSyncer::isActive() to check state, but otherwise the InitialSyncer is expected to have control over the components of the ReplicationCoordinatorImpl. This can be trivially resolved by copying _initialSyncer to the stack under lock here and invoking isActive() on that copy after the lock is released. I don't think this is an especially worrisome cycle.
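
The proposed resolution can be sketched roughly as follows. This is a minimal illustration, not the actual server code: `Syncer` and `Coordinator` are hypothetical stand-ins for InitialSyncer and ReplicationCoordinatorImpl, and it assumes `_initialSyncer` is held through a `std::shared_ptr` so that a stack copy keeps the object alive after the lock is dropped.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical stand-in for InitialSyncer: its state is guarded by its own mutex.
class Syncer {
public:
    bool isActive() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _active;
    }

    void setActive(bool active) {
        std::lock_guard<std::mutex> lk(_mutex);
        _active = active;
    }

private:
    mutable std::mutex _mutex;
    bool _active = false;
};

// Hypothetical stand-in for ReplicationCoordinatorImpl.
class Coordinator {
public:
    void setSyncer(std::shared_ptr<Syncer> syncer) {
        assert(syncer != nullptr);  // precondition of this sketch only
        std::lock_guard<std::mutex> lk(_mutex);
        _initialSyncer = std::move(syncer);
    }

    bool isInitialSyncActive() const {
        std::shared_ptr<Syncer> syncerCopy;
        {
            // Hold the coordinator mutex only long enough to copy the pointer.
            std::lock_guard<std::mutex> lk(_mutex);
            syncerCopy = _initialSyncer;
        }
        // The coordinator mutex is released here, so the syncer mutex is
        // never acquired while the coordinator mutex is held.
        return syncerCopy && syncerCopy->isActive();
    }

private:
    mutable std::mutex _mutex;
    std::shared_ptr<Syncer> _initialSyncer;
};
```

With this shape the only nesting left is the syncer's mutex taken first and the coordinator's mutex second, which matches the "InitialSyncer has priority" ordering. The value returned can be stale by the time the caller acts on it, which is already true of any isActive()-style check.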

These two stacks show the two underlying mutexes taken in opposing orders:

"InitialSyncer has priority"

...
 /home/ben/git/mongodb/mongo/worktrees/ben_idfl/src/mongo/util/assert_util.cpp:169:15: mongo::fassertFailedWithStatusWithLocation(int, mongo::Status const&, char const*, unsigned int)
 /home/ben/git/mongodb/mongo/worktrees/ben_idfl/src/mongo/util/assert_util.h:289:44: mongo::fassertWithLocation(int, mongo::Status const&, char const*, unsigned int)
 /home/ben/git/mongodb/mongo/worktrees/ben_idfl/src/mongo/util/latch_analyzer.cpp:198:13: mongo::LatchAnalyzer::onAcquire(mongo::latch_detail::Identity const&) (.cold.925)
 /home/ben/git/mongodb/mongo/worktrees/ben_idfl/src/mongo/platform/mutex.cpp:98:30: mongo::Mutex::_onQuickLock()
 /opt/mongodbtoolchain/revisions/94dac13bc8c0b50beff286acac77adeb2e81761e/stow/gcc-v3.p5v/include/c++/8.2.0/bits/std_mutex.h:162:9: std::lock_guard<mongo::Latch>::lock_guard(mongo::Latch&)
 /home/ben/git/mongodb/mongo/worktrees/ben_idfl/src/mongo/db/repl/replication_coordinator_impl.cpp:1237:40: mongo::repl::ReplicationCoordinatorImpl::getMyLastAppliedOpTimeAndWallTime() const
 /home/ben/git/mongodb/mongo/worktrees/ben_idfl/src/mongo/db/repl/replication_coordinator_external_state_impl.cpp:958:91: mongo::repl::ReplicationCoordinatorExternalStateImpl::getToken()
 /home/ben/git/mongodb/mongo/worktrees/ben_idfl/src/mongo/db/storage/wiredtiger/wiredtiger_session_cache.cpp:303:63: mongo::WiredTigerSessionCache::waitUntilDurable(mongo::OperationContext*, bool, bool)
 /home/ben/git/mongodb/mongo/worktrees/ben_idfl/src/mongo/db/storage/wiredtiger/wiredtiger_recovery_unit.cpp:252:36: mongo::WiredTigerRecoveryUnit::waitUntilDurable(mongo::OperationContext*)
 /home/ben/git/mongodb/mongo/worktrees/ben_idfl/src/mongo/db/repl/replication_consistency_markers_impl.cpp:142:44: mongo::repl::ReplicationConsistencyMarkersImpl::setInitialSyncFlag(mongo::OperationContext*)
 /home/ben/git/mongodb/mongo/worktrees/ben_idfl/src/mongo/db/repl/initial_syncer.cpp:420:69: mongo::repl::InitialSyncer::_setUp_inlock(mongo::OperationContext*, unsigned int)
 /home/ben/git/mongodb/mongo/worktrees/ben_idfl/src/mongo/db/repl/initial_syncer.cpp:251:18: mongo::repl::InitialSyncer::startup(mongo::OperationContext*, unsigned int)
 /home/ben/git/mongodb/mongo/worktrees/ben_idfl/src/mongo/db/repl/replication_coordinator_impl.cpp:741:9: mongo::repl::ReplicationCoordinatorImpl::_startDataReplication(mongo::OperationContext*, std::function<void ()>)
 /home/ben/git/mongodb/mongo/worktrees/ben_idfl/src/mongo/db/repl/replication_coordinator_impl_heartbeat.cpp:566:30: mongo::repl::ReplicationCoordinatorImpl::_heartbeatReconfigStore(mongo::executor::TaskExecutor::CallbackArgs const&, mongo::repl::ReplSetConfig const&)
...

"ReplicationCoordinatorImpl has priority"

...
 /home/ben/git/mongodb/mongo/worktrees/ben_idfl/src/mongo/util/assert_util.cpp:169:15: mongo::fassertFailedWithStatusWithLocation(int, mongo::Status const&, char const*, unsigned int)
 /home/ben/git/mongodb/mongo/worktrees/ben_idfl/src/mongo/util/assert_util.h:289:44: mongo::fassertWithLocation(int, mongo::Status const&, char const*, unsigned int)
 /home/ben/git/mongodb/mongo/worktrees/ben_idfl/src/mongo/util/latch_analyzer.cpp:198:13: mongo::LatchAnalyzer::onAcquire(mongo::latch_detail::Identity const&) (.cold.925)
 /home/ben/git/mongodb/mongo/worktrees/ben_idfl/src/mongo/platform/mutex.cpp:98:30: mongo::Mutex::_onQuickLock()
 /opt/mongodbtoolchain/revisions/94dac13bc8c0b50beff286acac77adeb2e81761e/stow/gcc-v3.p5v/include/c++/8.2.0/bits/std_mutex.h:162:9: std::lock_guard<mongo::Latch>::lock_guard(mongo::Latch&)
 /home/ben/git/mongodb/mongo/worktrees/ben_idfl/src/mongo/db/repl/initial_syncer.cpp:225:40: mongo::repl::InitialSyncer::isActive() const
 /home/ben/git/mongodb/mongo/worktrees/ben_idfl/src/mongo/db/repl/replication_coordinator_impl.cpp:2570:79: mongo::repl::ReplicationCoordinatorImpl::processReplSetSyncFrom(mongo::OperationContext*, mongo::HostAndPort const&, mongo::BSONObjBuilder*)
 /home/ben/git/mongodb/mongo/worktrees/ben_idfl/src/mongo/db/repl/repl_set_commands.cpp:591:9: mongo::repl::CmdReplSetSyncFrom::run(mongo::OperationContext*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj const&, mongo::BSONObjBuilder&)
 /home/ben/git/mongodb/mongo/worktrees/ben_idfl/src/mongo/db/commands.cpp:629:32: mongo::BasicCommand::Invocation::run(mongo::OperationContext*, mongo::rpc::ReplyBuilderInterface*)
...
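
For readers unfamiliar with this kind of check, the inversion the two stacks above expose can be illustrated with a toy lock-order tracker. This is only a sketch of the general idea and makes no claim about how mongo::LatchAnalyzer is implemented: `LockOrderTracker` and its methods are hypothetical, it is single-threaded, and it replays each stack's acquisitions one after the other.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tracker: remembers every "held -> acquired" pair it has seen
// and reports when a new acquisition contradicts an earlier ordering.
class LockOrderTracker {
public:
    // Record that `name` is acquired while everything in _held is held.
    // Returns false if some held lock was previously acquired *after* `name`.
    bool onAcquire(const std::string& name) {
        bool consistent = true;
        for (const std::string& held : _held) {
            if (_edges.count(std::make_pair(name, held)) > 0) {
                consistent = false;  // the opposite order was seen before
            }
            _edges.emplace(held, name);
        }
        _held.push_back(name);
        return consistent;
    }

    // Locks are released in reverse acquisition order in this sketch.
    void onRelease(const std::string& name) {
        assert(!_held.empty() && _held.back() == name);
        _held.pop_back();
    }

private:
    std::vector<std::string> _held;                          // currently held
    std::set<std::pair<std::string, std::string>> _edges;    // (outer, inner)
};
```

Replaying the first stack (InitialSyncer's mutex held, then ReplicationCoordinatorImpl's mutex acquired) records one ordering. Replaying the second stack (ReplicationCoordinatorImpl's mutex held, then InitialSyncer's mutex acquired inside isActive()) makes `onAcquire` return false, which is the cycle reported in this ticket.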



 Comments   
Comment by Githook User [ 14/Jan/20 ]

Author: Matthew Russotto <matthew.russotto@mongodb.com> (mtrussotto)

Message: SERVER-45115 Eliminate circular resource dependency between ReplicationCoordinatorImpl and InitialSyncer.
Branch: master
https://github.com/mongodb/mongo/commit/a783d0069915849552117c3f3b485010aef7ab44
