Potential circular resource dependency between ReplicationCoordinatorImpl and InitialSyncer


    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor - P4
    • Fix Version/s: 4.3.3
    • Affects Version/s: None
    • Component/s: None
    • Backwards Compatibility: Fully Compatible
    • Sprint: Repl 2020-01-13, Repl 2020-01-27
    • 3

      Analysis from our hierarchical locking project shows what appears to be a circular dependency between ReplicationCoordinatorImpl and InitialSyncer. ReplicationCoordinatorImpl calls InitialSyncer::isActive() to check its state while holding its own mutex. Otherwise, the InitialSyncer is expected to be in control: it calls into ReplicationCoordinatorImpl components while holding the InitialSyncer mutex. This can be trivially resolved by copying _initialSyncer to the stack while holding the ReplicationCoordinatorImpl mutex in processReplSetSyncFrom(), then invoking isActive() on the copy after the mutex is released. I don't think this is an especially worrisome cycle.
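      A minimal sketch of the proposed fix, assuming _initialSyncer is held through a std::shared_ptr so that a copy taken under the lock keeps the syncer alive after the lock is released. The classes below are simplified, hypothetical stand-ins (InitialSyncerSketch, ReplCoordSketch, and the isInitialSyncActive() helper are illustrative names), not the real ReplicationCoordinatorImpl or InitialSyncer:

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical stand-in for InitialSyncer: guards its state with its own mutex.
class InitialSyncerSketch {
public:
    bool isActive() const {
        std::lock_guard<std::mutex> lk(_mutex);  // takes the InitialSyncer mutex
        return _active;
    }
    void setActive(bool active) {
        std::lock_guard<std::mutex> lk(_mutex);
        _active = active;
    }

private:
    mutable std::mutex _mutex;
    bool _active = false;
};

// Hypothetical stand-in for ReplicationCoordinatorImpl.
class ReplCoordSketch {
public:
    void setInitialSyncer(std::shared_ptr<InitialSyncerSketch> syncer) {
        std::lock_guard<std::mutex> lk(_mutex);
        _initialSyncer = std::move(syncer);
    }

    // Problematic pattern: calling _initialSyncer->isActive() while _mutex is
    // held takes ReplCoord mutex -> InitialSyncer mutex, the reverse of the
    // order used when InitialSyncer::startup() calls back into ReplCoord.
    // Fixed pattern: copy the pointer under _mutex, release, then call isActive().
    bool isInitialSyncActive() const {
        std::shared_ptr<InitialSyncerSketch> syncer;
        {
            std::lock_guard<std::mutex> lk(_mutex);
            syncer = _initialSyncer;  // the copy keeps the syncer alive after unlock
        }
        // _mutex is released here, so only the InitialSyncer mutex is acquired.
        return syncer && syncer->isActive();
    }

private:
    mutable std::mutex _mutex;
    std::shared_ptr<InitialSyncerSketch> _initialSyncer;
};
```

      The answer may be stale by the time the caller uses it, since the syncer can change state after the copy is taken; the sketch assumes callers such as processReplSetSyncFrom() can tolerate that, as they already could once isActive() returned.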

      These two stacks show the two underlying mutexes being acquired in opposite orders:

      "InitialSyncer has priority"
      ...
       /home/ben/git/mongodb/mongo/worktrees/ben_idfl/src/mongo/util/assert_util.cpp:169:15: mongo::fassertFailedWithStatusWithLocation(int, mongo::Status const&, char const*, unsigned int)
       /home/ben/git/mongodb/mongo/worktrees/ben_idfl/src/mongo/util/assert_util.h:289:44: mongo::fassertWithLocation(int, mongo::Status const&, char const*, unsigned int)
       /home/ben/git/mongodb/mongo/worktrees/ben_idfl/src/mongo/util/latch_analyzer.cpp:198:13: mongo::LatchAnalyzer::onAcquire(mongo::latch_detail::Identity const&) (.cold.925)
       /home/ben/git/mongodb/mongo/worktrees/ben_idfl/src/mongo/platform/mutex.cpp:98:30: mongo::Mutex::_onQuickLock()
       /opt/mongodbtoolchain/revisions/94dac13bc8c0b50beff286acac77adeb2e81761e/stow/gcc-v3.p5v/include/c++/8.2.0/bits/std_mutex.h:162:9: std::lock_guard<mongo::Latch>::lock_guard(mongo::Latch&)
       /home/ben/git/mongodb/mongo/worktrees/ben_idfl/src/mongo/db/repl/replication_coordinator_impl.cpp:1237:40: mongo::repl::ReplicationCoordinatorImpl::getMyLastAppliedOpTimeAndWallTime() const
       /home/ben/git/mongodb/mongo/worktrees/ben_idfl/src/mongo/db/repl/replication_coordinator_external_state_impl.cpp:958:91: mongo::repl::ReplicationCoordinatorExternalStateImpl::getToken()
       /home/ben/git/mongodb/mongo/worktrees/ben_idfl/src/mongo/db/storage/wiredtiger/wiredtiger_session_cache.cpp:303:63: mongo::WiredTigerSessionCache::waitUntilDurable(mongo::OperationContext*, bool, bool)
       /home/ben/git/mongodb/mongo/worktrees/ben_idfl/src/mongo/db/storage/wiredtiger/wiredtiger_recovery_unit.cpp:252:36: mongo::WiredTigerRecoveryUnit::waitUntilDurable(mongo::OperationContext*)
       /home/ben/git/mongodb/mongo/worktrees/ben_idfl/src/mongo/db/repl/replication_consistency_markers_impl.cpp:142:44: mongo::repl::ReplicationConsistencyMarkersImpl::setInitialSyncFlag(mongo::OperationContext*)
       /home/ben/git/mongodb/mongo/worktrees/ben_idfl/src/mongo/db/repl/initial_syncer.cpp:420:69: mongo::repl::InitialSyncer::_setUp_inlock(mongo::OperationContext*, unsigned int)
       /home/ben/git/mongodb/mongo/worktrees/ben_idfl/src/mongo/db/repl/initial_syncer.cpp:251:18: mongo::repl::InitialSyncer::startup(mongo::OperationContext*, unsigned int)
       /home/ben/git/mongodb/mongo/worktrees/ben_idfl/src/mongo/db/repl/replication_coordinator_impl.cpp:741:9: mongo::repl::ReplicationCoordinatorImpl::_startDataReplication(mongo::OperationContext*, std::function<void ()>)
       /home/ben/git/mongodb/mongo/worktrees/ben_idfl/src/mongo/db/repl/replication_coordinator_impl_heartbeat.cpp:566:30: mongo::repl::ReplicationCoordinatorImpl::_heartbeatReconfigStore(mongo::executor::TaskExecutor::CallbackArgs const&, mongo::repl::ReplSetConfig const&)
      ...
      
      "ReplicationCoordinatorImpl has priority"
      ...
       /home/ben/git/mongodb/mongo/worktrees/ben_idfl/src/mongo/util/assert_util.cpp:169:15: mongo::fassertFailedWithStatusWithLocation(int, mongo::Status const&, char const*, unsigned int)
       /home/ben/git/mongodb/mongo/worktrees/ben_idfl/src/mongo/util/assert_util.h:289:44: mongo::fassertWithLocation(int, mongo::Status const&, char const*, unsigned int)
       /home/ben/git/mongodb/mongo/worktrees/ben_idfl/src/mongo/util/latch_analyzer.cpp:198:13: mongo::LatchAnalyzer::onAcquire(mongo::latch_detail::Identity const&) (.cold.925)
       /home/ben/git/mongodb/mongo/worktrees/ben_idfl/src/mongo/platform/mutex.cpp:98:30: mongo::Mutex::_onQuickLock()
       /opt/mongodbtoolchain/revisions/94dac13bc8c0b50beff286acac77adeb2e81761e/stow/gcc-v3.p5v/include/c++/8.2.0/bits/std_mutex.h:162:9: std::lock_guard<mongo::Latch>::lock_guard(mongo::Latch&)
       /home/ben/git/mongodb/mongo/worktrees/ben_idfl/src/mongo/db/repl/initial_syncer.cpp:225:40: mongo::repl::InitialSyncer::isActive() const
       /home/ben/git/mongodb/mongo/worktrees/ben_idfl/src/mongo/db/repl/replication_coordinator_impl.cpp:2570:79: mongo::repl::ReplicationCoordinatorImpl::processReplSetSyncFrom(mongo::OperationContext*, mongo::HostAndPort const&, mongo::BSONObjBuilder*)
       /home/ben/git/mongodb/mongo/worktrees/ben_idfl/src/mongo/db/repl/repl_set_commands.cpp:591:9: mongo::repl::CmdReplSetSyncFrom::run(mongo::OperationContext*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj const&, mongo::BSONObjBuilder&)
       /home/ben/git/mongodb/mongo/worktrees/ben_idfl/src/mongo/db/commands.cpp:629:32: mongo::BasicCommand::Invocation::run(mongo::OperationContext*, mongo::rpc::ReplyBuilderInterface*)
      ...
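      To show what the two traces have in common, here is a minimal sketch of a pairwise lock-order check in C++. It is a hypothetical simplification and not how mongo::LatchAnalyzer is implemented: LockOrderChecker is an illustrative name, onAcquire()/onRelease() are named only by analogy with the trace, and the checker tracks a single logical thread rather than per-thread state. It records every "held A, then acquired B" pair and reports an inversion when two latches are later taken in the opposite order:

```cpp
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified lock-order checker (single logical thread only).
class LockOrderChecker {
public:
    // Returns false if acquiring `name` reverses a previously observed order.
    bool onAcquire(const std::string& name) {
        bool ok = true;
        for (const auto& held : _held) {
            if (_edges.count({name, held}) != 0) {
                ok = false;  // `name` was previously held while `held` was acquired
            }
            _edges.insert({held, name});  // record "held, then name"
        }
        _held.push_back(name);
        return ok;
    }

    // Removes the most recent acquisition of `name` from the held list.
    void onRelease(const std::string& name) {
        for (auto it = _held.rbegin(); it != _held.rend(); ++it) {
            if (*it == name) {
                _held.erase(std::next(it).base());
                break;
            }
        }
    }

private:
    std::vector<std::string> _held;                         // latches currently held
    std::set<std::pair<std::string, std::string>> _edges;  // observed orderings
};
```

      Replaying the first trace (InitialSyncer mutex, then ReplicationCoordinatorImpl mutex) and then the second (ReplicationCoordinatorImpl mutex, then InitialSyncer mutex) makes the second acquisition in the second path fail, which corresponds to where the fassert fires in InitialSyncer::isActive(). With the fix applied, the ReplicationCoordinatorImpl mutex is released before the InitialSyncer mutex is taken, so no inversion is recorded.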
      

            Assignee:
            Matthew Russotto
            Reporter:
            Benjamin Caimano (Inactive)
            Votes:
            0
            Watchers:
            2
