  1. Core Server
  2. SERVER-20058

mongos deadlock while replacing catalog manager

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: 3.1.7
    • Affects Version/s: 3.1.7
    • Component/s: Sharding
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Sprint: Sharding 8 08/28/15

      The relevant stack trace from the hang analyzer is below. The key detail is the reentrancy into the catalog manager. While inside a catalog manager call, ShardConnection refreshes sharding metadata through the forwarding catalog manager. If the inner operation detects that the catalog manager needs to be replaced, the lock held by the outer operation is never dropped, so the inner operation waits forever for the catalog manager to be swapped out.

        mongo::ForwardingCatalogManager::waitForCatalogManagerChange() ()
        mongo::ForwardingCatalogManager::getAllShards(std::vector<mongo::ShardType, std::allocator<mongo::ShardType> >*) ()
        mongo::ShardRegistry::reload() ()
        mongo::ShardRegistry::getShard(std::string const&)
       
        mongo::(anonymous namespace)::checkShardVersion(mongo::OperationContext*, mongo::DBClientBase*, std::string const&, std::shared_ptr<mongo::ChunkManager>, bool, int) ()
        mongo::VersionManager::checkShardVersionCB(mongo::OperationContext*, mongo::ShardConnection*, bool, int) ()
        mongo::ShardConnection::_finishInit() ()
        mongo::ShardConnection::get() ()
        mongo::DBClientMultiCommand::sendAll() ()
        mongo::ConfigCoordinator::executeBatch(mongo::BatchedCommandRequest const&, mongo::BatchedCommandResponse*) ()
        mongo::CatalogManagerLegacy::writeConfigServerDirect(mongo::BatchedCommandRequest const&, mongo::BatchedCommandResponse*) ()
        mongo::ForwardingCatalogManager::writeConfigServerDirect(mongo::BatchedCommandRequest const&, mongo::BatchedCommandResponse*) ()
        mongo::CatalogManager::update(std::string const&, mongo::BSONObj const&, mongo::BSONObj const&, bool, bool, mongo::BatchedCommandResponse*) ()
        mongo::Balancer::_ping(mongo::OperationContext*, bool) ()
        mongo::Balancer::run() ()                                                            
        mongo::BackgroundJob::jobBody() ()
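
The reentrancy described above can be illustrated with a small standalone model. This is only a sketch under stated assumptions, not the mongos implementation: `ForwardingModel`, `OperationGuard`, `waitForChange`, `innerRefresh`, and `outerWrite` are hypothetical names, and the wait uses a timeout so the stuck state can be observed instead of hanging the way the real process does.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for a forwarding catalog manager; not the mongos code.
class ForwardingModel {
public:
    // Marks one operation as in flight for the lifetime of the guard.
    class OperationGuard {
    public:
        explicit OperationGuard(ForwardingModel& m) : _m(m) {
            std::lock_guard<std::mutex> lk(_m._mutex);
            ++_m._activeOps;
        }
        ~OperationGuard() {
            {
                std::lock_guard<std::mutex> lk(_m._mutex);
                --_m._activeOps;
            }
            _m._cv.notify_all();
        }
        OperationGuard(const OperationGuard&) = delete;
        OperationGuard& operator=(const OperationGuard&) = delete;

    private:
        ForwardingModel& _m;
    };

    // Waits until no operation is in flight, the only point at which the
    // underlying manager could be swapped. Returns false on timeout; the
    // timeout is an assumption added here so the model cannot hang.
    bool waitForChange(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lk(_mutex);
        return _cv.wait_for(lk, timeout, [this] { return _activeOps == 0; });
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    int _activeOps = 0;
};

// Inner call: reached re-entrantly, decides a swap is needed, and waits.
bool innerRefresh(ForwardingModel& m) {
    return m.waitForChange(std::chrono::milliseconds(100));
}

// Outer call: keeps its guard alive while making the re-entrant inner call,
// so the condition the inner call waits on can never become true.
bool outerWrite(ForwardingModel& m) {
    ForwardingModel::OperationGuard guard(m);
    return innerRefresh(m);
}
```

In this model, `outerWrite(m)` on a fresh `ForwardingModel` returns `false` once the timeout expires, because the outer guard is still held by the waiting thread itself, while a standalone `innerRefresh(m)` returns `true` immediately. Without the timeout, the nested case would block forever, which matches the shape of the trace: `waitForCatalogManagerChange()` at the top, with `ForwardingCatalogManager::writeConfigServerDirect()` further down the same stack.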
      

            Assignee:
            kaloian.manassiev@mongodb.com Kaloian Manassiev
            Reporter:
            schwerin@mongodb.com Andy Schwerin
            Votes:
            0
            Watchers:
            47
