SERVER-35619 (Core Server)

Potential deadlock in ShardRegistry::replicaSetChangeShardRegistryUpdateHook

    • Fully Compatible
    • ALL
    • Sharding 2018-07-16, Sharding 2018-07-30, Sharding 2018-08-13, Sharding 2018-09-24, Sharding 2020-03-09, Sharding 2020-03-23

      ShardRegistry::replicaSetChangeShardRegistryUpdateHook is registered via ReplicaSetMonitor::setSynchronousConfigChangeHook(), which is explicitly documented as calling the hook while holding the RSM mutex. Unfortunately, the hook then acquires ShardRegistryData::_mutex at https://github.com/mongodb/mongo/blob/82b62cf1e513657a0c35d757cf37eab0231ebc9b/src/mongo/s/client/shard_registry.cpp#L526.

       

      ShardRegistryData::toBSON() acquires the same two mutexes in the opposite order when it transitively calls into ReplicaSetMonitor::getServerAddress(), so two threads taking these paths concurrently could deadlock.
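
      The inversion can be modelled with a minimal sketch. The names below (rsmMutex, registryMutex, hookPathOrder, toBsonPathOrder, isInverted) are hypothetical stand-ins, not the real server code, and the two paths are meant to be run one after the other on a single thread so that the sketch itself cannot deadlock. It only records the order in which each path takes the two locks.

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-ins for the two mutexes named in the ticket.
std::mutex rsmMutex;       // models the ReplicaSetMonitor mutex
std::mutex registryMutex;  // models ShardRegistryData::_mutex

// Path 1: the RSM calls the config change hook while holding its own mutex,
// and the hook then takes the registry mutex.
std::vector<std::string> hookPathOrder() {
    std::vector<std::string> order;
    std::lock_guard<std::mutex> rsm(rsmMutex);
    order.push_back("rsm");
    std::lock_guard<std::mutex> reg(registryMutex);
    order.push_back("registry");
    return order;
}

// Path 2: toBSON() holds the registry mutex and transitively reaches
// getServerAddress(), which takes the RSM mutex.
std::vector<std::string> toBsonPathOrder() {
    std::vector<std::string> order;
    std::lock_guard<std::mutex> reg(registryMutex);
    order.push_back("registry");
    std::lock_guard<std::mutex> rsm(rsmMutex);
    order.push_back("rsm");
    return order;
}

// True when two paths take the same pair of locks in opposite order.
bool isInverted(const std::vector<std::string>& a,
                const std::vector<std::string>& b) {
    return a.size() == 2 && b.size() == 2 && a[0] == b[1] && a[1] == b[0];
}
```

      If the two paths ran on separate threads, each could take its first lock and then block forever waiting for the other's.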

      Suggested Fix

      1. Split off a separate serverAddressLock in ReplicaSetMonitor so that the lock eventually taken from ShardRegistry is not the same lock taken in Refresher::_refreshUntilMatches.
      2. Build the confirmed server address into a separate variable and update it whenever the seedNodes set in the RSM is updated, since confirmedServerAddress is built from the seedNodes.
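
      A rough sketch of how the two suggestions might fit together is shown below. MonitorSketch, RegistrySketch, and runConcurrently are hypothetical names invented for illustration, and the address format is a plain comma-joined list rather than the real connection string. The cached address is guarded by its own narrow mutex, which is never held while any other lock is acquired, so the registry-side read path no longer needs the main monitor mutex.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <set>
#include <string>
#include <thread>
#include <utility>

// Hypothetical model of the monitor with a split-off address lock.
class MonitorSketch {
public:
    using Hook = std::function<void(const std::string&)>;

    void setHook(Hook hook) {
        std::lock_guard<std::mutex> lk(_mutex);
        _hook = std::move(hook);
    }

    // Updates the seeds under the main mutex, refreshes the cached address
    // under the narrow mutex, then calls the hook with the main mutex held
    // (the behaviour the ticket describes for the synchronous hook).
    void updateSeedNodes(const std::set<std::string>& seeds) {
        std::lock_guard<std::mutex> lk(_mutex);
        _seedNodes = seeds;
        std::string joined;
        for (const auto& node : _seedNodes) {
            if (!joined.empty()) joined += ",";
            joined += node;
        }
        {
            std::lock_guard<std::mutex> addrLk(_serverAddressMutex);
            _confirmedServerAddress = joined;
        }
        if (_hook) _hook(joined);
    }

    // Takes only the narrow mutex, never the main one.
    std::string getServerAddress() const {
        std::lock_guard<std::mutex> addrLk(_serverAddressMutex);
        return _confirmedServerAddress;
    }

private:
    mutable std::mutex _mutex;               // main monitor state
    mutable std::mutex _serverAddressMutex;  // leaf lock for the cached address
    std::set<std::string> _seedNodes;
    std::string _confirmedServerAddress;
    Hook _hook;
};

// Hypothetical model of the registry side.
class RegistrySketch {
public:
    explicit RegistrySketch(MonitorSketch& monitor) : _monitor(monitor) {}

    void onConfigChange(const std::string& addr) {
        std::lock_guard<std::mutex> lk(_mutex);
        _lastSeen = addr;
    }

    // Models toBSON(): holds the registry mutex while reading the address.
    std::string describe() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _monitor.getServerAddress();
    }

private:
    mutable std::mutex _mutex;
    MonitorSketch& _monitor;
    std::string _lastSeen;
};

// Runs the update path and the read path on two threads at once.
std::string runConcurrently(int iterations) {
    MonitorSketch monitor;
    RegistrySketch registry(monitor);
    monitor.setHook([&registry](const std::string& addr) {
        registry.onConfigChange(addr);
    });
    const std::set<std::string> seeds{"a:27017", "b:27017"};
    std::thread writer([&] {
        for (int i = 0; i < iterations; ++i) monitor.updateSeedNodes(seeds);
    });
    std::thread reader([&] {
        for (int i = 0; i < iterations; ++i) registry.describe();
    });
    writer.join();
    reader.join();
    return registry.describe();
}
```

      In this sketch the lock orders are main monitor mutex, then registry mutex on the update path, and registry mutex, then address mutex on the read path. Because the address mutex is always the last lock taken, no cycle is possible between the two paths.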

            Assignee: Randolph Tan (randolph@mongodb.com)
            Reporter: Mathias Stearn (mathias@mongodb.com)
            Votes: 0
            Watchers: 13
