  1. Core Server
  2. SERVER-60161

Deadlock between config server stepdown and _configsvrRenameCollectionMetadata command

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 5.0.4, 5.1.0-rc0
    • Affects Version/s: 5.0.0
    • Component/s: Sharding
    • Labels:
      None
    • Fully Compatible
    • ALL
    • v5.0
    • Sharding EMEA 2021-10-04
    • 135

      The OperationContext for ShardingCatalogManager::renameShardedMetadata() has a logical session checked out while doing an uninterruptible wait on the _kChunkOpLock. If the _kChunkOpLock is currently held (e.g. by a running _configsvrSetAllowMigrations command), then _configsvrRenameCollectionMetadata will block until the _kChunkOpLock is released. In particular, the _configsvrSetAllowMigrations command will acquire the _kChunkOpLock and then attempt to acquire additional LockManager locks such as the RSTL IX lock. If a stepdown occurs on the primary, then the RstlKillOpThread will interrupt the OperationContext running ShardingCatalogManager::renameShardedMetadata(). But the uninterruptible wait means that no attention is paid to the kill status, so the operation keeps waiting with its logical session still checked out. ReplicationCoordinatorImpl::_stepDownFinish() will then block while holding the RSTL X lock, because invalidateSessionsForStepdown() attempts to check out the logical session in order to kill it.

      • _configsvrRenameCollectionMetadata (holding "logical session" resource) -> _kChunkOpLock
      • _configsvrSetAllowMigrations (holding _kChunkOpLock) -> RSTL IX lock
      • Stepdown (holding RSTL X lock) -> acquiring "logical session" resource
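
      The three edges above form a cycle in a wait-for graph. A minimal sketch of that check follows; it is a standalone illustration, not MongoDB code. The names WaitForGraph, hasCycleFrom, hasDeadlock and reportedGraph are hypothetical, and the RSTL IX/X modes are collapsed into a single "RSTL" node for simplicity.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical wait-for graph: an edge A -> B means
// "the holder of resource A is waiting to acquire resource B".
using WaitForGraph = std::map<std::string, std::vector<std::string>>;

// Depth-first search; returns true if a cycle is reachable from `node`.
// `onPath` holds the nodes on the current DFS path, `done` the nodes
// already fully explored without finding a cycle.
bool hasCycleFrom(const WaitForGraph& graph,
                  const std::string& node,
                  std::set<std::string>& onPath,
                  std::set<std::string>& done) {
    if (onPath.count(node)) return true;   // back edge: cycle found
    if (done.count(node)) return false;    // already explored, no cycle
    onPath.insert(node);
    auto it = graph.find(node);
    if (it != graph.end()) {
        for (const auto& next : it->second) {
            if (hasCycleFrom(graph, next, onPath, done)) return true;
        }
    }
    onPath.erase(node);
    done.insert(node);
    return false;
}

// Returns true if any cycle exists in the wait-for graph.
bool hasDeadlock(const WaitForGraph& graph) {
    std::set<std::string> onPath, done;
    for (const auto& entry : graph) {
        if (hasCycleFrom(graph, entry.first, onPath, done)) return true;
    }
    return false;
}

// The three edges listed in this report (simplified: lock modes ignored).
WaitForGraph reportedGraph() {
    return {
        {"logical session", {"_kChunkOpLock"}},  // _configsvrRenameCollectionMetadata
        {"_kChunkOpLock", {"RSTL"}},             // _configsvrSetAllowMigrations
        {"RSTL", {"logical session"}},           // stepdown
    };
}
```

      In this sketch, hasDeadlock(reportedGraph()) reports a cycle. Removing the "logical session" -> "_kChunkOpLock" edge, which is roughly what an interrupted wait would amount to, leaves the graph acyclic.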

      I think the solution here would be to make the _kChunkOpLock and _kZoneOpLock acquisitions interruptible by using the 3-argument constructor for Lock::ExclusiveLock.

      Lock::ExclusiveLock chunkLk(opCtx, opCtx->lockState(), _kChunkOpLock);
      Lock::ExclusiveLock zoneLk(opCtx, opCtx->lockState(), _kZoneOpLock);
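
      To illustrate why an interruptible acquisition would break the cycle, here is a small standalone sketch built only on the C++ standard library. It does not reproduce Lock::ExclusiveLock or OperationContext; ToyOperation and ToyResourceLock are hypothetical stand-ins, and the only point is that a waiter that checks its kill status can give up instead of waiting forever.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for an operation's kill status
// (not MongoDB's OperationContext).
struct ToyOperation {
    std::atomic<bool> killed{false};
};

// Hypothetical resource lock whose waiters observe the kill status.
class ToyResourceLock {
public:
    // Waits until the lock is free or `op` has been killed.
    // Returns true if the lock was acquired, false if the wait was
    // abandoned because the operation was killed.
    bool lockInterruptibly(ToyOperation& op) {
        std::unique_lock<std::mutex> lk(_mutex);
        _cv.wait(lk, [&] { return !_held || op.killed.load(); });
        if (op.killed.load()) return false;  // kill takes precedence
        _held = true;
        return true;
    }

    // Releases the lock and wakes any waiters.
    void unlock() {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            _held = false;
        }
        _cv.notify_all();
    }

    // Marks `op` as killed and wakes waiters so they can re-check.
    // The flag is set under the mutex to avoid a lost wakeup.
    void kill(ToyOperation& op) {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            op.killed.store(true);
        }
        _cv.notify_all();
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    bool _held = false;
};
```

      With an uninterruptible wait, the predicate would only test whether the lock is free, so a killed waiter would stay blocked until the holder released the lock. In the sketch, a killed operation returns false from lockInterruptibly() and can then release whatever it holds, such as the logical session in this report.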
      

            Assignee:
            jordi.serra-torrens@mongodb.com Jordi Serra Torrens
            Reporter:
            max.hirschhorn@mongodb.com Max Hirschhorn
            Votes:
            0
            Watchers:
            4
