  1. Core Server
  2. SERVER-38473

Replace uses of UninterruptibleLockGuard and MODE_X collection locks with uses of a database sharding state mutex for movePrimary functions

    • Fully Compatible
    • Sharding 2019-01-14, Sharding 2019-01-28, Sharding 2019-02-11

      We use UninterruptibleLockGuard in several places for movePrimary. Because these code paths acquire strong locks on normal databases and collections, they will be blocked by prepared transactions, causing deadlock on stepdown or shutdown.

      Here's a list of all the occurrences of UninterruptibleLockGuard for movePrimaries.

      We will employ a new DatabaseShardingStateLock (similar to the CollectionShardingRuntimeLock) to safeguard concurrent access to the database critical section.
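To make the idea concrete, here is a minimal sketch of a state object whose in-memory critical section flag is guarded by its own lock rather than by a database X lock. It is an illustration only: `DatabaseShardingStateSketch`, its nested `DSSLock`, and `LockMode` are hypothetical names, and `std::shared_mutex` is swapped in for whatever lock primitive the server actually uses.

```cpp
#include <cassert>
#include <shared_mutex>
#include <stdexcept>

// Hypothetical stand-in for the database sharding state; not the server's real class.
class DatabaseShardingStateSketch {
public:
    enum class LockMode { kShared, kExclusive };

    // RAII guard loosely modelled on the proposed DatabaseShardingStateLock.
    // Assumption: std::shared_mutex replaces the server's own lock type.
    class DSSLock {
    public:
        DSSLock(DatabaseShardingStateSketch& dss, LockMode mode) : _dss(dss), _mode(mode) {
            if (_mode == LockMode::kExclusive) {
                _dss._mutex.lock();
            } else {
                _dss._mutex.lock_shared();
            }
        }

        ~DSSLock() {
            if (_mode == LockMode::kExclusive) {
                _dss._mutex.unlock();
            } else {
                _dss._mutex.unlock_shared();
            }
        }

        DSSLock(const DSSLock&) = delete;
        DSSLock& operator=(const DSSLock&) = delete;

        bool isExclusive() const {
            return _mode == LockMode::kExclusive;
        }

    private:
        DatabaseShardingStateSketch& _dss;
        LockMode _mode;
    };

    // Writers must present an exclusive guard; readers may present either mode.
    void enterCriticalSection(const DSSLock& lk) {
        requireExclusive(lk);
        _inCriticalSection = true;
    }

    void exitCriticalSection(const DSSLock& lk) {
        requireExclusive(lk);
        _inCriticalSection = false;
    }

    bool inCriticalSection(const DSSLock&) const {
        return _inCriticalSection;
    }

private:
    static void requireExclusive(const DSSLock& lk) {
        if (!lk.isExclusive()) {
            throw std::logic_error("exclusive DSSLock required");
        }
    }

    std::shared_mutex _mutex;
    bool _inCriticalSection = false;
};
```

Passing the guard into each accessor is a design choice of this sketch: it makes the required lock mode part of the function signature, so a caller cannot mutate the state without holding the lock.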

      Once leaving the critical section no longer conflicts with prepared transactions, it can proceed after prepared transactions yield their locks on stepdown or are killed on shutdown. See SERVER-38162 for the order of events on shutdown and SERVER-38282 for stepdown.

      The following function signature changes will allow us to relax the database locking to IX instead of X under the above UninterruptibleLockGuards. Database locks marked "IX (previously X)" have a proposed change. All other database locks remain unchanged.

      • enterCriticalSectionCatchUpPhase: Database X Lock, DSSLock X Lock
      • enterCriticalSectionCommitPhase: Database X Lock, DSSLock X Lock
      • exitCriticalSection: Database IX Lock (previously X), DSSLock X Lock
      • getCriticalSectionSignal: Database IS Lock, DSSLock IS Lock
      • getDbVersion: Database IS or X Lock (situational), DSS IS Lock
      • setDbVersion (when setting dbVersion to a meaningful value): Database X Lock, DSS X Lock
      • setDbVersion (when setting dbVersion to boost::none, AKA "clearing" the dbVersion): Database IX Lock (previously X), DSS X Lock
      • checkDbVersion: Database IS or X Lock (situational), DSS IS Lock
      • getMovePrimarySourceManager: No Database Lock (reflects current usage), DSS IS Lock
      • setMovePrimarySourceManager: Database X Lock, DSS X Lock
      • clearMovePrimarySourceManager: Database IX Lock (previously X), DSS X Lock
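Why relaxing X to IX helps can be seen from the standard multi-granularity lock compatibility matrix. The sketch below encodes that matrix; the `LockMode` enum and `compatible` function are hypothetical names for illustration, and it assumes a prepared transaction holds an intent-exclusive (IX) lock on the database.

```cpp
#include <cassert>

// Intent-shared, intent-exclusive, shared, exclusive.
enum class LockMode { IS, IX, S, X };

// Standard multi-granularity compatibility: returns true when a lock in mode
// `requested` can be granted while another holder has the resource in mode `held`.
constexpr bool compatible(LockMode held, LockMode requested) {
    switch (held) {
        case LockMode::IS:
            return requested != LockMode::X;
        case LockMode::IX:
            return requested == LockMode::IS || requested == LockMode::IX;
        case LockMode::S:
            return requested == LockMode::IS || requested == LockMode::S;
        case LockMode::X:
            return false;
    }
    return false;
}
```

Under that assumption, a database X request waits behind the prepared transaction's IX lock, while a database IX request is granted alongside it. Exclusion between movePrimary operations then has to come from the DSSLock instead.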

      These situations can use relaxed locking as a result:

      1. On the source node for movePrimary, if we can't commit the moved primary, we check whether the node has stepped down. If it has, we must set the database version to boost::none to indicate that we no longer know the authoritative version. Since setting the database version to boost::none is semantically equivalent to "clearing" the database version, we can now use a Database IX Lock instead of a Database X Lock when doing so. The DSSLock in exclusive mode will prevent concurrent changes to the database version.
      2. On the source node for movePrimary, we must clear state variables on cleanup. This includes clearing the movePrimarySourceManager and criticalSection variables. We may now use a Database IX Lock instead of a Database X Lock, since the DSSLock in exclusive mode will prevent concurrent changes to the database sharding state's in-memory variables.
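The two situations above can be sketched as a small model in which clearing operations accept a database IX lock while setting a meaningful version still demands X. Everything here is an assumption made for illustration: `DbStateSketch`, `DbLockMode`, the `int` version, and the use of `std::mutex` in place of an exclusive DSSLock are not the server's real types, and `std::nullopt` plays the role of boost::none.

```cpp
#include <cassert>
#include <mutex>
#include <optional>
#include <stdexcept>

enum class DbLockMode { IS, IX, S, X };

// Hypothetical model of the in-memory database sharding state.
struct DbStateSketch {
    std::mutex dssMutex;           // stands in for the DSSLock held in exclusive mode
    std::optional<int> dbVersion;  // std::nullopt stands in for boost::none
    bool hasSourceManager = false;
    bool inCriticalSection = false;

    // Setting a meaningful version requires database X; clearing accepts IX or X.
    void setDbVersion(DbLockMode heldDbLock, std::optional<int> newVersion) {
        const bool clearing = !newVersion.has_value();
        const bool ok = clearing
            ? (heldDbLock == DbLockMode::IX || heldDbLock == DbLockMode::X)
            : (heldDbLock == DbLockMode::X);
        if (!ok) {
            throw std::logic_error("database lock too weak for setDbVersion");
        }
        std::lock_guard<std::mutex> dssLock(dssMutex);
        dbVersion = newVersion;
    }

    // Cleanup on the source node: clears the source manager and critical section.
    void cleanupAfterMovePrimary(DbLockMode heldDbLock) {
        if (heldDbLock != DbLockMode::IX && heldDbLock != DbLockMode::X) {
            throw std::logic_error("database lock too weak for cleanup");
        }
        std::lock_guard<std::mutex> dssLock(dssMutex);
        hasSourceManager = false;
        inCriticalSection = false;
    }
};
```

In this model the database lock mode is only checked, not acquired; the exclusive mutex is what actually serializes changes to the in-memory fields.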

            Assignee:
            blake.oler@mongodb.com Blake Oler
            Reporter:
            siyuan.zhou@mongodb.com Siyuan Zhou