Core Server / SERVER-39096

Prepared transactions and DDL operations can deadlock on a secondary if a reader blocks on a prepared document


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Won't Fix
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Replication
    • Labels:
      None
    • Operating System:
      ALL
    • Sprint:
      Storage NYC 2019-02-11
    • Linked BF Score:
      9

      Description

      When a transaction is prepared on a secondary, a DBHash transaction that reads the same collection has to wait for the prepared transaction to commit or abort, and it waits while holding the global lock in IX mode. If a DDL operation arrives at this point, the DDL blocks, because it needs the global lock in X mode. A commit or abort oplog entry for the prepared transaction may follow the DDL in the oplog, but it never gets a chance to be applied: it is queued behind the DDL, the DDL is waiting for DBHash, and DBHash is waiting for the commit or abort.
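      The cycle above can be checked mechanically. The following is a minimal Python sketch, not MongoDB code: it assumes the standard IS/IX/S/X compatibility rules for intent locks, and the party names (`dbhash`, `prepared_txn`, `commit_or_abort_entry`, `ddl`) and helper functions are hypothetical. It derives the DDL's wait from the lock modes and then walks the resulting wait-for graph to find the cycle.

```python
from typing import Optional

# Toy model only: standard multi-granularity lock modes; party names are illustrative.
# Unordered pairs of modes that may be held on the same resource at once.
_COMPATIBLE = {
    frozenset({"IS"}), frozenset({"IS", "IX"}), frozenset({"IS", "S"}),
    frozenset({"IX"}), frozenset({"S"}),
}


def compatible(requested: str, held: str) -> bool:
    """True if `requested` can be granted while another owner holds `held`."""
    return frozenset({requested, held}) in _COMPATIBLE


def build_wait_for_graph() -> dict:
    """Map each blocked party to the party it is waiting on."""
    global_lock_holders = {"dbhash": "IX"}  # DBHash holds the global lock in IX
    waits_for = {
        # DBHash reads a document written by the prepared transaction.
        "dbhash": "prepared_txn",
        # A prepared transaction finishes only when its commit/abort entry is applied.
        "prepared_txn": "commit_or_abort_entry",
        # The secondary serializes oplog application, so that entry waits for the DDL.
        "commit_or_abort_entry": "ddl",
    }
    # The DDL needs the global lock in X and waits on any incompatible holder.
    for holder, mode in global_lock_holders.items():
        if not compatible("X", mode):
            waits_for["ddl"] = holder
    return waits_for


def find_cycle(waits_for: dict) -> Optional[list]:
    """Follow wait-for edges (one per node here) and return a cycle if one exists."""
    for start in waits_for:
        path, node = [], start
        while node in waits_for and node not in path:
            path.append(node)
            node = waits_for[node]
        if node in path:
            return path[path.index(node):]
    return None


cycle = find_cycle(build_wait_for_graph())
print(" -> ".join(cycle + cycle[:1]))
# dbhash -> prepared_txn -> commit_or_abort_entry -> ddl -> dbhash
```

      Every edge in the cycle is a wait that cannot time out or yield, so no party can break it on its own.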

      This isn't a problem on the primary, because a commit or abort command can arrive concurrently while DBHash is blocked and unblock it; secondaries, however, serialize all operations. Acquiring weaker locks for DDL operations won't solve this either, because DBHash may hold an IX lock on one collection while waiting for a prepared transaction in another collection. For example, consider two collections, C1 and C2. The sequence of operations on a secondary is:

      1. (Oplog Application) A transaction with a single write in C1 is prepared.
      2. A read-only transactional DBHash arrives, takes the collection locks on C1 and C2, and waits for the prepared transaction to commit or abort.
      3. (Oplog Application) A DDL on C2 is applied; it queues behind DBHash for the C2 collection lock.

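      On the primary, the prepared transaction and the DDL operation would both succeed, since they don't conflict. On the secondary, the commit or abort for the transaction from step 1 comes after the DDL from step 3 in the oplog, so none of the three operations can make progress.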
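      A toy lock table shows why weaker DDL locks do not help in this two-collection sequence. In this hypothetical Python sketch, DBHash's collection lock modes are assumed to be IX (the report only states the mode of its global lock); the result is the same with IS, since X conflicts with every mode. `blockers` and `ddl_blockers` are illustrative helpers, not server functions.

```python
# Toy model only: standard multi-granularity lock compatibility.
_COMPATIBLE = {
    frozenset({"IS"}), frozenset({"IS", "IX"}), frozenset({"IS", "S"}),
    frozenset({"IX"}), frozenset({"S"}),
}


def compatible(requested: str, held: str) -> bool:
    """True if `requested` can be granted while another owner holds `held`."""
    return frozenset({requested, held}) in _COMPATIBLE


def blockers(lock_table: dict, requester: str, resource: str, mode: str) -> list:
    """Return the other owners of `resource` whose held mode conflicts with `mode`."""
    return sorted(
        owner
        for owner, held in lock_table.get(resource, {}).items()
        if owner != requester and not compatible(mode, held)
    )


def ddl_blockers(lock_table: dict, requests: list) -> dict:
    """Collect conflicting owners for every (resource, mode) the DDL requests."""
    return {res: blockers(lock_table, "ddl", res, mode) for res, mode in requests}


# State after step 2: DBHash holds the global lock and both collection locks
# (collection modes assumed) while it waits on the prepared write in C1.
lock_table = {
    "global": {"dbhash": "IX"},
    "C1": {"dbhash": "IX"},
    "C2": {"dbhash": "IX"},
}

# A DDL that takes the global lock in X mode.
strong = ddl_blockers(lock_table, [("global", "X")])
# A hypothetical weaker DDL: global IX plus an X lock on C2 only.
weak = ddl_blockers(lock_table, [("global", "IX"), ("C2", "X")])
print(strong)  # {'global': ['dbhash']}
print(weak)    # {'global': [], 'C2': ['dbhash']}
```

      The weaker request clears the global lock but still waits on DBHash for C2, and DBHash is still waiting on the prepared write in C1, whose commit or abort is queued behind this DDL.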

      Only snapshot reads on secondaries cause this problem, because they cannot yield their locks. It may be possible to change DBHash to use a special snapshot isolation mechanism instead of a read-only transaction; such a design has to account for locking, waiting on prepared transactions, and DDL operations (e.g. dropCollection). In the short term, we may need to disable read-only transactions on secondaries.
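      The role of yielding can be illustrated with a toy, single-threaded simulation of the two-collection sequence. This is a hypothetical sketch, not the server's behavior: it assumes, as the report implies, that a yieldable reader releases its locks while it waits on a prepared document, and `run_secondary` and its event strings are made up for illustration.

```python
from collections import deque


def run_secondary(reader_can_yield: bool, max_steps: int = 10) -> list:
    """Simulate steps 1-3 followed by the commit of the prepared transaction.

    Toy model: the reader waits for the prepared write in C1 while holding a
    lock on C2; a yieldable reader is assumed to release its locks while waiting.
    """
    oplog = deque(["ddl_on_C2", "commit_on_C1"])  # applied strictly in order
    prepared = True        # the transaction from step 1 is prepared
    reader_locked = True   # the reader holds its C1 and C2 locks
    reader_done = False
    events = []
    for _ in range(max_steps):
        if reader_done and not oplog:
            events.append("done")
            return events
        progressed = False
        # Reader: it cannot finish while the document it needs is prepared.
        if not reader_done:
            if not prepared:
                # Reacquire locks if needed, read, then release everything.
                reader_done, reader_locked = True, False
                events.append("reader finishes")
                progressed = True
            elif reader_locked and reader_can_yield:
                reader_locked = False
                events.append("reader yields locks")
                progressed = True
        # Applier: only the next entry may be applied; later entries cannot skip ahead.
        if oplog and not (oplog[0] == "ddl_on_C2" and reader_locked):
            entry = oplog.popleft()
            if entry == "commit_on_C1":
                prepared = False
            events.append("applied " + entry)
            progressed = True
        if not progressed:
            events.append("deadlock")
            return events
    events.append("step limit reached")
    return events


print(run_secondary(reader_can_yield=False))  # ['deadlock']
print(run_secondary(reader_can_yield=True))
```

      With yielding, the DDL and then the commit are applied and the reader finishes. The sketch deliberately ignores what happens to the reader when the DDL changes C2 while its locks are released; as noted above, DDL operations such as dropCollection have to be considered in any real design.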

    People

    • Votes: 0
    • Watchers: 9