  1. Core Server
  2. SERVER-63071

[Retryability] Prepared internal transactions for retryable findAndModify can cause stepup to hang

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • 5.3.0
    • Affects Version/s: None
    • Component/s: None
    • None
    • Fully Compatible
    • ALL
    • Sharding 2022-02-07

      Consider a shard with two nodes: n0 (primary) and n1 (secondary). We run a findAndModify command with pre/post image in a retryable internal transaction on the shard, and then we run a prepareTransaction command.

      As part of preparing the transaction, n0 writes an applyOps oplog entry and a config.image_collection entry for the transaction. The writes are done in a separate RecoveryUnit (i.e. storage transaction) from the one for the transaction.

      As part of applying the applyOps oplog entry for the transaction, n1 also writes a config.image_collection entry for it. However, the write is done in the same RecoveryUnit as the one for the transaction. As a result, the config.image_collection IX lock does not get released when the write completes (i.e. it is held along with other locks acquired for the transaction until the transaction commits or aborts).

      While the transaction is in prepare, n0 steps down. There are two cases:

      • If n0 steps up, the stepup hook would not hang since the config.image_collection IX lock is not being held by the prepared transaction.
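      The difference in lock lifetime between n0 and n1 described above can be illustrated with a toy model. This is only a sketch, not the server's lock manager: LockTable, prepare_on_primary, apply_prepare_on_secondary, the "side-write" and "txn" owner names, and the test.coll namespace are all hypothetical, and lock modes and conflicts are not modeled. It only shows that a lock taken under the transaction's own unit of work outlives the write, while a lock taken in a separate unit of work does not.

```python
from dataclasses import dataclass, field

IMAGE_COLL = "config.image_collection"
USER_COLL = "test.coll"  # hypothetical collection targeted by findAndModify


@dataclass
class LockTable:
    """Toy lock table: maps a resource name to the owners holding its IX lock."""
    holders: dict = field(default_factory=dict)

    def acquire_ix(self, resource, owner):
        self.holders.setdefault(resource, set()).add(owner)

    def release_all(self, owner):
        # Models the end of a unit of work: every lock it took is dropped.
        for owners in self.holders.values():
            owners.discard(owner)

    def is_held(self, resource):
        return bool(self.holders.get(resource))


def prepare_on_primary(locks):
    """n0: the image entry is written in a separate unit of work."""
    locks.acquire_ix(USER_COLL, "txn")
    locks.acquire_ix(IMAGE_COLL, "side-write")
    locks.release_all("side-write")  # side write completes, its lock goes away


def apply_prepare_on_secondary(locks):
    """n1: the image entry is written in the transaction's own unit of work."""
    locks.acquire_ix(USER_COLL, "txn")
    locks.acquire_ix(IMAGE_COLL, "txn")  # stays held until commit or abort


n0, n1 = LockTable(), LockTable()
prepare_on_primary(n0)
apply_prepare_on_secondary(n1)

print("n0 holds image lock while prepared:", n0.is_held(IMAGE_COLL))
print("n1 holds image lock while prepared:", n1.is_held(IMAGE_COLL))
```

      In this model, calling n1.release_all("txn") stands in for the transaction committing or aborting, which is the only point at which n1 drops the config.image_collection lock.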

            Assignee:
            cheahuychou.mao@mongodb.com Cheahuychou Mao
            Reporter:
            cheahuychou.mao@mongodb.com Cheahuychou Mao
