Core Server / SERVER-75285

Deadlock between ShardsvrCheckMetadataConsistencyParticipantCommand, prepared transactions, and stepdown

    • Type: Bug
    • Resolution: Gone away
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Labels: None
    • Sharding EMEA
    • ALL
    • 135

      ShardsvrCheckMetadataConsistencyParticipantCommand currently takes a DB lock in IS mode without opting out of taking the RSTL. This means that it will not be killed on stepdown (since it does not take the global lock in a mode that conflicts with writes).

      (Edit: when this deadlock was found, the command took the DB lock in S mode.)
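
The difference between S and IS matters because of how the two modes interact with an IX lock held by another operation. The following is a minimal sketch of the standard multi-granularity lock compatibility matrix, not the server's lock manager code; the names `COMPATIBLE` and `conflicts` are hypothetical and exist only for illustration.

```python
# Standard multi-granularity lock modes:
# IS = intent shared, IX = intent exclusive, S = shared, X = exclusive.
# COMPATIBLE[held] is the set of modes that can be granted to another
# operation while `held` is held on the same resource.
COMPATIBLE = {
    "IS": {"IS", "IX", "S"},
    "IX": {"IS", "IX"},
    "S": {"IS", "S"},
    "X": set(),
}


def conflicts(held: str, requested: str) -> bool:
    """Return True if `requested` must wait while another operation holds `held`."""
    return requested not in COMPATIBLE[held]


# A prepared transaction holds the DB lock in IX mode.
print(conflicts("IX", "S"))   # True: an S request queues behind the IX holder
print(conflicts("IX", "IS"))  # False: an IS request is granted alongside IX
```

Under this matrix, an S request on the database waits behind a prepared transaction's IX lock, while an IS request does not.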

      This can then cause a deadlock with prepared transactions if the transaction holds the DB lock that checkMetadataConsistency is trying to acquire, but committing the transaction is blocked on a stepdown (that is, the node cannot replicate the commitTransaction command until it finishes stepping down).

      The order of events is:
      1. Prepare a transaction that holds the DB lock in IX mode for a database that checkMetadataConsistency might also need to lock
      2. ShardsvrCheckMetadataConsistencyParticipantCommand tries to take the DB lock for that database and ends up holding the RSTL in IX mode while it waits
      3. The node tries to step down before it receives the commitTransaction command

      A targeted fix would be either to explicitly ensure that checkMetadataConsistency is killed by the stepdown thread or to make sure it does not hold the RSTL.
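
The order of events and the proposed fix can be pictured as a wait-for graph. The sketch below is a simplified model rather than server code: the node labels and the `find_cycle` helper are hypothetical, and it assumes that stepdown cannot proceed while another operation holds the RSTL in IX mode.

```python
def find_cycle(waits_for, start):
    """Follow wait-for edges from `start`; return the cycle as a list, or None."""
    path = []
    node = start
    while node is not None and node not in path:
        path.append(node)
        node = waits_for.get(node)  # each actor waits on at most one other
    if node is None:
        return None  # the chain ended, so nobody is stuck forever
    return path[path.index(node):] + [node]


# Each key is blocked until its value makes progress.
waits_for = {
    # Wants the DB lock that the prepared transaction holds in IX mode.
    "checkMetadataConsistency": "preparedTransaction",
    # commitTransaction cannot be replicated until the stepdown completes.
    "preparedTransaction": "stepdown",
    # Assumed: stepdown needs the RSTL, which the command holds in IX mode.
    "stepdown": "checkMetadataConsistency",
}
print(find_cycle(waits_for, "stepdown"))

# Either proposed fix (killing the command on stepdown, or not holding the
# RSTL at all) removes the edge from stepdown to the command.
fixed = {k: v for k, v in waits_for.items() if k != "stepdown"}
print(find_cycle(fixed, "stepdown"))  # None
```

In this model, both suggested fixes break the same edge, which is why either one is enough to remove the cycle.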

            Assignee:
            tommaso.tocci@mongodb.com Tommaso Tocci
            Reporter:
            samy.lanka@mongodb.com Samyukta Lanka
            Votes:
            0
            Watchers:
            7
