  1. Core Server
  2. SERVER-66477

Deadlock during stepup when there is a prepared transaction and migration recipient recovery needs to be run

    • Type: Bug
    • Resolution: Done
    • Priority: Major - P3
    • None
    • Affects Version/s: 5.3.0, 6.0.0-rc0
    • Component/s: Sharding
    • None
    • ALL
    • ./buildscripts/resmoke.py run --storageEngine=wiredTiger --storageEngineCacheSizeGB=.50 --suite=sharding jstests/sharding/repro-bf-25230.js --log=file
    • Sharding EMEA 2022-05-30
    • 153

      A deadlock is possible on stepup under a particular interleaving of (i) a transaction starting and becoming prepared and (ii) a stepdown during a chunk migration.

      Consider the following interleaving:
      1. A chunk migration recipient has exited its critical section but has not yet removed its recovery document.
      2. As soon as the critical section is released, a new transaction can start and reach the prepared state.
      3. The recipient primary steps down. Because the migration recipient recovery document still exists, the incoming primary needs to recover it. While still in drain mode, this recovery involves reacquiring the critical section, which requires taking the collection lock in MODE_S.
      However, because there is a prepared transaction (whose locks are reacquired earlier in the stepup sequence), the migration recovery cannot acquire the lock, and stepup deadlocks.
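
The lock conflict in step 3 can be illustrated with a small model. This is only a sketch, not server code: it uses the standard multi-granularity compatibility matrix and assumes that the prepared transaction holds the collection lock in MODE_IX (the ticket does not state which mode it holds). The function names `can_grant` and `simulate_stepup` are hypothetical.

```python
# Standard multi-granularity lock compatibility: for each requested mode,
# the set of already-held modes it can coexist with.
COMPATIBLE = {
    "MODE_IS": {"MODE_IS", "MODE_IX", "MODE_S"},
    "MODE_IX": {"MODE_IS", "MODE_IX"},
    "MODE_S": {"MODE_IS", "MODE_S"},
    "MODE_X": set(),
}


def can_grant(requested, held_modes):
    """Return True if `requested` is compatible with every mode already held."""
    return all(held in COMPATIBLE[requested] for held in held_modes)


def simulate_stepup(has_prepared_txn, has_recovery_doc):
    """Model the stepup order described above (hypothetical helper).

    1. Locks of prepared transactions are reacquired first.
    2. Migration recipient recovery then requests the collection lock in MODE_S.
    """
    held_on_collection = []
    if has_prepared_txn:
        # Assumption: a prepared transaction holds the collection in MODE_IX.
        held_on_collection.append("MODE_IX")

    if has_recovery_doc and not can_grant("MODE_S", held_on_collection):
        # Recovery blocks inside stepup, and the prepared transaction cannot
        # be resolved until stepup finishes, so neither side makes progress.
        return "deadlock"
    return "ok"


if __name__ == "__main__":
    print(simulate_stepup(has_prepared_txn=True, has_recovery_doc=True))
    print(simulate_stepup(has_prepared_txn=False, has_recovery_doc=True))
```

In this model the deadlock requires both conditions at once: a leftover recovery document and a prepared transaction on the same collection. Either one alone lets stepup complete.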

        1. 0001-Repro-BF-25230.patch
          4 kB
        2. 0001-BF-25230-fix.patch
          3 kB

            Assignee:
            jordi.serra-torrens@mongodb.com Jordi Serra Torrens
            Reporter:
            jordi.serra-torrens@mongodb.com Jordi Serra Torrens