Core Server: SERVER-39337

MigrationSourceManager can hit an invariant if initial lock acquisition timed out


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 4.1.7
    • Fix Version/s: 4.1.9
    • Component/s: Sharding
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Steps To Reproduce:

      1. Run fsyncLock on the shard that will act as the donor.
      2. Run moveChunk.
      3. Wait until a changelog entry with what: "moveChunk.error" appears in the log. This indicates that the lock acquisition timed out.
      4. Run fsyncUnlock. This allows MSM::_cleanup to acquire the collection lock, which then triggers the invariant.

    • Sprint:
      Sharding 2019-02-25, Sharding 2019-03-11
    • Linked BF Score:
      10
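      The reproduction steps above can be modeled as a small C++ timeline. This is a sketch under stated assumptions, not MongoDB code: a std::timed_mutex stands in for the collection lock, a helper thread that holds it plays the role of fsyncLock, and the names Model, State, and reproduce() are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <mutex>
#include <thread>

// Hypothetical model of the reproduction steps. None of these names are
// MongoDB's real classes; they only mirror the roles described in the ticket.
enum class State { kCreated, kCloning };

struct Model {
    std::timed_mutex collectionLock;  // stands in for the collection lock
    State state = State::kCreated;    // stands in for MSM::_state
    const void* decorator = nullptr;  // stands in for the MSM decorator
};

// Returns true if cleanup sees state != kCreated while decorator == nullptr.
bool reproduce() {
    Model m;
    std::promise<void> locked;
    std::promise<void> release;
    std::future<void> lockedSignal = locked.get_future();
    std::future<void> releaseSignal = release.get_future();

    // Step 1: an "fsyncLock" thread takes the lock and holds it.
    std::thread fsyncLock([&] {
        m.collectionLock.lock();
        locked.set_value();
        releaseSignal.wait();
        m.collectionLock.unlock();
    });
    lockedSignal.wait();

    // Step 2: "moveChunk" advances the state outside the lock, then tries to
    // take the lock with a timeout in order to install the decorator.
    m.state = State::kCloning;
    if (m.collectionLock.try_lock_for(std::chrono::milliseconds(50))) {
        m.decorator = &m;
        m.collectionLock.unlock();
    }
    // Step 3: while fsyncLock holds the lock, the timed acquisition fails,
    // so the decorator is never set even though the state has advanced.

    // Step 4: "fsyncUnlock" releases the lock; cleanup can now acquire it.
    release.set_value();
    fsyncLock.join();

    std::lock_guard<std::timed_mutex> guard(m.collectionLock);
    return m.state != State::kCreated && m.decorator == nullptr;
}
```

      In this model reproduce() returns true: the timed acquisition cannot succeed while the helper thread holds the lock, so cleanup later observes an advanced state paired with a null decorator.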

      Description

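      MigrationSourceManager (MSM) modifies _state outside the collection lock but sets the decorator inside the lock. If acquiring the collection lock times out, _state may already have moved past created while the decorator was never set. So when _cleanup runs, it is possible to have _state != created with the decorator still nullptr, which trips the invariant in _cleanup.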
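      One way to keep the two fields consistent, sketched in C++ under stated assumptions: update the state and the decorator in the same critical section, so a lock-acquisition timeout leaves both untouched. This is not necessarily how the ticket was fixed; Collection, MigrationSketch, start(), cleanupInvariantHolds(), and the lockAvailable flag (which stands in for a lock acquisition that may time out) are all hypothetical.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical sketch, not MongoDB's actual fix: keep an MSM-style state and
// the decorator consistent by updating both inside one critical section.
enum class State { kCreated, kCloning, kDone };

struct Collection {
    std::mutex lock;                  // stands in for the collection lock
    const void* decorator = nullptr;  // stands in for the MSM decorator
};

class MigrationSketch {
public:
    explicit MigrationSketch(Collection& coll) : _coll(coll) {}

    // Install the decorator and advance the state under the same lock.
    // If the lock cannot be taken (modeled by lockAvailable == false),
    // neither field changes.
    bool start(bool lockAvailable) {
        if (!lockAvailable) return false;  // e.g. acquisition timed out
        std::lock_guard<std::mutex> lk(_coll.lock);
        _coll.decorator = this;
        _state = State::kCloning;
        return true;
    }

    // Check the pairing the description says can be broken:
    // state != kCreated must imply a non-null decorator. Then tear down.
    bool cleanupInvariantHolds() {
        std::lock_guard<std::mutex> lk(_coll.lock);
        bool ok = _state == State::kCreated || _coll.decorator != nullptr;
        if (_coll.decorator == this) _coll.decorator = nullptr;
        _state = State::kDone;
        return ok;
    }

    State state() const { return _state; }

private:
    Collection& _coll;
    State _state = State::kCreated;
};
```

      With this ordering, a start() that cannot take the lock leaves the object in kCreated with a null decorator, which satisfies the modeled invariant; the inconsistent pair (state past created, decorator nullptr) is never observable under the lock.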

