Core Server / SERVER-28427

GlobalLock with timeout can still block indefinitely


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical - P2
    • Resolution: Fixed
    • Affects Version/s: 3.4.2, 3.5.4
    • Fix Version/s: 3.4.5, 3.5.7
    • Component/s: Storage
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Requested:
      v3.4
    • Sprint:
      Storage 2017-04-17, Storage 2017-05-08

      Description

      There is a potential deadlock between the step down command and the noop writer. The step down command takes the global lock in S mode and then blocks on destroying the noop writer.

      The noop writer takes the global lock in IX mode when it does writes. IX conflicts with S, so the write waits behind the lock held by the step down command. The noop writer's destructor calls join, which won't return until the noop writer finishes its write, so neither side can make progress.
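
      The conflict between the two lock modes can be checked against the standard multi-granularity lock compatibility rules. The following is a minimal sketch of those rules only; the table and the `can_grant` helper are illustrative names and are not taken from the server's lock manager.

```python
# Standard multi-granularity lock compatibility: intent-shared (IS),
# intent-exclusive (IX), shared (S), exclusive (X).
# Illustrative only; this is not the server's lock manager code.
COMPATIBLE = {
    "IS": {"IS", "IX", "S"},
    "IX": {"IS", "IX"},
    "S": {"IS", "S"},
    "X": set(),
}


def can_grant(requested, held_modes):
    """Return True if `requested` is compatible with every mode already held."""
    return all(requested in COMPATIBLE[held] for held in held_modes)


# The step down command holds S; the noop writer then asks for IX.
noop_writer_can_proceed = can_grant("IX", ["S"])
print(noop_writer_can_proceed)
```

      Under these rules an IX request cannot be granted while S is held, which is why the noop writer's write waits until the step down command releases the lock.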

      To fix this we can do one of the following:
      1. Stop the noop writer's write in killAllUserOperations before we try to shut it down.
      2. Stop the noop writer before we take the global lock, and start it back up again if we fail to step down.
      3. Mark the operation context as killed in the noop writer destructor so that it stops trying to take the lock.
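
      The idea behind option 3 can be sketched with a small threading model. This is a hedged illustration, not the server's implementation: `NoopWriter`, `stop`, and the `_killed` flag are hypothetical names, and a plain `threading.Lock` stands in for the global lock, so lock modes are not modeled. The writer waits for the lock in short slices and checks a kill flag between attempts, so the thread that holds the lock can still join it.

```python
import threading


class NoopWriter:
    """Hypothetical stand-in for the noop writer; not server code."""

    def __init__(self, global_lock):
        self._global_lock = global_lock
        self._killed = threading.Event()  # plays the role of "operation killed"
        self.writes = 0
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def _run(self):
        while not self._killed.is_set():
            # Wait for the lock in short slices so a kill request is noticed
            # instead of blocking on the lock indefinitely.
            if self._global_lock.acquire(timeout=0.01):
                try:
                    self.writes += 1  # the periodic noop "write"
                finally:
                    self._global_lock.release()
                self._killed.wait(0.01)  # pause before the next write

    def stop(self):
        # Mark the writer as killed first, then join it.
        self._killed.set()
        self._thread.join(timeout=5)
        return not self._thread.is_alive()


global_lock = threading.Lock()
global_lock.acquire()          # the "step down" side holds the global lock
writer = NoopWriter(global_lock)
stopped = writer.stop()        # join happens while the lock is still held
global_lock.release()
print(stopped, writer.writes)
```

      Without the kill flag and the timed acquire, the writer thread would block on the lock and the join would never return while the caller still holds it, which is the situation described above.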
