Core Server / SERVER-28427

GlobalLock with timeout can still block indefinitely

    • Type: Bug
    • Resolution: Done
    • Priority: Critical - P2
    • Fix Version/s: 3.4.5, 3.5.7
    • Affects Version/s: 3.4.2, 3.5.4
    • Component/s: Storage
    • Labels:
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Backport Requested: v3.4
    • Sprint: Storage 2017-04-17, Storage 2017-05-08

      There is a potential deadlock between the step down command and the noop writer. The step down command takes the global lock in S mode and then blocks on destroying the noop writer.

      The noop writer takes the global lock in IX mode when it does writes. IX conflicts with S, so the write waits for the lock that the step down command holds. The noop writer's destructor calls join, which won't return until the noop writer finishes its write, so neither thread can make progress.
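
To make the conflict concrete, here is a minimal sketch of the standard multi-granularity lock compatibility matrix (IS, IX, S, X). It is a generic model, not the server's lock manager; the names `COMPATIBLE` and `can_grant` are hypothetical and exist only for this illustration.

```python
# Standard multi-granularity lock compatibility: for each held mode,
# the set of requested modes that can be granted alongside it.
# Hypothetical model for illustration, not the server's implementation.
COMPATIBLE = {
    "IS": {"IS", "IX", "S"},
    "IX": {"IS", "IX"},
    "S": {"IS", "S"},
    "X": set(),
}


def can_grant(requested, held_modes):
    """Return True if `requested` is compatible with every mode already held."""
    return all(requested in COMPATIBLE[held] for held in held_modes)


# Step down holds S, so the noop writer's IX request has to wait.
print(can_grant("IX", ["S"]))   # False
# Two writers holding IX do not block each other.
print(can_grant("IX", ["IX"]))  # True
```

Because the IX request cannot be granted while S is held, a join on the writer thread from the S holder never returns.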

      To fix this we can:
      1. Stop the noop writer's write in killAllUserOperations before we try to shut it down.
      2. Stop the noop writer before we take the global lock, and start it back up again if we fail to step down.
      3. Mark the operation context as killed in the noop writer destructor so that it stops trying to take the lock.
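
Option 3 can be sketched with a toy model. This is only an illustration under stated assumptions: `ToyGlobalLock`, `ToyNoopWriter`, and `step_down_demo` are hypothetical names, the lock ignores modes (one holder at a time), and the waiter polls a kill flag instead of being interrupted the way a real operation context would be.

```python
import threading


class ToyGlobalLock:
    """Hypothetical stand-in for the global lock: a single holder, no modes."""

    def __init__(self):
        self._cond = threading.Condition()
        self._held = False

    def acquire(self, killed):
        """Wait until the lock is free or `killed` is set. True if acquired."""
        with self._cond:
            while self._held:
                if killed.is_set():
                    return False  # give up instead of waiting forever
                self._cond.wait(timeout=0.01)  # poll the kill flag
            self._held = True
            return True

    def release(self):
        with self._cond:
            self._held = False
            self._cond.notify_all()


class ToyNoopWriter:
    """Hypothetical writer thread that needs the lock for a single write."""

    def __init__(self, lock):
        self._lock = lock
        self._killed = threading.Event()
        self.wrote = False
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def _run(self):
        if self._lock.acquire(self._killed):
            try:
                self.wrote = True
            finally:
                self._lock.release()

    def stop(self):
        # Mark the operation as killed before joining, so a writer that is
        # blocked on the lock abandons its write and the join can return.
        self._killed.set()
        self._thread.join()


def step_down_demo():
    lock = ToyGlobalLock()
    lock.acquire(threading.Event())  # step down takes the lock first
    writer = ToyNoopWriter(lock)     # writer can only wait for the lock
    writer.stop()                    # returns because the wait was killed
    lock.release()
    return writer.wrote


print(step_down_demo())  # False: the write was abandoned, no deadlock
```

Without the `self._killed.set()` call in `stop`, the join in this model would never return while the caller holds the lock, which mirrors the deadlock described above.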

            geert.bosch@mongodb.com Geert Bosch
            judah.schvimer@mongodb.com Judah Schvimer