Core Server / SERVER-79761

GlobalLock can segfault due to not actually holding lock

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Labels:
    • Storage Execution EMEA
    • ALL
    • Execution EMEA Team 2023-10-02
    • 10

      GlobalLock caches the result of its lock acquisition. Methods like isLocked reference this cached result. In the GlobalLock destructor we call isLocked to determine whether we need to abandon the current snapshot.

      When we yield, we release locks through the lock manager but do not inform the GlobalLock objects higher up in the stack of the release. During a yield, the cached _result of such a GlobalLock therefore falsely indicates a locked state. If a GlobalLock is destroyed during a yield, it can access the storage engine without holding a lock, causing a segfault.
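The stale-cache problem described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: ToyLocker, ToyGlobalLock, ScenarioResult, and runYieldScenario are hypothetical stand-ins invented for illustration, not the real classes in d_concurrency.cpp, and the "unsafe access" is modelled as a counter rather than an actual storage engine call.

```cpp
#include <cassert>

// Hypothetical stand-in for the lock manager state of one operation.
// It is the only place that knows whether the global lock is really held.
struct ToyLocker {
    bool globalHeld = false;
    int unsafeAccesses = 0;  // storage accesses made without the lock

    void lockGlobal() { globalHeld = true; }
    void unlockGlobal() { globalHeld = false; }

    // Models a yield: the lock is released through the locker only;
    // guard objects higher up the stack are not told about it.
    void yield() { unlockGlobal(); }

    // Models abandonSnapshot(): assumed safe only while the lock is held.
    void abandonSnapshot() {
        if (!globalHeld) {
            ++unsafeAccesses;
        }
    }
};

// Hypothetical RAII guard that caches its acquisition result.
class ToyGlobalLock {
public:
    explicit ToyGlobalLock(ToyLocker& locker) : _locker(locker) {
        _locker.lockGlobal();
        _result = true;  // cached once, never refreshed
    }

    ~ToyGlobalLock() {
        // Consults the cached result, not the locker's actual state.
        if (isLocked()) {
            _locker.abandonSnapshot();
            _locker.unlockGlobal();
        }
    }

    bool isLocked() const { return _result; }

private:
    ToyLocker& _locker;
    bool _result = false;
};

struct ScenarioResult {
    bool cachedIsLocked;  // what the guard believes during the yield
    bool lockerHolds;     // what the locker actually holds during the yield
    int unsafeAccesses;   // unlocked accesses made by the destructor
};

// Acquire the lock, yield, then destroy the guard while still yielded.
ScenarioResult runYieldScenario() {
    ToyLocker locker;
    ScenarioResult out{};
    {
        ToyGlobalLock lock(locker);
        locker.yield();
        out.cachedIsLocked = lock.isLocked();
        out.lockerHolds = locker.globalHeld;
    }  // destructor runs here, during the yield
    out.unsafeAccesses = locker.unsafeAccesses;
    return out;
}
```

In this model, calling runYieldScenario() yields a guard that still reports isLocked() as true although the locker no longer holds the lock, and the destructor performs one unlocked access. Having the destructor check the locker's actual state instead of the cached flag would avoid that in the toy, but whether that is the right fix in the server is not something this sketch establishes.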

      Here is a patch demonstrating this issue. You can apply this diff to master and see that the lock is not actually held in some cases:

      +++ b/src/mongo/db/concurrency/d_concurrency.cpp
      @@ -175,6 +175,7 @@ Lock::GlobalLock::~GlobalLock() {
           auto* locker = _opCtx->lockState();
           if (isLocked()) {
      +        invariant(_opCtx->lockState()->getLockMode(resourceIdGlobal) != LockMode::MODE_NONE);
               // Abandon our snapshot if destruction of the GlobalLock object results in actually
               // unlocking the global lock. Recursive locking and the two-phase locking protocol may
               // prevent lock release.

      This was discovered in a 4.4 crash, BF-28945. We recovered a core dump in which one thread is shutting down while holding a global lock and destroying WT. The other thread is a GetMore command running its GlobalLock destructor. The GetMore command segfaults in abandonSnapshot while accessing the storage engine. It is very likely that this failure is a result of this bug.

      I am unsure of the severity of this bug because of my unfamiliarity with the code.

            josef.ahmad@mongodb.com Josef Ahmad
            matt.boros@mongodb.com Matt Boros