Type: Bug
Resolution: Duplicate
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
None

Assigned Teams:

Catalog and Routing
Operating System:
ALL
Sprint:
Execution EMEA Team 2023-10-02
Linked BF Score:
10
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

GlobalLock caches the result of its lock acquisition. Methods like isLocked reference this cached result. In the GlobalLock destructor we call isLocked to determine whether we need to abandon the current snapshot.

When we yield, we release locks through the lock manager but do not inform the GlobalLock objects higher up in the stack of the release. During a yield, the GlobalLock's cached _result therefore falsely indicates a locked state. If a GlobalLock is destroyed during a yield, its destructor can access the storage engine without holding a lock, which can cause a segfault.
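
The mismatch between the cached result and the real lock state can be illustrated with a small standalone sketch. This is not the server code: ToyLocker, ToyGlobalLock, releaseForYield, restoreAfterYield, and actuallyLocked are hypothetical names invented for illustration, and only the idea of a cached result consulted by isLocked comes from the description above.

```cpp
#include <cassert>

// Hypothetical stand-in for the lock manager's view of the global lock.
enum class ToyLockMode { kNone, kShared };

class ToyLocker {
public:
    void lock(ToyLockMode mode) { _mode = mode; }
    void unlock() { _mode = ToyLockMode::kNone; }

    // Models a yield: the lock is dropped at the lock manager level,
    // and no RAII object higher up in the stack is told about it.
    void releaseForYield() {
        _saved = _mode;
        _mode = ToyLockMode::kNone;
    }
    void restoreAfterYield() { _mode = _saved; }

    ToyLockMode mode() const { return _mode; }

private:
    ToyLockMode _mode = ToyLockMode::kNone;
    ToyLockMode _saved = ToyLockMode::kNone;
};

// Hypothetical RAII guard that caches the outcome of its acquisition.
class ToyGlobalLock {
public:
    explicit ToyGlobalLock(ToyLocker& locker) : _locker(locker) {
        _locker.lock(ToyLockMode::kShared);
        _cachedLocked = true;  // cached once, never refreshed
    }
    ~ToyGlobalLock() {
        if (_cachedLocked) {
            _locker.unlock();
        }
    }

    // Answers from the cache, like the isLocked described in this ticket.
    bool isLocked() const { return _cachedLocked; }

    // Asks the locker directly, similar in spirit to the invariant in the patch.
    bool actuallyLocked() const { return _locker.mode() != ToyLockMode::kNone; }

private:
    ToyLocker& _locker;
    bool _cachedLocked = false;
};

// Returns true if, during a simulated yield, the guard still reports
// "locked" while the locker no longer holds the lock.
bool cachedStateIsStaleDuringYield() {
    ToyLocker locker;
    ToyGlobalLock guard(locker);
    locker.releaseForYield();
    const bool stale = guard.isLocked() && !guard.actuallyLocked();
    locker.restoreAfterYield();
    return stale;
}
```

In this toy model, cachedStateIsStaleDuringYield() returns true: any destructor logic gated on isLocked() would run during the yield even though no lock is held.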

Here is a patch demonstrating this issue. You can apply this diff to master and see that the lock is not actually held in some cases:

--- a/src/mongo/db/concurrency/d_concurrency.cpp
+++ b/src/mongo/db/concurrency/d_concurrency.cpp
@@ -175,6 +175,7 @@ Lock::GlobalLock::~GlobalLock() {
     auto* locker = _opCtx->lockState();
 
     if (isLocked()) {
+        invariant(_opCtx->lockState()->getLockMode(resourceIdGlobal) != LockMode::MODE_NONE);
         // Abandon our snapshot if destruction of the GlobalLock object results in actually
         // unlocking the global lock. Recursive locking and the two-phase locking protocol may
         // prevent lock release.

This was discovered in a 4.4 crash, BF-28945. We recovered a core dump in which one thread is shutting down while holding a global lock and destroying WT (WiredTiger). The other thread is running a GetMore command and is in its GlobalLock destructor. The GetMore command segfaults in abandonSnapshot while accessing the storage engine. This failure is very likely a result of this bug.

I am unsure of the severity of this bug because I am unfamiliar with the code.

duplicates

SERVER-70338 Query yield accesses the storage engine without locks during shutdown and rollback

Closed

related to

SERVER-81333 Make the GlobalLock and DBLock RAII types aware of yields

Backlog

Assignee: Josef Ahmad
Reporter: Matt Boros
Participants: Gregory Noma, Josef Ahmad, Matt Boros
Votes: 0
Watchers: 9

Created: Aug 04 2023 07:49:28 PM UTC
Updated: Mar 22 2024 02:45:55 PM UTC
Resolved: Sep 22 2023 06:06:13 AM UTC
Confidence Status Last Update: 18/Sep/23 10:26 AM
