[SERVER-79761] GlobalLock can segfault due to not actually holding lock Created: 04/Aug/23 Updated: 22/Sep/23 Resolved: 22/Sep/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Matt Boros | Assignee: | Josef Ahmad |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||
| Assigned Teams: |
Storage Execution EMEA
|
||||||||||||||||||||
| Operating System: | ALL | ||||||||||||||||||||
| Sprint: | Execution EMEA Team 2023-10-02 | ||||||||||||||||||||
| Participants: | |||||||||||||||||||||
| Linked BF Score: | 10 | ||||||||||||||||||||
| Description |
|
GlobalLock caches the result of its lock acquisition. Methods like isLocked reference this cached result. In the GlobalLock destructor we call isLocked to determine whether we need to abandon the current snapshot. When we yield, we release locks using the lock manager but do not inform the global locks higher up in the stack of our release. During a yield the GlobalLock cached _result will falsely indicate a locked state. When a GlobalLock is destroyed during a yield, it can unsafely access the storage engine without a lock, causing a segfault. Here is a patch demonstrating this issue. You can apply this diff to master and see that the lock is not actually held in some cases:
This was discovered in a 4.4 crash, BF-28945. We recovered a core dump where one thread is shutting down holding a global lock and destroying WT. The other thread is a GetMore command, running its GlobalLock destructor. The GetMore command segfaults while calling abandonSnapshot accessing the storage engine. It's very likely this failure is a result from this bug. I am unsure of the severity of this bug because due to unfamiliarity with the code. |
| Comments |
| Comment by Josef Ahmad [ 22/Sep/23 ] |
|
I'm closing this ticket as a duplicate of As a general improvement, I've also opened SERVER-81333 to make GlobalLock::isLocked() aware of query yields. |
| Comment by Josef Ahmad [ 21/Sep/23 ] |
|
I'm currently trying to have GlobalLock::isLocked() return the right answer when the lock is yielded. That'll likely involve going through the lock manager. |
| Comment by Josef Ahmad [ 20/Sep/23 ] |
|
I confirm the root cause to be |
| Comment by Matt Boros [ 08/Aug/23 ] |
|
The BF that lead to this discovery was from 4.4, and the patch where I show the invariant in the description failing is on master. |
| Comment by Gregory Noma [ 08/Aug/23 ] |
|
We should figure out which versions this effects |