[SERVER-59419] OperationContext::_ignoreInterrupts can be written to without synchronization by IgnoreInterruptionsGuard destructor Created: 18/Aug/21 Updated: 29/Oct/23 Resolved: 09/Feb/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 7.0.0-rc0, 5.0.21 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Gregory Noma | Assignee: | George Wangensteen |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | sa-remove-fv-backlog-22 | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Assigned Teams: |
Service Arch
|
||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||
| Operating System: | ALL | ||||||||
| Backport Requested: |
v6.3, v5.0
|
||||||||
| Sprint: | Service Arch 2023-02-06, Service Arch 2023-02-20 | ||||||||
| Participants: | |||||||||
| Linked BF Score: | 120 | ||||||||
| Description |
|
Interruptible::IgnoreInterruptionsGuard calls Interruptible::popIgnoreInterrupts in its destructor, and Interruptible:: pushIgnoreInterrupts in its constructor, which, in the case of an OperationContext, results in writes to OperationContext::_ignoreInterrupts that are not explicitly synchronized.
The fundamental issue is that OperationContext: ignoreInterrupts can be written to by {pop,push}IgnoreInterrupts without any synchronization by the client-thread via runWithoutInterruption, but it can be read by other threads via getKillStatus. For example, _ignoreInterrupts can be concurrently read in CurOp::reportCurrentOpForClient, resulting in a data race. Since _ignoreInterrupts is a private member of opCtx and running without interruption is done by the Client thread, this should probably be synchronized via the Client lock. Callers of getKillStatus that are not the client thread (such as CurOp::reportCurrentOpForClient) do indeed take the Client lock before reading _ignoreInterrupts already. The general semantics for how the client lock protects these sorts of data races is that while the client-thread is entitled to _read certain opCtx data members lock-free, it still needs to hold the client-lock to write those data members. Conversely, non-client-threads can only read an opCtx's data when holding the client lock, and can never write these fields (the exception is _killCode, which other threads can write to, which has its own atomic synchronization). The reason there is a data race here is because this contract is broken for _ignoreInterrupts, as the client-thread writes to it without holding the client lock. So I think the 'right way' to solve this is probably to make sure that {push/pop}IgnoreInterrupts holds the Client lock and can only be called by the client-thread. |
| Comments |
| Comment by Githook User [ 07/Aug/23 ] |
|
Author: {'name': 'Adityavardhan Agrawal', 'email': 'adi.agrawal@mongodb.com', 'username': 'Adityav369'}Message: (cherry picked from commit 09df7a4b893a2b3d3bcefbf8c49767bac244f99c) |
| Comment by Githook User [ 08/Feb/23 ] |
|
Author: {'name': 'George Wangensteen', 'email': 'george.wangensteen@mongodb.com', 'username': 'gewa24'}Message: |
| Comment by Gregory Noma [ 16/Sep/21 ] |
| Comment by Gregory Noma [ 15/Sep/21 ] |
|
schwerin Yes, CurOp reads this value by calling OperationContext::isKillPending in CurOp::reportCurrentOpForClient, which in turn calls OperationContext::getKillStatus. |
| Comment by Andy Schwerin [ 15/Sep/21 ] |
|
Ah, I see. The problem is OperationContext::getKillStatus accesses _ignoreInterrupts, and may be legally called from any thread, yes? |
| Comment by Andy Schwerin [ 15/Sep/21 ] |
|
For clarification, gregory.noma, what code reads OperationContext::_ignoreInterrupts from other threads? |
| Comment by Louis Williams [ 18/Aug/21 ] |
|
I would loop in Service Architecture to make sure they approve of the solution. I thought we used the Client lock to prevent these types of races. I'm surprised to see that we aren't doing that here. |