[SERVER-48617] Concurrent 'createIndexes' can block all write operations on a primary and standalone by exhausting write tickets. Created: 05/Jun/20 Updated: 29/Oct/23 Resolved: 10/Jul/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Index Maintenance |
| Affects Version/s: | None |
| Fix Version/s: | 4.7.0, 4.4.2 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Suganthi Mani | Assignee: | Benety Goh |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||||||
| Operating System: | ALL | ||||||||||||||||||||
| Backport Requested: |
v4.4
|
||||||||||||||||||||
| Sprint: | Execution Team 2020-06-29, Execution Team 2020-07-13 | ||||||||||||||||||||
| Participants: | |||||||||||||||||||||
| Linked BF Score: | 6 | ||||||||||||||||||||
| Description |
|
We can get into below 2 way deadlock which will block all write operations on a primary and standalone. 2) IndexBuildsCoordinator worker threads are waiting for write tickets (For e.g., when restoring yielded locks during collection scan phase) for index completion. |
| Comments |
| Comment by Githook User [ 08/Sep/20 ] |
|
Author: {'name': 'Benety Goh', 'email': 'benety@mongodb.com', 'username': 'benety'}Message: (cherry picked from commit 7afcf9895eb7da296f602604e2973cdb6fa0c67f) |
| Comment by Githook User [ 09/Jul/20 ] |
|
Author: {'name': 'Benety Goh', 'email': 'benety@mongodb.com', 'username': 'benety'}Message: |
| Comment by Suganthi Mani [ 08/Jul/20 ] |
|
Just want to record some offline conversation we had it in the slack. |
| Comment by Benety Goh [ 08/Jul/20 ] |
|
As follow-up work, we would like to look into detecting and preventing other instances of the resource locking/condition variable anti-pattern in |
| Comment by Benety Goh [ 08/Jul/20 ] |
|
The code that waits on the conditional variable while holding the global lock was introduced in |
| Comment by Suganthi Mani [ 11/Jun/20 ] |
You are right, extra waiting is not critical unless if we don't hold any mongoDB locks /or/ storage resources . If we go by this approach, I would expect to have some invariant check to verify opCtx is not holding any locks (or just more specific RSTL lock) before CV wait. This is to make sure we don't break the contract in future and get into deadlock scenarios. But, my design approach would be stepdown should interrupt all write operations and make the cmd response available as quick as possible to the user. And, I consider createIndexes as a write operation thread. Though I agree many concurrent index creation in real world is rare. Anyways, I am ok with with either approach. |
| Comment by Louis Williams [ 11/Jun/20 ] |
|
suganthi.mani, my question was not directed at just the Global lock. Why do we need to be holding the RSTL, or any lock, while waiting on a condition variable? edited based on your update: Yes, we should not be holding locks while waiting on the condition. I don't think the extra waiting is very important because it won't block while holding onto locks or storage resources. I think this is a rare enough case that the extra waiting is not critical. |
| Comment by Suganthi Mani [ 11/Jun/20 ] |
|
louis.williams If the thread is waiting on CV just by holding RSTL lock, then it would block step down which requires RSTL in X mode. Stepdown only interrupts/kills thread holding global lock in IX, X or S mode. Currently we support only 2 modes for RSTL. All read/write operations acquire RSTL in IX mode and repl state transition threads acquire RSTL in X mode. Stepdown needs to kill only only write operation and not read operations. And, that information can be obtained only via global locks. The other option is not to hold any locks while waiting for this conditional wait. Only, downside is that the node might keep waiting on CV even after the node transitioned to secondary state. But, if you are going with this approach, we should be careful about the locking order when checking the repl state, it should be MongoDB locks (global or RSTL lock) followed by the indexBuildsCoordinator mutex lock. |
| Comment by Louis Williams [ 11/Jun/20 ] |
|
It seems like the problem is that we are holding a Global IX lock while blocking on a condition variable, which is an antipattern that leads to deadlocks. suganthi.mani you said:
Why does the operation need to be immediately interrupted by stepdown? If it's not holding a global lock, then it's not holding up any resources and doesn't need to be interrupted immediately. When a threadpool resource is eventually freed, the operation will discover that it is no longer primary and just return an error. |
| Comment by Suganthi Mani [ 05/Jun/20 ] |
|
Another important point is that, it seems when we set skip acquiring ticket on opCtx we can't reset the '_shouldAcquireTicket'. CreateIndexes user thread sometimes have to abort the index build which will need it to write the abortIndexBuild oplog entry. Though, writing oplog is a WT write operation, we can skip the ticketing mechanism by applying this reasoning that we used in transaction code path. |
| Comment by Suganthi Mani [ 05/Jun/20 ] |
|
Easiest solution is that we can make 'createIndexes' user threads to skip acquiring write ticket when acquiring global lock by using skipAcquireTicket(). The goal of ticketing mechanism is to throttle number of concurrent WT write/read operations. 'CreateIndexes' user (parent) threads are not going make any WT write operations, only indexBuildsCoordinator worker threads are going to build index and perform WT index table write operations. So, it's ok for 'createIndexes' cmd user threads to skip acquiring ticket when acquiring global lock at this point. To be noted, we acquire global lock in IX instead of directly acquiring RSTL lock in IX mode before waiting on this cv is because we want the conditional variable wait to be interrupted on step down. Stepdown thread will interrupt only threads holding global lock in IX, X and S mode. It also seems collection scanning phase and drain phase acquire write tickets because we acquire DB lock in IX mode. So, do we really need DB and collection lock in IX mode? can we make it as IS mode? To be noted, this won't fix the deadlock issue, just nit. Seems historically, collection scan phase & drain phase used to acquire DB lock in IX mode and collection in IS or S mode. |