[SERVER-67698] Investigate potential deadlocks in LockState Created: 30/Jun/22 Updated: 01/Aug/22 Resolved: 29/Jul/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Jordi Olivares Provencio | Assignee: | Geert Bosch |
| Resolution: | Won't Do | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Participants: |
| Description |
|
It seems that Resource Mutexes are never released during a yield. This can lead to deadlocks, as in the following case:
1. Thread A takes Lock A -> Lock B -> Resource Mutex A
2. Thread A yields (Lock A and Lock B are released, Resource Mutex A is not)
3. Thread B takes Lock B -> waits until Resource Mutex A is available
4. Thread A wakes -> takes Lock A -> waits until Lock B is available
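The scenario above can be checked with a small wait-for graph. This is only an illustrative sketch, not the server's lock manager: `find_wait_cycle` and the dictionary layout are hypothetical, and it assumes every resource is held exclusively by at most one thread.

```python
def find_wait_cycle(holds, waits_for):
    """Return the threads forming a wait-for cycle, or None if there is none.

    holds:     thread name -> set of resources it currently owns
    waits_for: thread name -> the single resource it is blocked on
    Assumes every resource is held exclusively by at most one thread.
    """
    # Map each resource to its current owner.
    owner = {res: thread for thread, owned in holds.items() for res in owned}
    # Map each blocked thread to the thread that owns what it is waiting for.
    blocked_on = {
        thread: owner[res]
        for thread, res in waits_for.items()
        if res in owner
    }
    for start in blocked_on:
        path = []
        thread = start
        # Follow the chain of blocked threads until it ends or repeats.
        while thread in blocked_on and thread not in path:
            path.append(thread)
            thread = blocked_on[thread]
        if thread in path:
            return path[path.index(thread):]
    return None


# State after Thread A wakes: it kept Resource Mutex A across the yield
# and has retaken Lock A, while Thread B took Lock B in the meantime.
mutex_kept = find_wait_cycle(
    holds={"Thread A": {"Resource Mutex A", "Lock A"}, "Thread B": {"Lock B"}},
    waits_for={"Thread A": "Lock B", "Thread B": "Resource Mutex A"},
)

# Hypothetical alternative: the mutex was released on yield, so Thread B
# could take it and is not blocked.
mutex_released = find_wait_cycle(
    holds={"Thread A": {"Lock A"}, "Thread B": {"Lock B", "Resource Mutex A"}},
    waits_for={"Thread A": "Lock B"},
)
```

In the first state the two threads wait on each other, so a cycle is reported. In the second, Thread B can finish and release Lock B, so no cycle exists.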
Additionally, it seems that the order in which the locks were taken is not preserved when they are stored in the LockSnapshot here. This is also unsafe: after a yield, the locks may be reacquired in a different order than they were originally taken, which can cause deadlocks too. |
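The ordering concern can be illustrated by comparing two acquisition sequences and listing the lock pairs taken in opposite orders. This is a sketch under assumptions: `order_inversions` is a hypothetical helper, and sorting the lock names merely stands in for any snapshot that does not keep the original acquisition order; it is not a statement about how LockSnapshot actually stores locks.

```python
from itertools import combinations


def order_inversions(seq_a, seq_b):
    """Return pairs of locks that seq_a takes in the opposite order to seq_b."""
    position_in_b = {lock: i for i, lock in enumerate(seq_b)}
    inversions = []
    # combinations() yields pairs in the order they appear in seq_a.
    for first, second in combinations(seq_a, 2):
        if first in position_in_b and second in position_in_b:
            if position_in_b[first] > position_in_b[second]:
                inversions.append((first, second))
    return inversions


# Order in which a thread originally took its locks.
acquired = ["Lock B", "Lock A", "Lock C"]

# Hypothetical snapshot that loses acquisition order (here: sorted by name).
restored = sorted(acquired)

inverted_pairs = order_inversions(acquired, restored)
```

Any non-empty result means that a thread restoring from the snapshot and a thread following the original order could each hold one lock of the pair while waiting for the other.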
| Comments |
| Comment by Allison Easton [ 01/Aug/22 ] |
|
Sorry about that. No, it was for this ticket. During the FCV upgrade to 6.0, we set the orphan counts on the range deletion documents. To do this, we use an index scan with yield policy YIELD_AUTO. I left the comment here because this ticket came out of an investigation into whether we could be yielding during that scan, since that could result in incorrect orphan counts. It turned out not to be a problem, but only because that code takes an X resource lock, which isn't yielded since resource mutexes aren't currently yielded. I just wanted to make sure that if we started yielding mutex locks, this code didn't start yielding and cause inaccurate orphan counts. |
| Comment by Connie Chen [ 29/Jul/22 ] |
|
allison.easton@mongodb.com - let us know if you meant for the above comment to be on another ticket?
|
| Comment by Allison Easton [ 30/Jun/22 ] |
|
Just adding a comment here to note that the upgrade to 6.0 relies on this behavior for orphan counts to be correct, so if this gets changed in 6.0 we should make sure the upgrade still works correctly. |