[SERVER-70888] ScopedRangeDeleterLock might lead to a deadlock on stepdown Created: 27/Oct/22 Updated: 29/Oct/23 Resolved: 28/Oct/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 6.2.0-rc0 |
| Fix Version/s: | 6.2.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Marcos José Grillo Ramirez | Assignee: | Tommaso Tocci |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||||||||||||||||||||||
| Issue Links: |
|
||||||||||||||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||||||||||
| Operating System: | ALL | ||||||||||||||||||||||||
| Sprint: | Sharding EMEA 2022-10-31 | ||||||||||||||||||||||||
| Participants: | |||||||||||||||||||||||||
| Linked BF Score: | 153 | ||||||||||||||||||||||||
| Description |
|
SERVER-70094 added code to synchronize the range deletion with stepdowns, specifically, it stores the executor of the range deletion thread so it can be joined when stopping the service. This have an unintended consequence though, if a stepdown command comes in at a time that manages to grab the RSTL lock before the RangeDeleterService thread does, it will get stuck when trying to stop the service (because it is waiting for the range deleter service executor), when at the same time, the range deleter service thread is actually waiting for the RSTL lock. So we have a thread with the RSTL lock held waiting for an executor that will finish only after it grabs the RSTL lock. In order to solve this, besides the executor, we could also capture the operation context and cancel it before waiting for the executor. |
| Comments |
| Comment by Githook User [ 28/Oct/22 ] |
|
Author: {'name': 'Tommaso Tocci', 'email': 'tommaso.tocci@mongodb.com', 'username': 'toto-dev'}Message: |
| Comment by Pierlauro Sciarelli [ 27/Oct/22 ] |
|
Yet another bug that will be solved by |