ScopedRangeDeleterLock might lead to a deadlock on stepdown


    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • 6.2.0-rc0
    • Affects Version/s: 6.2.0-rc0
    • Component/s: Sharding
    • None
    • Fully Compatible
    • ALL
    • Sharding EMEA 2022-10-31
    • 153
    • None
    • 3

      SERVER-70094 added code to synchronize range deletion with stepdowns. Specifically, it stores the executor of the range deletion thread so that it can be joined when stopping the service.

      This has an unintended consequence, though. If a stepdown command arrives and grabs the RSTL lock before the RangeDeleterService thread does, the stepdown gets stuck when trying to stop the service, because it is waiting for the range deleter service executor to finish. At the same time, the range deleter service thread is waiting for the RSTL lock.

      So we have a thread holding the RSTL lock while waiting for an executor that can finish only after its own thread grabs the RSTL lock.

      To solve this, we could capture the operation context in addition to the executor, and cancel that operation context before waiting for the executor.
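      The proposed fix can be sketched outside the server as follows. This is a minimal model, not the actual patch: `FakeOpCtx`, `lockInterruptibly`, and `simulateStepdown` are hypothetical names, a `std::timed_mutex` stands in for the RSTL lock, and a plain `std::thread` stands in for the range deleter executor. It only shows the ordering "cancel first, then join".

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <thread>

// Hypothetical stand-in for an operation context; not the server's real class.
class FakeOpCtx {
public:
    // Called by the thread that wants the waiter to give up.
    void markKilled() { _killed.store(true); }

    // Tries to acquire `lock` until it succeeds or this context is killed.
    // Returns true only if the lock was acquired.
    bool lockInterruptibly(std::timed_mutex& lock) {
        while (!_killed.load()) {
            if (lock.try_lock_for(std::chrono::milliseconds(5))) {
                return true;
            }
        }
        return false;
    }

private:
    std::atomic<bool> _killed{false};
};

// Models the stepdown path: hold the lock, cancel the worker, then join it.
// Returns true if the worker was interrupted instead of acquiring the lock.
bool simulateStepdown() {
    std::timed_mutex rstl;   // stand-in for the RSTL lock
    FakeOpCtx workerOpCtx;   // captured alongside the worker thread
    bool workerAcquired = false;

    rstl.lock();  // the stepdown thread wins the race for the lock

    std::thread worker([&] {
        workerAcquired = workerOpCtx.lockInterruptibly(rstl);
        if (workerAcquired) {
            rstl.unlock();
        }
    });

    workerOpCtx.markKilled();  // cancel first ...
    worker.join();             // ... then wait; without the cancel this join never returns

    rstl.unlock();
    return !workerAcquired;
}
```

      In this model, removing the `markKilled()` call reproduces the reported hang: the joining thread holds the lock and the worker keeps retrying forever. Whether the real fix interrupts the operation context in exactly this way is an assumption.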

        1. BFG-1553238-stacktrace.log
          664 kB
          Marcos José Grillo Ramirez

              Assignee:
              Tommaso Tocci
              Reporter:
              Marcos José Grillo Ramirez
              Votes:
              0
              Watchers:
              5

                Created:
                Updated:
                Resolved: