Core Server / SERVER-70888

ScopedRangeDeleterLock might lead to a deadlock on stepdown

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 6.2.0-rc0
    • Affects Version/s: 6.2.0-rc0
    • Component/s: Sharding
    • Labels: None
    • Fully Compatible
    • ALL
    • Sharding EMEA 2022-10-31
    • 153

      SERVER-70094 added code to synchronize range deletion with stepdowns. Specifically, it stores the executor of the range deletion thread so that it can be joined when the service is stopped.

      This has an unintended consequence, though. If a stepdown command manages to acquire the RSTL lock before the RangeDeleterService thread does, the stepdown gets stuck while trying to stop the service, because it is waiting for the range deleter service executor. At the same time, the range deleter service thread is waiting for the RSTL lock.

      So we have a thread that holds the RSTL lock while waiting for an executor, and that executor can finish only after its own thread acquires the RSTL lock.

      To solve this, we could capture the operation context in addition to the executor, and cancel that operation context before waiting for the executor.
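
      The proposed fix can be illustrated with a minimal, self-contained C++ sketch. This is not the server's actual code: ToyRstl, ToyOpCtx, acquireInterruptibly, cancel, and simulateStepdownWithCancel are hypothetical names made up for the illustration, and the RSTL is modeled as a plain flag guarded by a mutex. The sketch assumes the range deleter thread waits for the lock in a way that can be interrupted through its operation context.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Toy stand-in for the RSTL (hypothetical): a flag guarded by a mutex, with a
// condition variable so waiters can be woken on release or on cancellation.
struct ToyRstl {
    std::mutex m;
    std::condition_variable cv;
    bool held = false;
};

// Toy stand-in for an operation context (hypothetical). `killed` is guarded
// by ToyRstl::m in this sketch so that a waiter cannot miss a cancellation.
struct ToyOpCtx {
    bool killed = false;
};

// Blocks until the lock is free, then takes it.
void acquire(ToyRstl& rstl) {
    std::unique_lock<std::mutex> lk(rstl.m);
    rstl.cv.wait(lk, [&] { return !rstl.held; });
    rstl.held = true;
}

// Releases the lock and wakes any waiters.
void release(ToyRstl& rstl) {
    {
        std::lock_guard<std::mutex> lk(rstl.m);
        assert(rstl.held);
        rstl.held = false;
    }
    rstl.cv.notify_all();
}

// Waits for the lock, but gives up if the operation context is cancelled.
// Returns true only if the lock was actually acquired.
bool acquireInterruptibly(ToyRstl& rstl, ToyOpCtx& opCtx) {
    std::unique_lock<std::mutex> lk(rstl.m);
    rstl.cv.wait(lk, [&] { return opCtx.killed || !rstl.held; });
    if (opCtx.killed) {
        return false;
    }
    rstl.held = true;
    return true;
}

// Marks the operation context as killed and wakes the waiting thread.
void cancel(ToyRstl& rstl, ToyOpCtx& opCtx) {
    {
        std::lock_guard<std::mutex> lk(rstl.m);
        opCtx.killed = true;
    }
    rstl.cv.notify_all();
}

// Models the stepdown path: it holds the lock, cancels the range deleter's
// operation context, and only then joins the range deleter thread.
// Returns true if the join completed without the worker taking the lock.
bool simulateStepdownWithCancel() {
    ToyRstl rstl;
    ToyOpCtx rangeDeleterOpCtx;
    bool workerGotLock = false;

    acquire(rstl);  // The stepdown wins the race for the lock.

    std::thread rangeDeleter([&] {
        workerGotLock = acquireInterruptibly(rstl, rangeDeleterOpCtx);
        if (workerGotLock) {
            release(rstl);
        }
    });

    // Without this cancellation, join() below would block forever, because
    // the worker waits for a lock that the joining thread still holds.
    cancel(rstl, rangeDeleterOpCtx);
    rangeDeleter.join();

    release(rstl);
    return !workerGotLock;
}
```

      In this sketch the order of operations is what matters: the cancellation happens before the join, so the waiting thread is released from its lock wait and can exit while the stepdown still holds the lock.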

            Assignee: Tommaso Tocci (tommaso.tocci@mongodb.com)
            Reporter: Marcos José Grillo Ramirez (marcos.grillo@mongodb.com)
            Votes: 0
            Watchers: 5

