Core Server / SERVER-53118

Make DistLock resilient to step downs on shards


    Details

    • Type: Task
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.9.0
    • Component/s: Sharding
    • Backwards Compatibility:
      Fully Compatible
    • Sprint:
      Sharding 2020-12-14, Sharding 2020-12-28, Sharding 2021-01-11, Sharding 2021-01-25

      Description

      Distributed locks shouldn't be used on shards, because any step down causes a held lock to remain held until its timeout expires, even if the operation did not finish or was interrupted. The config server currently removes all locks on step up, but shards have no mechanism to re-obtain the lock and finish the operation.
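The asymmetry above can be illustrated with a minimal sketch. This is not the server's actual lock catalog code; `DistLockCatalog`, `tryLock`, and `onStepUp` are hypothetical names standing in for the config server's step-up behavior that shards currently lack:

```cpp
#include <map>
#include <string>

// Hypothetical in-memory stand-in for the distributed-lock catalog.
// On the config server, step-up unconditionally clears all lock
// documents; a shard has no equivalent hook, so a lock taken before a
// step down stays held until its timeout expires.
class DistLockCatalog {
public:
    bool tryLock(const std::string& resource, const std::string& owner) {
        auto result = _locks.emplace(resource, owner);
        return result.second;  // fails if another owner already holds it
    }

    void unlock(const std::string& resource) {
        _locks.erase(resource);
    }

    // What the config server does on step-up: drop every held lock so
    // an interrupted operation cannot leave a resource locked forever.
    void onStepUp() {
        _locks.clear();
    }

    bool isLocked(const std::string& resource) const {
        return _locks.count(resource) > 0;
    }

private:
    std::map<std::string, std::string> _locks;
};
```

Without the `onStepUp()` hook, a lock taken by an operation that is interrupted by a step down blocks all later takers until the timeout.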

      As part of the PM-1965 project, we're defining two behaviors for the DDL operations:

      • Under the previous FCV we'll follow most of the previously implemented DDL operation code.
      • On newer versions we'll ensure the guarantees described in the scope document of the project.
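The two behaviors amount to a gate on the FCV. A minimal sketch, assuming hypothetical names (`FCV`, `runDdlOperation`) rather than the server's real FCV machinery:

```cpp
#include <string>

// Hypothetical FCV gate: in the older FCV the coordinator keeps the
// pre-existing DDL path; once fully upgraded it takes the new
// shard-driven path with the stronger guarantees from the scope doc.
enum class FCV { kLastLTS, kLatest };

std::string runDdlOperation(FCV fcv) {
    if (fcv == FCV::kLastLTS) {
        return "legacy path via config server";
    }
    return "new path via database primary shard";
}
```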

      For some commands we have to change the communication order: instead of going through the config server, the command will go directly to the primary shard of the database. In some cases we'll execute code that was previously implemented on the config server, such as holding a distributed lock for a resource. This task consists of providing a mechanism to re-obtain the lock after a step down occurs, or to clean up the lock after a step up, while ensuring that split-brain scenarios are accounted for and will not leave the system in an inconsistent state.
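One split-brain-safe way to combine both mechanisms is to tag each lock with the replica-set term in which it was taken. The sketch below is one possible shape, not the implementation this ticket landed; `ShardLockRegistry`, `acquire`, and `onStepUp` are hypothetical names:

```cpp
#include <map>
#include <string>

// Hypothetical shard-side lock registry. Each lock records the
// replica-set term in which it was taken. A step-up only clears locks
// from strictly older terms, so a stale primary in a split brain
// cannot clobber a lock legitimately re-acquired in a newer term.
struct LockEntry {
    std::string owner;
    long long term;
};

class ShardLockRegistry {
public:
    bool acquire(const std::string& res, const std::string& owner, long long term) {
        auto it = _locks.find(res);
        if (it == _locks.end()) {
            _locks[res] = {owner, term};
            return true;
        }
        // Re-obtain after a step down: the same owner may take the
        // lock again in a newer term instead of waiting for a timeout.
        if (it->second.owner == owner && term >= it->second.term) {
            it->second.term = term;
            return true;
        }
        return false;
    }

    // Step-up cleanup that tolerates split brain: only entries from
    // strictly older terms are released.
    void onStepUp(long long newTerm) {
        for (auto it = _locks.begin(); it != _locks.end();) {
            if (it->second.term < newTerm)
                it = _locks.erase(it);
            else
                ++it;
        }
    }

    bool held(const std::string& res) const {
        return _locks.count(res) > 0;
    }

private:
    std::map<std::string, LockEntry> _locks;
};
```

The term comparison is what keeps the cleanup safe: a node that stepped up in an older term cannot release locks taken after it was partitioned away.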


              People

              Assignee:
              kaloian.manassiev Kaloian Manassiev
              Reporter:
              marcos.grillo Marcos José Grillo Ramirez
              Participants:
              Votes:
              0
              Watchers:
              4

                Dates

                Created:
                Updated:
                Resolved: