  1. Core Server
  2. SERVER-53118

Make DistLock resilient to step downs on shards

    • Fully Compatible
    • Sharding 2020-12-14, Sharding 2020-12-28, Sharding 2021-01-11, Sharding 2021-01-25

      Distributed locks shouldn't be used on shards, because any step down causes every held lock to stay held until its timeout is reached, even though the operation that took it didn't finish or was interrupted. The config server currently removes all locks on step up, but shards do not have any mechanism to re-obtain the lock and finish the operation.
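
      To make the failure mode concrete, here is a minimal sketch of a lease-style lock table. It is a toy model, not the server's implementation: `ToyLockTable`, its methods, the node names, and the 900-second timeout are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class LockEntry:
    holder: str        # node that acquired the lock
    expires_at: float  # time after which the lock may be overtaken


class ToyLockTable:
    """Hypothetical in-memory stand-in for a distributed lock store."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.locks: Dict[str, LockEntry] = {}

    def try_acquire(self, resource: str, holder: str, now: float) -> bool:
        entry = self.locks.get(resource)
        if entry is not None and now < entry.expires_at:
            # Still considered held, even if the holder has stepped down.
            return False
        self.locks[resource] = LockEntry(holder, now + self.timeout)
        return True

    def release(self, resource: str, holder: str) -> None:
        entry = self.locks.get(resource)
        if entry is not None and entry.holder == holder:
            del self.locks[resource]


table = ToyLockTable(timeout=900.0)  # arbitrary timeout for the example
acquired = table.try_acquire("db.coll", "shardA-node1", now=0.0)

# node1 steps down here and its operation is interrupted,
# so release() is never called for "db.coll".
blocked = table.try_acquire("db.coll", "shardA-node2", now=10.0)
overtaken = table.try_acquire("db.coll", "shardA-node2", now=900.0)
```

      In this model the new primary (`shardA-node2`) is refused the lock until the full timeout has elapsed, which is the behavior this ticket wants to avoid on shards.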

      As part of the PM-1965 project, we're defining two behaviors for the DDL operations:

      • Under the previous FCV (feature compatibility version) we'll follow most of the previously implemented DDL operation code.
      • On newer versions we'll ensure the guarantees described in the scope document of the project.

      With some commands we have to change the communication order, which means that instead of going through the config server, the command will go directly to the primary shard of the database. In some cases, we'll execute code that was previously implemented on the config server, for example, holding a distributed lock for a resource. This task consists of providing a mechanism to either re-obtain the lock after a step down occurs or clean up the lock after a step up, provided that a split-brain scenario is considered and cannot leave the system in an inconsistent state.
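
      One way to picture the "re-obtain after step down" option together with the split-brain concern is a lock record that is owned by the shard and operation rather than by a single node, and that is fenced by the election term. The sketch below is only an illustration under those assumptions, not the design chosen for this ticket: `FencedLockTable`, `LockDoc`, the `operation` identifier, and the `term` parameter are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class LockDoc:
    owner_shard: str  # replica set that owns the lock, not a single node
    operation: str    # identifier of the DDL operation holding the lock
    term: int         # election term of the primary that last claimed it


class FencedLockTable:
    """Hypothetical lock store that lets a new primary resume a lock."""

    def __init__(self) -> None:
        self.docs: Dict[str, LockDoc] = {}

    def acquire(self, resource: str, shard: str, operation: str, term: int) -> bool:
        doc = self.docs.get(resource)
        if doc is None:
            self.docs[resource] = LockDoc(shard, operation, term)
            return True
        # Re-obtain: same shard and operation, from an equal or newer term.
        if doc.owner_shard == shard and doc.operation == operation and term >= doc.term:
            doc.term = term
            return True
        return False

    def release(self, resource: str, shard: str, operation: str, term: int) -> bool:
        doc = self.docs.get(resource)
        if doc is None or doc.owner_shard != shard or doc.operation != operation:
            return False
        if term < doc.term:
            # Stale primary from an older term (split brain): ignore it.
            return False
        del self.docs[resource]
        return True


table = FencedLockTable()
first = table.acquire("db.coll", "shardA", "op-1", term=1)    # original primary
resumed = table.acquire("db.coll", "shardA", "op-1", term=2)  # new primary after step up
stale = table.release("db.coll", "shardA", "op-1", term=1)    # old primary, stale term
other = table.acquire("db.coll", "shardB", "op-2", term=1)    # different shard is refused
done = table.release("db.coll", "shardA", "op-1", term=2)     # new primary finishes
```

      Here the new primary resumes the lock immediately instead of waiting for a timeout, while a node that still believes it is primary from an older term can neither release nor re-claim the lock.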

            Assignee:
            kaloian.manassiev@mongodb.com Kaloian Manassiev
            Reporter:
            marcos.grillo@mongodb.com Marcos José Grillo Ramirez
            Votes:
            0
            Watchers:
            4
