Distributed locks shouldn't be used on shards: after a step down, any lock held by the former primary stays held until its timeout is reached, even though the operation did not finish or was interrupted. The config server currently removes all locks on step up, but shards do not have any mechanism to re-obtain the lock and finish the operation.
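To make the problem concrete, here is a minimal sketch of a timeout-based lock table. It is a toy model, not the server's implementation: `LockTable`, `LockEntry`, the node names, and the 900-second timeout are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class LockEntry:
    holder: str        # hypothetical identifier of the node that took the lock
    expires_at: float  # time after which another node may take the lock over


class LockTable:
    """Toy stand-in for a distributed lock store (assumed model, not a real schema)."""

    def __init__(self) -> None:
        self._locks: Dict[str, LockEntry] = {}

    def try_acquire(self, resource: str, holder: str, now: float, timeout: float) -> bool:
        entry = self._locks.get(resource)
        if entry is not None and now < entry.expires_at:
            # Still held, even if the holder is no longer primary.
            return False
        self._locks[resource] = LockEntry(holder, now + timeout)
        return True


table = LockTable()

# node1 is primary, takes the lock at t=0, then steps down mid-operation
# without releasing it.
acquired_by_old_primary = table.try_acquire("db.coll", "node1", now=0.0, timeout=900.0)

# node2 becomes primary and retries at t=10: the stale lock blocks it.
blocked_after_step_down = not table.try_acquire("db.coll", "node2", now=10.0, timeout=900.0)

# Only once the timeout has elapsed can node2 take the lock.
acquired_after_expiry = table.try_acquire("db.coll", "node2", now=900.0, timeout=900.0)
```

In this model nothing ties the lock to the holder's primary status, so the resource stays unavailable for the full timeout after the step down.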
As part of the PM-1965 project, we're defining two behaviors for the DDL operations:
- Under the previous FCV, we'll follow most of the previously implemented DDL operation code.
- On newer versions, we'll ensure the guarantees described in the scope document of the project.
For some commands we have to change the communication order: instead of going through the config server, the command will go directly to the primary shard of the database. In some cases, we'll execute code that was previously implemented on the config server, for example holding a distributed lock for a resource. This task consists of providing a mechanism to either re-obtain the lock after a step down occurs or clean up the lock after a step up, provided that the split-brain scenario is taken into account and the system is not left in an inconsistent state.
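One possible shape for the clean-up-on-step-up option is sketched below. It is only an illustration under stated assumptions: `ShardLockManager`, `HeldLock`, and the idea of tagging each lock with the election term in which it was taken are hypothetical and are not taken from the existing code or from the scope document. Rejecting requests that carry an older term is one way to keep a deposed primary from acquiring locks in a split-brain situation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class HeldLock:
    owner: str  # hypothetical node identifier
    term: int   # election term in which the lock was taken (assumed tagging scheme)


class ShardLockManager:
    """Toy model of a shard's view of its distributed locks."""

    def __init__(self) -> None:
        self.current_term = 0
        self.locks: Dict[str, HeldLock] = {}

    def step_up(self, new_term: int) -> List[str]:
        """Called when a node becomes primary; drops locks left by earlier terms."""
        if new_term <= self.current_term:
            raise ValueError("term must increase on step up")
        self.current_term = new_term
        stale = [res for res, lock in self.locks.items() if lock.term < new_term]
        for res in stale:
            del self.locks[res]
        return sorted(stale)

    def acquire(self, resource: str, owner: str, term: int) -> bool:
        if term != self.current_term:
            # Request from a node that still believes it is primary for an
            # older term: reject it so it cannot take or keep a lock.
            return False
        if resource in self.locks:
            return False
        self.locks[resource] = HeldLock(owner, term)
        return True


mgr = ShardLockManager()
mgr.step_up(1)
first = mgr.acquire("db.coll", "node1", term=1)        # node1 is primary in term 1
cleaned = mgr.step_up(2)                                # node2 steps up, stale lock removed
old_primary = mgr.acquire("db.coll", "node1", term=1)  # deposed primary is rejected
new_primary = mgr.acquire("db.coll", "node2", term=2)  # new primary can proceed
```

The alternative named in the task, re-obtaining the lock and resuming the operation, would keep the stale entry and transfer its ownership to the new primary instead of deleting it; that requires the interrupted operation's state to be recoverable, which this sketch does not model.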