  1. Core Server
  2. SERVER-76463

Ensure Sharding DDL locks acquired outside a coordinator wait for DDL recovery

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 7.1.0-rc0, 7.0.6
    • Affects Version/s: 7.0.0-rc10, 6.0.10, 5.0.21
    • Component/s: None
    • Labels:
    • Sharding EMEA
    • Fully Compatible
    • v7.0
    • Sharding EMEA 2023-05-15, Sharding EMEA 2023-05-29, Sharding EMEA 2023-06-12
    • 146

      DDL locks acquired outside the ShardingDDLCoordinator infrastructure do not properly synchronize with other DDL operations in case of failovers/stepdowns.

      Recovery of DDL locks for Sharding DDL coordinators works as follows:

      1. Some sharding DDL coordinators start and acquire their respective DDL locks
      2. The primary shard of the database steps down
      3. A new primary of the primary shard is elected and starts the recovery of the interrupted Sharding DDL coordinators
      4. The ShardingDDLCoordinator service enters the RECOVERY state
      5. All attempts to create new coordinators wait until the service completes the recovery
      6. Once all the coordinators have been recovered and have reacquired their DDL locks, the ShardingDDLCoordinator service moves to the RECOVERED state.
      7. Creation of new coordinators is unblocked.
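
The recovery sequence above can be modelled roughly as follows. This is only an illustrative Python sketch, not the server's C++ implementation: `CoordinatorService`, `ServiceState`, `recover_coordinator`, and `create_coordinator` are hypothetical names chosen for the example, and a namespace string stands in for a DDL lock.

```python
import threading
from enum import Enum


class ServiceState(Enum):
    RECOVERY = 1
    RECOVERED = 2


class CoordinatorService:
    """Toy model of the recovery gate (hypothetical, not the server's classes)."""

    def __init__(self, interrupted_coordinators):
        self._mutex = threading.Lock()
        self._recovered = threading.Event()
        # Coordinators interrupted by the stepdown that still need recovery.
        self._pending = set(interrupted_coordinators)
        self.held_locks = set()
        self.state = ServiceState.RECOVERY
        if not self._pending:
            self._mark_recovered()

    def _mark_recovered(self):
        self.state = ServiceState.RECOVERED
        self._recovered.set()

    def recover_coordinator(self, namespace):
        """Steps 3-6: a recovered coordinator reacquires its DDL lock."""
        with self._mutex:
            self.held_locks.add(namespace)
            self._pending.discard(namespace)
            if not self._pending:
                self._mark_recovered()

    def create_coordinator(self, namespace, timeout=None):
        """Steps 5 and 7: new coordinators block until recovery completes."""
        if not self._recovered.wait(timeout):
            raise TimeoutError("service is still in RECOVERY")
        with self._mutex:
            if namespace in self.held_locks:
                raise RuntimeError(f"DDL lock on {namespace} is already held")
            self.held_locks.add(namespace)
```

In this model, a call to `create_coordinator` made while any interrupted coordinator is still pending either blocks or times out; it proceeds only after the last `recover_coordinator` call flips the state to `RECOVERED`.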

      DDL locks acquired outside the ShardingDDLCoordinator infrastructure do not wait for the recovery of DDL locks acquired before the stepdown.

      We should ensure that DDL locks can be acquired only after all DDL locks acquired from the previous primary node have been recovered.
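
A minimal sketch of the problem and of the requested behaviour, under stated assumptions: `DDLLockManager` and all of its methods are hypothetical names invented for this example, and the `wait_for_recovery` flag exists only to contrast the two behaviours. The ticket does not describe how the actual fix is implemented.

```python
import threading


class DDLLockManager:
    """Hypothetical lock table used only to illustrate the race."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._recovered = threading.Event()
        self._held = {}  # namespace -> owner

    def reacquire_during_recovery(self, namespace, owner):
        # Called for locks that were held on the previous primary.
        with self._mutex:
            self._held[namespace] = owner

    def mark_recovered(self):
        self._recovered.set()

    def owner_of(self, namespace):
        with self._mutex:
            return self._held.get(namespace)

    def acquire(self, namespace, owner, wait_for_recovery=True, timeout=None):
        # wait_for_recovery=False models the buggy path: an acquisition made
        # outside the coordinator infrastructure that skips the recovery gate.
        if wait_for_recovery and not self._recovered.wait(timeout):
            raise TimeoutError("DDL lock recovery has not completed")
        with self._mutex:
            if namespace in self._held:
                raise RuntimeError(
                    f"{namespace} is locked by {self._held[namespace]}")
            self._held[namespace] = owner
```

With `wait_for_recovery=False`, an external caller can take the lock on a namespace before the recovered coordinator has reacquired it, which is the missing synchronization described above. With the default, the same call waits for `mark_recovered()` and then correctly observes the conflict.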

            silvia.surroca@mongodb.com Silvia Surroca
            tommaso.tocci@mongodb.com Tommaso Tocci