[SERVER-76463] Ensure Sharding DDL locks acquired outside a coordinator wait for DDL recovery Created: 24/Apr/23  Updated: 22/Nov/23  Resolved: 31/May/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 7.0.0-rc10, 6.0.10, 5.0.21
Fix Version/s: 7.1.0-rc0

Type: Bug Priority: Major - P3
Reporter: Tommaso Tocci Assignee: Silvia Surroca
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Problem/Incident
causes SERVER-79285 makeOperationContext should not be ca... Closed
causes SERVER-81534 DDL locks musn't be acquired during s... Closed
Related
is related to SERVER-67988 dropIndexes is not correctly serializ... Closed
Assigned Teams:
Sharding EMEA
Backwards Compatibility: Fully Compatible
Backport Requested:
v7.0
Sprint: Sharding EMEA 2023-05-15, Sharding EMEA 2023-05-29, Sharding EMEA 2023-06-12
Participants:
Linked BF Score: 146

 Description   

DDL locks acquired outside the ShardingDDLCoordinator infrastructure do not properly synchronize with other DDL operations in case of failovers/stepdowns.

Recovery of DDL locks for Sharding DDL coordinators works as follows:

  1. Some sharding DDL coordinators start and acquire their respective DDL locks.
  2. The primary node of the database's primary shard steps down.
  3. A new primary of the primary shard is elected and starts the recovery of the interrupted Sharding DDL coordinators.
  4. The ShardingDDLCoordinator service enters the RECOVERY state.
  5. Any attempt to create a new coordinator waits until the service completes the recovery.
  6. Once all the coordinators have been recovered and have reacquired their DDL locks, the ShardingDDLCoordinator service moves to the RECOVERED state.
  7. Creation of new coordinators is unblocked.

DDL locks acquired outside the ShardingDDLCoordinator infrastructure do not wait for the recovery of DDL locks acquired before the stepdown.

We should ensure that new DDL locks can be acquired only after all DDL locks acquired on the previous primary node have been recovered.
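
The intended behavior can be illustrated with a minimal sketch. This is not the server's C++ implementation: it is written in Python for brevity, and every name in it (DDLRecoveryGate, DDLLockManager, on_step_up, recover_lock, try_acquire, and so on) is hypothetical. It only assumes what the description states: lock recovery must finish before any new DDL lock is granted, whether or not the request comes from a coordinator.

```python
import threading


class DDLRecoveryGate:
    """Hypothetical stand-in for the RECOVERY/RECOVERED state of the service."""

    def __init__(self):
        # Cleared means "recovering"; set means "recovered".
        self._recovered = threading.Event()

    def on_step_up(self):
        # A new primary was elected: enter RECOVERY and block new acquisitions.
        self._recovered.clear()

    def on_recovery_complete(self):
        # All interrupted coordinators hold their locks again: RECOVERED.
        self._recovered.set()

    def wait_for_recovery(self, timeout=None):
        # Returns True once recovered, False if the timeout expires first.
        return self._recovered.wait(timeout)


class DDLLockManager:
    """Hypothetical lock table keyed by namespace."""

    def __init__(self, gate):
        self._gate = gate
        self._mutex = threading.Lock()
        self._holders = {}  # namespace -> holder name

    def recover_lock(self, namespace, holder):
        # Recovery path only: re-acquire a lock held before the stepdown.
        # It must not wait on the gate, otherwise recovery would deadlock.
        with self._mutex:
            self._holders[namespace] = holder

    def try_acquire(self, namespace, holder, timeout=None):
        # Every new acquisition, from a coordinator or not, waits for recovery.
        if not self._gate.wait_for_recovery(timeout):
            return False
        with self._mutex:
            if namespace in self._holders:
                return False  # still held, e.g. by a recovered coordinator
            self._holders[namespace] = holder
            return True

    def release(self, namespace, holder):
        with self._mutex:
            if self._holders.get(namespace) == holder:
                del self._holders[namespace]
```

In this sketch, a DDL operation running outside a coordinator calls try_acquire like any other caller. During recovery the call waits on the gate, and after recovery it still conflicts with any lock that a recovered coordinator re-acquired.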



 Comments   
Comment by Gil Alon [ 26/Sep/23 ]

Requesting a backport for this ticket, since SERVER-76626, which originally caused this ticket to be filed, was just backported to 7.0. This change will be backported by BACKPORT-16706.

Comment by Githook User [ 30/May/23 ]

Author: Silvia Surroca (username: silviasuhu, email: silvia.surroca@mongodb.com)

Message: SERVER-76463 Ensure Sharding DDL locks acquired outside a coordinator wait for DDL recovery
Branch: master
https://github.com/mongodb/mongo/commit/4954cb44e1ceeda02db1902ffd7ea89d08c7b04c
