Core Server / SERVER-67988

dropIndexes is not correctly serialized with other DDL operations in case of stepdowns

    • Type: Bug
    • Resolution: Works as Designed
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Labels:
      None
    • Sharding EMEA
    • ALL

      The bug affects sharded clusters only.

      In SERVER-6491 we changed dropIndexes to preserve the shard key index. To achieve that, we made the dropIndexes command serialize with shardCollection through the acquisition of the collection DDL lock (the local part of the distLock).
      Unfortunately, in case of a stepdown it can still happen that we drop the shard key index. Consider the following scenario:

      1. shardCollection starts and checks that a suitable index exists for the shard key.
      2. The primary shard of the database steps down.
      3. A new primary of the primary shard is elected and starts the recovery of the interrupted shardCollection operation.
      4. A dropIndexes starts, arrives at the primary shard and manages to acquire the collection DDL lock even though a shardCollection is still ongoing. This is possible because the interrupted shardCollection has not been fully recovered yet after the step-up.
      5. dropIndexes finds the collection unsharded and drops all the indexes, including the one that would be used as the shard key index.
      6. dropIndexes completes and releases the collection DDL lock.
      7. The recovered shardCollection finally re-acquires the collection DDL lock and shards the collection.
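
The interleaving above can be replayed with a small toy model. This is only an illustrative sketch, not server code: the `Shard` class, the `x_1` index name, and every function name are hypothetical, and the collection DDL lock is modeled as plain in-memory state that is lost on stepdown.

```python
# Toy model of the scenario; every name here is hypothetical.

class Shard:
    def __init__(self):
        # "x_1" stands for the index that shardCollection wants to use.
        self.indexes = {"_id_", "x_1"}
        self.sharded = False
        self.ddl_lock_holder = None  # local, in-memory lock state

    def acquire_ddl_lock(self, op):
        if self.ddl_lock_holder is not None:
            raise RuntimeError(f"DDL lock held by {self.ddl_lock_holder}")
        self.ddl_lock_holder = op

    def release_ddl_lock(self):
        self.ddl_lock_holder = None

    def step_down_and_elect_new_primary(self):
        # The lock is local state, so the new primary starts with it free
        # until the interrupted coordinator has been recovered.
        self.ddl_lock_holder = None


def drop_indexes(shard):
    shard.acquire_ddl_lock("dropIndexes")
    try:
        # The shard key index is preserved only if the collection is sharded.
        keep = {"_id_", "x_1"} if shard.sharded else {"_id_"}
        shard.indexes &= keep
    finally:
        shard.release_ddl_lock()


def run_scenario():
    shard = Shard()
    # Step 1: shardCollection takes the lock and finds a suitable index.
    shard.acquire_ddl_lock("shardCollection")
    assert "x_1" in shard.indexes
    # Steps 2-3: stepdown, then a new primary that has not recovered yet.
    shard.step_down_and_elect_new_primary()
    # Steps 4-6: dropIndexes gets the lock and sees an unsharded collection.
    drop_indexes(shard)
    # Step 7: the recovered shardCollection re-acquires the lock and shards.
    shard.acquire_ddl_lock("shardCollection")
    shard.sharded = True
    shard.release_ddl_lock()
    return shard
```

Calling `run_scenario()` leaves the model with a sharded collection whose only remaining index is `_id_`, which is the inconsistent state described in the steps.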

       

      To avoid this, we must ensure that dropIndexes waits for the recovery of all the DDL coordinators before acquiring the collection DDL lock.
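
One way to picture the proposed ordering is a gate that stays closed until recovery has finished. The sketch below is a minimal illustration under that assumption and not the server's implementation: `DDLLockManager`, `recover_coordinator`, and `demo` are hypothetical names, and a `threading.Event` stands in for "all DDL coordinators recovered".

```python
import threading


class DDLLockManager:
    """Hypothetical model of a collection DDL lock gated on recovery."""

    def __init__(self):
        self._recovered = threading.Event()  # set once step-up recovery is done
        self._lock = threading.Lock()

    def recover_coordinator(self):
        # Step-up path: re-take the lock on behalf of the interrupted
        # coordinator first, and only then open the gate for new requests.
        self._lock.acquire()
        self._recovered.set()

    def acquire(self, timeout=None):
        # New DDL operations wait for recovery before competing for the lock.
        if not self._recovered.wait(timeout):
            raise TimeoutError("DDL coordinators are still being recovered")
        lock_timeout = -1 if timeout is None else timeout
        if not self._lock.acquire(timeout=lock_timeout):
            raise TimeoutError("collection DDL lock is busy")

    def release(self):
        self._lock.release()


def demo():
    mgr = DDLLockManager()
    order = []

    def drop_indexes():
        mgr.acquire()
        order.append("dropIndexes")
        mgr.release()

    worker = threading.Thread(target=drop_indexes)
    worker.start()  # arrives before recovery has finished

    mgr.recover_coordinator()        # recovered shardCollection holds the lock
    order.append("shardCollection")  # it completes first
    mgr.release()

    worker.join()
    return order
```

In this model `demo()` always records shardCollection before dropIndexes, because the gate opens only after the recovered coordinator already holds the lock.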

            Assignee:
            backlog-server-sharding-emea [DO NOT USE] Backlog - Sharding EMEA
            Reporter:
            tommaso.tocci@mongodb.com Tommaso Tocci
