  Core Server / SERVER-65930

DDL coordinators' and rename participant's initial checkpoint may fail with a DuplicateKey error

    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Backport Requested: v6.0, v5.3, v5.0
    • Sprint: Sharding EMEA 2022-05-16, Sharding EMEA 2022-05-30

      Each DDL coordinator calls _insertStateDocument to write an initial on-disk checkpoint of the received operation. Because this write uses a write concern with a timeout, the following sequence can occur:

      1. The DDL coordinator starts and calls _insertStateDocument.
      2. The document is written locally but is not yet majority-committed.
      3. The write concern timeout is hit.
      4. The coordinator retries the insert.
      5. The retry fails with a DuplicateKey error because the document has already been inserted.

      In some cases, such as renameCollection, the DDL coordinator document remains on disk but the in-memory instance is released because of the exception. When this happens, the coordinator can only be resumed either by the user invoking the operation again or by a new node stepping up as primary of the source database's primary shard.
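      The sequence above, together with the renameCollection outcome, can be modeled with a small Python simulation. This is a minimal sketch under stated assumptions, not the server's C++ implementation: ReplicaSetPrimary, Coordinator, the registry dict, the {"phase": "unset"} document, and both exception classes are hypothetical stand-ins, and majority replication is reduced to one boolean per attempt (False means the write concern timeout is hit).

```python
# Hypothetical simulation of the failure sequence described in the issue.
# None of these names exist in the MongoDB server; they only model the behavior.


class DuplicateKeyError(Exception):
    """Stand-in for the server's DuplicateKey error."""


class WriteConcernTimeoutError(Exception):
    """Stand-in for a write concern timeout; the local write is NOT undone."""


class ReplicaSetPrimary:
    def __init__(self):
        self.local_docs = {}            # state documents written on this node
        self.majority_committed = set()

    def insert(self, doc_id, doc, replicates_in_time):
        if doc_id in self.local_docs:
            raise DuplicateKeyError(doc_id)          # step 5
        self.local_docs[doc_id] = doc                # step 2: local write succeeds
        if not replicates_in_time:
            raise WriteConcernTimeoutError(doc_id)   # step 3: timeout is hit
        self.majority_committed.add(doc_id)


class Coordinator:
    def __init__(self, node, registry, doc_id):
        self.node, self.registry, self.doc_id = node, registry, doc_id
        registry[doc_id] = self                      # in-memory instance

    def insert_state_document(self, replication_outcomes):
        try:
            for replicates_in_time in replication_outcomes:
                try:
                    self.node.insert(self.doc_id, {"phase": "unset"}, replicates_in_time)
                    return
                except WriteConcernTimeoutError:
                    continue                         # step 4: naive retry
        except DuplicateKeyError:
            del self.registry[self.doc_id]           # instance released on error
            raise


node, registry = ReplicaSetPrimary(), {}
doc_id = "rename:db.a->db.b"
coordinator = Coordinator(node, registry, doc_id)

hit_duplicate = False
try:
    # First attempt times out on replication; the retry would replicate in time.
    coordinator.insert_state_document([False, True])
except DuplicateKeyError:
    hit_duplicate = True

print("DuplicateKey on retry:", hit_duplicate)   # True
print("on disk:", doc_id in node.local_docs)     # True
print("in memory:", doc_id in registry)          # False
```

      Because the naive loop only treats the timeout as retryable, the DuplicateKey error from the retry escapes it: the state document stays on disk while the in-memory coordinator is gone.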

      [EDIT] The rename participant can hit the same problem, since it implements the same logic.

            Allison Easton (allison.easton@mongodb.com)
            Pierlauro Sciarelli (pierlauro.sciarelli@mongodb.com)