  1. Core Server
  2. SERVER-65930

Initial checkpoint of DDL coordinators and rename participant may hit a DuplicateKey error


Details

    • Fully Compatible
    • ALL
    • v6.0, v5.3, v5.0
    • Sharding EMEA 2022-05-16, Sharding EMEA 2022-05-30

    Description

      Each DDL coordinator calls _insertStateDocument to initially checkpoint the received operation on disk. Since the write uses a write concern with a timeout, the following can happen:

      1. DDL coordinator starts and calls _insertStateDocument
      2. The document is locally written but not yet majority committed
      3. The write concern timeout is hit
      4. The coordinator retries
      5. The retry fails because the document has already been inserted locally, so a DuplicateKey error is thrown

      In some cases, such as for renameCollection, the result is that the DDL coordinator document remains on disk but the in-memory instance is released because of the exception. When this happens, the only ways to resume the coordinator are to have the user invoke the operation again or to have a new node step up on the source database's primary shard.
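
      The sequence in steps 1 to 5 and its outcome can be modelled with a small toy simulation. This is only a sketch under stated assumptions: ToyCollection, run_coordinator, the two exception classes and the document contents are hypothetical names invented for illustration and are not the server's actual code or API.

```python
class WriteConcernTimeout(Exception):
    """The write was applied locally but not majority-acknowledged in time."""


class DuplicateKey(Exception):
    """A document with the same _id already exists."""


class ToyCollection:
    """Hypothetical stand-in for the collection holding coordinator state documents."""

    def __init__(self):
        self.docs = {}

    def insert(self, doc, majority_ack_in_time):
        if doc["_id"] in self.docs:
            raise DuplicateKey(doc["_id"])
        self.docs[doc["_id"]] = doc  # step 2: document is written locally
        if not majority_ack_in_time:
            raise WriteConcernTimeout(doc["_id"])  # step 3: timeout is hit


def run_coordinator(collection, doc, instances):
    """Model of steps 1 to 5: insert the state document, retrying once on timeout."""
    instances[doc["_id"]] = "running"  # in-memory coordinator instance
    try:
        try:
            collection.insert(doc, majority_ack_in_time=False)  # steps 1 to 3
        except WriteConcernTimeout:
            collection.insert(doc, majority_ack_in_time=True)  # step 4: retry
    except DuplicateKey:
        # Step 5: the retry fails and the in-memory instance is released,
        # while the document written in step 2 stays in the collection.
        del instances[doc["_id"]]
        raise


collection = ToyCollection()
instances = {}
doc = {"_id": "rename:db.coll", "phase": "unset"}  # hypothetical document shape

try:
    run_coordinator(collection, doc, instances)
    outcome = "ok"
except DuplicateKey:
    outcome = "DuplicateKey"

print(outcome)                         # DuplicateKey
print(doc["_id"] in collection.docs)   # True: document remains on disk
print(doc["_id"] in instances)         # False: in-memory instance is gone
```

      In this model the state document outlives the coordinator instance, which matches the stuck state described above: nothing in memory is left to drive the operation forward.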

      [EDIT] The rename participant can hit the same problem, since it implements the same logic.


            People

              allison.easton@mongodb.com Allison Easton
              pierlauro.sciarelli@mongodb.com Pierlauro Sciarelli
              Votes:
              0
              Watchers:
              6

              Dates

                Created:
                Updated:
                Resolved: