Core Server / SERVER-89529

Retryable writes during resharding may execute more than once if chunk migration follows the reshard operation

    • Component: Cluster Scalability
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Backport Requested: v8.0, v7.3, v7.0, v6.0, v5.0

      Issue Status as of March 3, 2025

      ISSUE DESCRIPTION AND IMPACT
      Retryable writes occurring during resharding may be applied more than once if a specific chunk migration follows resharding’s commit. This may manifest as inconsistencies from the perspective of a client application.

      This issue can only manifest if ALL of the following are true (the triggering sequence is sketched after this list):

      • A retryable write is performed while a reshardCollection operation is in progress.
      • A chunk containing documents affected by the retryable write is migrated to a shard which was not the owner of those documents under the original shard key.
        • Note that this means a cluster must contain 3 or more shards to be affected.
      • The retryable write must actually be retried (for example, due to a network error) after resharding and the subsequent chunk migration commit.
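
      As an illustration, the triggering sequence might look like the following in mongosh. The connection, database, collection, and shard key names here are assumptions for the sketch, not details from this report, and the two steps must run on separate connections because reshardCollection blocks until the operation commits.

          // Connection 1: start resharding; this command blocks until the
          // resharding operation commits.
          db.adminCommand({
              reshardCollection: "test.orders",
              key: {customerId: 1}
          });

          // Connection 2: while resharding is still in progress, perform a
          // retryable write. Drivers retry such writes automatically on
          // transient network errors when retryWrites is enabled.
          const session = db.getMongo().startSession({retryWrites: true});
          session.getDatabase("test").orders.updateOne(
              {_id: 5},
              {$inc: {counter: 1}}
          );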

      If the issue occurs and a retryable write is performed again, the application-facing impact could take one of the following forms:

      Scenario 1
        Sequence of operations:
          1. Retryable insert {_id: 5}
          2. Retryable delete {_id: 5}
          3. Duplicate packet causes retry of the insert
        Expected outcome: no document with _id: 5 is present
        Actual outcome: document {_id: 5} is present

      Scenario 2
        Sequence of operations:
          1. Document {_id: 5, counter: 0} is present
          2. Retryable update with query {_id: 5} and update {$inc: {counter: 1}}
          3. Duplicate packet causes retry of the update
        Expected outcome: document {_id: 5, counter: 1} is present
        Actual outcome: document {_id: 5, counter: 2} is present

      Scenario 3
        Sequence of operations:
          1. Retryable delete {_id: 5}
          2. Retryable insert {_id: 5}
          3. Duplicate packet causes retry of the delete
        Expected outcome: document {_id: 5} is present
        Actual outcome: no document with _id: 5 is present

      DIAGNOSIS AND REMEDIATION
      A script is available at https://github.com/mongodb/support-tools/tree/master/ca-118 that can help rule out impact from this issue. The script inspects config.changelog to confirm that no chunk migrations occurred on a resharded namespace during the 30-minute period following the resharding operation; a simplified version of this check is sketched after the list of limitations below. Note the following limitations of the script:

      • The config.changelog is a capped collection and therefore only has a limited amount of history. The script is unable to provide any insight beyond the limit of the changelog’s history.
      • The script can definitively determine that a cluster was not affected (within the history present in config.changelog; see the previous point).
      • The script cannot definitively determine that a cluster was affected.
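
      For reference, a manual spot check along the same lines could look like the following in mongosh, run against a mongos. This is an illustrative sketch, not the support-tools script, and the changelog event names used here ("reshardCollection.end", "moveChunk.commit") are assumptions about the changelog's contents; prefer the script for an authoritative check.

          // For each recorded resharding commit, count chunk migrations on
          // the same namespace within the following 30 minutes.
          const cfg = db.getSiblingDB("config");
          cfg.changelog.find({what: "reshardCollection.end"}).forEach(evt => {
              const cutoff = new Date(evt.time.getTime() + 30 * 60 * 1000);
              const migrations = cfg.changelog.countDocuments({
                  what: "moveChunk.commit",
                  ns: evt.ns,
                  time: {$gt: evt.time, $lte: cutoff}
              });
              print(`${evt.ns}: ${migrations} migration(s) within 30 minutes of resharding`);
          });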

      If the script reports that a cluster may have been affected or if the script is inconclusive, then you will need to inspect your data to ensure that it is consistent from the perspective of your application. The issue, if it occurs, will only affect retryable writes that operated on the resharded collection. With this in mind, pay specific attention to operations on the collection in question, particularly during the period immediately following resharding’s completion.

      If your cluster meets the criteria above, we recommend that you upgrade to one of the following versions:

      Affected Versions   Recommended Upgrade Versions
      5.0.0 - 5.0.30      5.0.31+
      6.0.0 - 6.0.16      6.0.20+
      7.0.0 - 7.0.12      7.0.16+
      8.0.0 - 8.0.3       8.0.5+

      WORKAROUNDS
      On affected versions, disabling the balancer for a period of time after resharding prevents the issue from occurring. The balancer should remain disabled long enough that retryable writes issued during the resharding operation will no longer be retried. The recommended minimum duration is 30 minutes after the resharding operation completes, matching the default timeout for logical sessions.
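
      As a sketch, the workaround looks like the following in mongosh, run against a mongos (the 30-minute wait is left as a manual step):

          // Disable the balancer before (or immediately after) resharding.
          sh.stopBalancer();    // waits for any in-progress balancing round

          // ... run reshardCollection, let it commit, then wait at least
          // 30 minutes (the default logical session timeout) ...

          // Re-enable the balancer once in-flight retryable writes can no
          // longer be retried.
          sh.startBalancer();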

      Original description

      Resharding preserves the full retryability history for any retryable writes which occur during the resharding operation. If a chunk migration follows the resharding, session migration should transfer the relevant write history over to the recipient of the chunk. Chunk migration determines whether an oplog entry is relevant by filtering on the namespace being migrated.

      The problem is that when a resharding recipient updates its config.transactions collection (based on the retryable writes/transactions performed on the donor shard), it creates a noop oplog entry with the namespace set to empty. If the resharding recipient then becomes the donor in a subsequent chunk migration, then due to the empty namespace it will incorrectly conclude that this oplog entry isn't relevant to the chunk actively being migrated. As a result, the noop oplog entry for the already-executed retryable write never gets migrated, and the retryable write could be executed again after the chunk migration commits.
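
      Conceptually (this is an illustrative sketch, not server source), the relevance check behaves like the following, which is why the recipient-written noop entry with an empty namespace is never transferred:

          // Session migration transfers only those oplog entries whose
          // namespace matches the collection being migrated.
          function isRelevantToMigration(oplogEntry, migratedNs) {
              // The noop entry written by the resharding recipient has
              // oplogEntry.ns === "", so it never matches and is skipped.
              return oplogEntry.ns === migratedNs;
          }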

       

      Adding Max's repro for this issue (a mongosh sketch follows the steps):

      1. Start a resharding operation
      2. Run a retryable $inc update during the resharding operation
      3. Resharding operation completes
      4. Run chunk migration
      5. Retry retryable write from (2) and verify no new oplog entry was generated
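
      A sketch of these steps in mongosh; the database, collection, post-resharding shard key, chunk bounds, and shard name are assumptions for illustration. "Retrying" the write means re-issuing the command with the same lsid and txnNumber, which is what a driver does after a network error:

          // (2) Retryable update issued while resharding (started from a
          // separate shell) is in progress. Supplying lsid and txnNumber
          // makes the write retryable.
          const lsid = {id: UUID()};
          const txnNumber = NumberLong(1);
          db.getSiblingDB("test").runCommand({
              update: "orders",
              updates: [{q: {_id: 5}, u: {$inc: {counter: 1}}}],
              lsid: lsid,
              txnNumber: txnNumber
          });

          // (4) After resharding commits, migrate the chunk owning {_id: 5}
          // to a shard that did not own it under the original shard key.
          db.adminCommand({
              moveChunk: "test.orders",
              find: {customerId: 42},
              to: "shard2"
          });

          // (5) Retry with the same lsid and txnNumber. A correct server
          // answers from the stored write history without applying the
          // update again; an affected server increments counter a second
          // time.
          db.getSiblingDB("test").runCommand({
              update: "orders",
              updates: [{q: {_id: 5}, u: {$inc: {counter: 1}}}],
              lsid: lsid,
              txnNumber: txnNumber
          });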

            Assignee: Ben Gawel (ben.gawel@mongodb.com)
            Reporter: Kruti Shah (kruti.shah@mongodb.com)
            Votes: 0
            Watchers: 17
