Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-77506

Sharded multi-document transactions can mismatch data and ShardVersion

    • Fully Compatible
    • ALL
    • v7.1, v7.0, v6.0, v5.0, v4.4
    • Show
      repro.patch
    • Sharding EMEA 2023-06-12, Sharding EMEA 2023-06-26, Sharding EMEA 2023-07-10, Sharding EMEA 2023-07-24
    • 135
    • 2

      Issue Status as of April 15 2024

      This issue is included in MongoDB System Alert: Sharded multi-document transactions may perform operations using inconsistent sharding metadata. The information below describes only the behavior and impact related to SERVER-77506. Please see the consolidated issue page for guidance on identifying if you are impacted by these issues and remediation

      SUMMARY

      Operations within a multi-document transaction may not correctly read or be applied to documents if they occur on ranges of data that undergo a chunk/range migration during the transaction. Reads may return incomplete results, and update or delete operations may not be applied to documents subject to the range/chunk migration.

      ISSUE DESCRIPTION AND IMPACT

      During the course of a multi-document transaction on a Sharded collection, ranges of data that migrated concurrently may not be visible to the transaction nor detected as conflicting writes. This ultimately manifests as operations spanning more than one shard, potentially missing reads and writes.

      This behavior only affects multi-document transactions using a Read Concern of 'local' (default for reads) or 'majority'. Multi-document transactions using read concern of 'snapshot' are not affected.

      This issue affects MongoDB versions:

      • MongoDB 4.4.0 through 4.4.27
      • MongoDB 5.0.0 through 5.0.23
      • MongoDB 6.0.0 through 6.0.12
      • MongoDB 7.0.0 through 7.0.2
      • MongoDB 7.1.0 - 7.1.1

      The minimum conditions for the issue to manifest (all must be met) are:

      • Sharded Cluster with more than one shard;
      • Balancer is enabled;
      • A multi-statement transaction, that:
        • Runs at local (default for reads) or majority read concern, and
        • Performs operations on two or more collections, where the second (or later) collection that receives operations is sharded; and
      • A chunk migration for the second (or later) collection is committed after the first statement of the transaction has started running.

      The issue occurs because, under these conditions, it is incorrectly possible for the recipient of a chunk migration to perform operations on an earlier state of the data. When this happens, those operations match no documents and become no-ops even though the recipient has received the documents and owns their chunk range.

      The table below describes which types of operations may be impacted and how:

      What is affected Effect Downstream Effect
      Reads or Writes outside of a transaction None None
      Within a transaction - Reads or Writes to data outside of a migrating chunk None None
      Within an affected transaction - Writes to data within a migrating chunk Updates or deletes may miss documents that should be targeted.
       
      Inserts may cause aborted transactions due to duplicate key exceptions on unique secondary (non-_id) indexes if their contents depended on missed updates or deletes.
       
      Other inserts succeed and writes on newly inserted documents are correctly applied.
      Application level inconsistencies between documents. No replica set inconsistencies or index inconsistencies.
      Within an affected transaction - Reads from data within a migrated chunk Results may exclude documents in the chunk range. Application-introduced inconsistencies if reads would prompt additional action.

      WORKAROUND

      If your workload utilizes multi-document transactions on a Sharded cluster meeting the criteria above, we recommend that you:

      • Disable the sharded cluster balancer.
      • Upgrade to MongoDB 4.4.28, 5.0.24, 6.0.13 or 7.0.3.
      • Re-enable the balancer.
      • See the Diagnosis & Remediation section below.

      DIAGNOSIS & REMEDIATION

      See MongoDB System Alert: Sharded multi-document transactions may perform operations using inconsistent sharding metadata for guidance on assessing if you are impacted and the recommended remediation steps.

      Original description

       

      The second and following statements of a multi-document transaction can operate on an already opened storage engine snapshot whose data doesn't match the ShardVersion indicated by the router.

      e.g:

      Consider a sharded multi-document transactions with read concern 'local' or 'majority'. The first statement targets collectionA which exists on shard0. This opens a storage engine snapshot on shard0 at T100. 

      At T100, shard0 owned half the range of collectionB. Later, a chunk migration happens and shard0 becomes owner of the whole range and its ShardVersion is SV2.

      A second statement of the transaction will target collectionB. The router routes according to the post-migration placement (that shard0 owns the whole range) thus it will only target shard0 with SV2.

      The shard will check the SV and see that it matches. However, the storage snapshot at which the transaction is operating does not include the documents for the newly received chunk. So the query will miss documents.

        1. repro.patch
          2 kB
          Jordi Serra Torrens

            Assignee:
            randolph@mongodb.com Randolph Tan
            Reporter:
            jordi.serra-torrens@mongodb.com Jordi Serra Torrens
            Votes:
            0 Vote for this issue
            Watchers:
            48 Start watching this issue

              Created:
              Updated:
              Resolved: