Core Server / SERVER-84723

Sharded multi-document transactions can observe partial effects of concurrent DDL operations

    • Type: Bug
    • Resolution: Fixed
    • Priority: Critical - P2
    • Fix Version/s: 7.0.6, 8.0.0-rc0, 7.2.2, 7.3.0-rc4
    • Affects Version/s: 7.0.0
    • Component/s: None
    • Labels:
      None
    • Catalog and Routing
    • Fully Compatible
    • ALL
    • v7.3, v7.2, v7.0
    • CAR Team 2024-01-22, CAR Team 2024-02-05

      Issue Status as of April 15 2024

      This issue is included in MongoDB System Alert: Sharded multi-document transactions may perform operations using inconsistent sharding metadata. The information below describes only the behavior and impact related to SERVER-84723. Please start with the consolidated issue page for guidance on identifying if these issues impact you.

      SUMMARY

      Operations within a multi-document transaction may return incomplete data and may not apply update or delete operations to documents if they occur on ranges of data affected by a concurrent sharding metadata change.

      ISSUE DESCRIPTION AND IMPACT

      A multi-document transaction may see partial effects of a Data Definition Language (DDL) operation running concurrently on the involved collections. This can manifest as the transaction seeing only part of an involved collection, or observing a mix of the collection's state from before and after the sharding metadata change. As a result, reads within the transaction on sharded collections may miss data or return a mix of data from different metadata versions, and writes within the transaction on sharded collections may fail to update or delete documents they should target.

      This issue affects MongoDB versions 7.0.0 through 7.0.5.
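
      As a quick aid, the stated range can be checked mechanically. The sketch below is a hypothetical helper (the name isInStatedAffectedRange is not part of any MongoDB tool) that encodes only the sentence above, "7.0.0 through 7.0.5". It says nothing about other release series, so treat a false result as "outside the stated range", not as proof that a deployment is unaffected. In mongosh, the version string could come from db.version().

```javascript
// Hypothetical helper: encodes only the range stated in this ticket
// ("7.0.0 through 7.0.5"). It makes no claim about other release series.
function isInStatedAffectedRange(version) {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!match) {
    // Pre-release or otherwise unexpected strings are rejected, not guessed at.
    throw new Error("Unrecognized version string: " + version);
  }
  const [major, minor, patch] = match.slice(1).map(Number);
  return major === 7 && minor === 0 && patch <= 5;
}
```

      For example, isInStatedAffectedRange("7.0.5") is true and isInStatedAffectedRange("7.0.6") is false.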

      The minimum conditions for the issue to manifest (all must be met) are:

      • Sharded cluster with more than one shard
      • An application which uses or used:
        • A multi-statement transaction that:
          • Runs at local, majority, or snapshot read concern.
          • Involves more than one collection, at least one of which is sharded.
          • Involves more than one shard
        • Queryable Encryption
          • Where one or more sharded collections have encrypted fields
      • Concurrent DDL operations:
        • renameCollection()
        • drop()
        • reshardCollection()

      The table below describes which types of operations may be impacted and how:

      What is affected | Effect | Downstream Effect
      Reads or Writes outside of a transaction | None | None
      Within a transaction - Reads or Writes to a collection not undergoing drop or rename | None | None
      Within a transaction - Reads or Writes to an unsharded collection | None | None
      Within a transaction using snapshot read concern - Writes to a sharded collection undergoing a drop or rename | None | None
      Within a transaction using local or majority read concern - Writes to sharded collections undergoing drop or rename | Updates or deletes may miss documents which should be targeted. Inserts will raise a WriteConflict, or will be applied correctly, depending on the exact interleaving and targeted shard. Writes on newly inserted documents will be correctly applied. | Application level inconsistencies between documents. No replica set inconsistencies or index inconsistencies.
      Within a transaction using any read concern - Reads from sharded collections undergoing drop or rename | Possible incomplete results. | Application-introduced inconsistencies if reads would prompt additional action.

      WORKAROUND

      If your workload uses multi-document transactions on a sharded cluster meeting the criteria above, we recommend that you:

      • Upgrade to MongoDB 7.0.6 or later
      • See the Diagnosis & Remediation section below

      DIAGNOSIS & REMEDIATION

      See MongoDB System Alert: Sharded multi-document transactions may perform operations using inconsistent sharding metadata for guidance on assessing if you are impacted and the recommended remediation steps.

      Original description

      Consider the following interleaving (repro1.js):
      1. Initial state:

      • collA: sharded collection with chunks both on shard0 and shard1.
      • collB: unsharded collection on shard0.
      • collC: does not exist.

      2. Start txn with local or majority read concern, hit shard0 to read collB [shard0's txn snapshot now contains collA and collB]
      3. Rename collA -> collC.
      4. Read collC. On shard0, collC does not exist in the txn snapshot; on shard1 it does. Therefore the txn sees only half of the collection.

      Moreover, if collC existed initially, the transaction would observe a mix of the original collC and the post-rename collection.
      The example above involves rename, but a similar situation might be possible with reshardCollection.
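
      The rename interleaving above can be illustrated with a toy model in plain JavaScript. This is not the attached repro script and not how the server is implemented. It is a minimal sketch that assumes each shard opens its transaction snapshot the first time the transaction contacts it. Shard, ToyTxn, simulateRenameAnomaly and the document ids are all hypothetical.

```javascript
// Toy model of a shard: a catalog mapping collection name -> documents.
class Shard {
  constructor(name, catalog) {
    this.name = name;
    this.catalog = new Map(Object.entries(catalog));
  }
  // Copy of the catalog as it is at the moment of the call.
  snapshot() {
    return new Map([...this.catalog].map(([ns, docs]) => [ns, docs.slice()]));
  }
  rename(from, to) {
    if (!this.catalog.has(from)) return;
    this.catalog.set(to, this.catalog.get(from));
    this.catalog.delete(from);
  }
}

// Toy transaction: takes a per-shard snapshot lazily, on first contact.
class ToyTxn {
  constructor() {
    this.snapshots = new Map();
  }
  read(shards, ns) {
    const out = [];
    for (const shard of shards) {
      if (!this.snapshots.has(shard.name)) {
        this.snapshots.set(shard.name, shard.snapshot());
      }
      out.push(...(this.snapshots.get(shard.name).get(ns) || []));
    }
    return out;
  }
}

function simulateRenameAnomaly() {
  // 1. Initial state: collA has chunks on both shards, collB lives on shard0.
  const shard0 = new Shard("shard0", { collA: [{ _id: 1 }, { _id: 2 }], collB: [{ _id: "b" }] });
  const shard1 = new Shard("shard1", { collA: [{ _id: 3 }, { _id: 4 }] });
  const txn = new ToyTxn();
  // 2. The txn reads collB, which fixes shard0's snapshot (before the rename).
  txn.read([shard0], "collB");
  // 3. Concurrent rename collA -> collC on every shard.
  shard0.rename("collA", "collC");
  shard1.rename("collA", "collC");
  // 4. The txn reads collC: shard0's old snapshot has no collC, shard1's new one does.
  return txn.read([shard0, shard1], "collC").map((doc) => doc._id);
}
```

      In this model simulateRenameAnomaly() returns only the ids stored on shard1, although collC holds four documents after the rename.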

      Another anomaly is (repro2.js):
      1. Initial state

      • shard0 (dbPrimary): collA(sharded) and collB(unsharded)
      • shard1: collA(sharded)

      2. Start txn (local, majority or snapshot read concern), hit shard0 for collB
      3. Drop collA
      4. Read collA. The read targets shard0 and reads the sharded collection, but only the half of it that lives on shard0.
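
      The drop interleaving can be sketched the same way. The toy model below is not the attached repro script. It assumes, as a simplification, that once collA is dropped the router no longer has sharding metadata for it and sends the read to the database primary (shard0), whose transaction snapshot predates the drop. simulateDropAnomaly, the routing table and the document values are hypothetical.

```javascript
function simulateDropAnomaly() {
  // 1. Initial state: collA is sharded across both shards, collB is unsharded on shard0.
  const shards = {
    shard0: { collA: [1, 2], collB: ["b"] },
    shard1: { collA: [3, 4] },
  };
  // Toy routing table: sharded collections map to the shards that own chunks.
  const routing = { collA: ["shard0", "shard1"] };
  const dbPrimary = "shard0";

  // Per-shard txn snapshots, taken lazily on first contact (an assumption of this model).
  const txnSnapshots = {};
  const targetsFor = (ns) => routing[ns] || [dbPrimary];
  const txnRead = (ns) =>
    targetsFor(ns).flatMap((shardName) => {
      if (!(shardName in txnSnapshots)) {
        txnSnapshots[shardName] = JSON.parse(JSON.stringify(shards[shardName]));
      }
      return txnSnapshots[shardName][ns] || [];
    });

  // 2. The txn reads collB on shard0, fixing shard0's snapshot before the drop.
  txnRead("collB");
  // 3. Concurrent drop of collA: data and routing metadata both go away.
  for (const catalog of Object.values(shards)) delete catalog.collA;
  delete routing.collA;
  // 4. The txn reads collA: routed only to shard0, whose snapshot still holds its chunk.
  return txnRead("collA");
}
```

      In this model simulateDropAnomaly() returns only the documents that lived on shard0, which is neither the full pre-drop collection nor an empty post-drop result.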

        1. repro.js
          3 kB
        2. repro2.js
          3 kB

            Assignee:
            jordi.serra-torrens@mongodb.com Jordi Serra Torrens
            Reporter:
            jordi.serra-torrens@mongodb.com Jordi Serra Torrens
            Votes:
            0
            Watchers:
            26
