Core Server / SERVER-82429

Sharded cluster can process illegal txn statement

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: 4.4.0, 5.0.0, 6.0.0, 7.0.0, 7.2.0
    • Component/s: None
    • Labels: None
    • Catalog and Routing
    • ALL
    • Sprint: CAR Team 2023-11-27, CAR Team 2023-12-11, CAR Team 2023-12-25, CAR Team 2024-01-08, CAR Team 2024-01-22, CAR Team 2024-02-05
    • 3

      Suppose a replica set has two collections, basic3 and basic4, and basic3 already contains the document {_id: 1}. Within a transaction with txnNumber 0 on session S1, I insert another document with {_id: 1} into basic3. The insert fails with a DuplicateKey error, which aborts the transaction.

      If I then run another statement against basic4 with the same txnNumber 0 (which I should not do, so this is an illegal statement), I get a NoSuchTransaction error, as expected, with the message "Transaction with { txnNumber: 0 } has been aborted."

      However, suppose we do this on a sharded cluster with two shards instead, where the two collections live on different shards. In that case, the illegal statement can be processed rather than rejected with NoSuchTransaction:

      > db.runCommand({
          insert: "basic4",
          documents: [{_id: 2}],
          lsid: { "id" : UUID("e142856b-b106-4e67-a9c6-2d6368431481") },
          txnNumber: NumberLong(0),
          autocommit: false
      });
      {
          "n" : 1,
          "ok" : 1,
          "$clusterTime" : {
              "clusterTime" : Timestamp(1698182943, 43),
              "signature" : {
                  "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
                  "keyId" : NumberLong(0)
              }
          },
          "operationTime" : Timestamp(1698182943, 43),
          "recoveryToken" : {
              "recoveryShardId" : "shard-rs0"
          }
      }
      

      This diverges from the behavior of a replica set, and in a way "leaks" the transaction.

      However, sending another statement after we know the transaction was aborted is itself illegal, so it is unclear whether this leak is a bug or simply undefined behavior. It could still have repercussions for clients that rely on the server to tell them when a transaction has been aborted, because in this case the server does not tell them.

      Note that an actual attempt to commit the transaction fails, reporting that the transaction has already been aborted.
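      To make the divergence concrete, here is a toy JavaScript model of the sequence above. It is a sketch under stated assumptions, not MongoDB's implementation. The names `txnError`, `Participant`, `Router`, and `runScenario` are hypothetical. The model assumes that each shard tracks transaction state independently, that the DuplicateKey abort is recorded only on nodes already participating, and that the router attaches a start-of-transaction flag the first time it contacts a node. Under those assumptions, the replica set rejects the follow-up statement. The second shard, which never saw txnNumber 0, starts it and accepts the insert. The commit then fails on the shard that aborted.

```javascript
// Toy model of per-participant transaction state.
// Hypothetical sketch, not MongoDB source code: all names and rules are assumptions.

// Build an Error that carries a MongoDB-style error code name.
function txnError(code, message) {
  const err = new Error(`${code}: ${message}`);
  err.code = code;
  return err;
}

// One replica set or shard, tracking existing _ids and per-txnNumber state.
class Participant {
  constructor(name, seed = {}) {
    this.name = name;
    // collection name -> Set of _id values already present
    this.collections = new Map(
      Object.entries(seed).map(([coll, ids]) => [coll, new Set(ids)])
    );
    // txnNumber -> "inProgress" | "aborted" | "committed"
    this.txns = new Map();
  }

  ids(coll) {
    if (!this.collections.has(coll)) this.collections.set(coll, new Set());
    return this.collections.get(coll);
  }

  // Insert {_id: id} into coll as part of transaction txnNumber.
  // startTransaction models the flag sent with the first statement to this node.
  insert(coll, id, txnNumber, startTransaction) {
    const state = this.txns.get(txnNumber);
    if (state === undefined) {
      if (!startTransaction) {
        throw txnError("NoSuchTransaction", `txnNumber ${txnNumber} was never started on ${this.name}`);
      }
      this.txns.set(txnNumber, "inProgress");
    } else if (state !== "inProgress") {
      throw txnError("NoSuchTransaction", `Transaction with { txnNumber: ${txnNumber} } has been ${state}.`);
    }
    const ids = this.ids(coll);
    if (ids.has(id)) {
      // The failed statement aborts the transaction on this participant only.
      this.txns.set(txnNumber, "aborted");
      throw txnError("DuplicateKey", `duplicate _id ${id} in ${coll}`);
    }
    // Simplification: the write is applied immediately instead of staged until commit.
    ids.add(id);
  }

  commit(txnNumber) {
    const state = this.txns.get(txnNumber);
    if (state !== "inProgress") {
      throw txnError("NoSuchTransaction", `txnNumber ${txnNumber} is ${state ?? "unknown"} on ${this.name}`);
    }
    this.txns.set(txnNumber, "committed");
  }
}

// Hypothetical router: forwards each statement to the node that owns the collection.
class Router {
  constructor(routing) {
    this.routing = routing; // collection name -> Participant
    this.participants = new Set();
  }

  insert(coll, id, txnNumber) {
    const target = this.routing[coll];
    // Assumption: the first statement sent to a node carries startTransaction.
    const startTransaction = !this.participants.has(target);
    this.participants.add(target);
    target.insert(coll, id, txnNumber, startTransaction);
  }

  commit(txnNumber) {
    for (const p of this.participants) p.commit(txnNumber);
  }
}

// Replay the report's statements and record each outcome ("ok" or an error code).
function runScenario(routing) {
  const router = new Router(routing);
  const outcome = (fn) => {
    try {
      fn();
      return "ok";
    } catch (err) {
      return err.code;
    }
  };
  return {
    first: outcome(() => router.insert("basic3", 1, 0)),
    second: outcome(() => router.insert("basic4", 2, 0)),
    commit: outcome(() => router.commit(0)),
  };
}

// Replica set: both collections live on the same node.
const rs = new Participant("rs0", { basic3: [1] });
const replicaSetResult = runScenario({ basic3: rs, basic4: rs });
// first: "DuplicateKey", second: "NoSuchTransaction", commit: "NoSuchTransaction"

// Sharded cluster: basic3 and basic4 live on different shards.
const shardA = new Participant("shardA", { basic3: [1] });
const shardB = new Participant("shardB");
const shardedResult = runScenario({ basic3: shardA, basic4: shardB });
// first: "DuplicateKey", second: "ok", commit: "NoSuchTransaction"
```

      In this model, the leak comes entirely from keying transaction state per participant. Nothing tells shardB that txnNumber 0 was already aborted on shardA, so shardB treats the illegal statement as the start of a fresh transaction. Only the commit, which reaches every participant, surfaces the abort.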

      See the comments for more info.

            Assignee: backlog-server-catalog-and-routing [DO NOT USE] Backlog - Catalog and Routing
            Reporter: Vishnu Kaushik (vishnu.kaushik@mongodb.com)
            Votes: 0
            Watchers: 7
