Empty transactions rolling back due to conflict, if committed, may succeed or crash


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor - P4
    • Affects Version/s: None
    • Component/s: Transactions
    • Storage Engines, Storage Engines - Transactions
    • SE Transactions - 2025-09-12
    • 5

      This issue came up during MongoDB Research work in which we generated randomized multithreaded workloads for WiredTiger. It covers both an unusual single-threaded behavior and a related assertion failure that we found to be possible only under multithreaded execution.

      If 2 transactions conflict over a write, and the "losing" transaction had no successful writes, the losing transaction is able to commit successfully rather than being forced to roll back.

      Using an extension of Peter Macko's workload syntax that shows result codes, the scenario looks like this:

      create_table(1, "table1", "Q", "Q") --> 0
      begin_transaction(1) --> 0
      begin_transaction(2) --> 0
      insert(1, 1, 111, 111) --> 0
      insert(1, 2, 111, 222) --> -31800
      commit_transaction(1, 20) --> 0
      commit_transaction(2, 10) --> 0

      It is strange that the last line returns 0: the insert in transaction 2 failed with -31800 (WT_ROLLBACK), so semantically the transaction failed, yet it is allowed to commit. Granted, the commit consists of 0 updates, so no damage is done.
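
The pattern above can be checked for mechanically in a recorded trace. The following is a minimal sketch, not part of the original workload tooling: it assumes the trace format shown above, assumes that the second argument of insert and the first argument of begin_transaction and commit_transaction are the transaction id, and uses a hypothetical helper name, commits_after_rollback.

```python
import re

WT_ROLLBACK = -31800  # result code returned by the conflicting insert in the trace

# Matches lines such as: insert(1, 2, 111, 222) --> -31800
LINE = re.compile(r"^\s*(\w+)\((.*)\)\s*-->\s*(-?\d+)\s*$")

def commits_after_rollback(trace):
    """Return ids of transactions that committed with 0 after an insert returned WT_ROLLBACK."""
    failed = set()  # transactions that have seen WT_ROLLBACK since they began
    flagged = []
    for line in trace.splitlines():
        m = LINE.match(line)
        if not m:
            continue  # skip blank lines and lines without a numeric result
        op, raw_args, result = m.group(1), m.group(2), int(m.group(3))
        args = [a.strip() for a in raw_args.split(",")]
        if op == "begin_transaction":
            failed.discard(args[0])
        elif op == "insert" and result == WT_ROLLBACK:
            failed.add(args[1])  # assumption: second argument is the transaction id
        elif op == "commit_transaction":
            if args[0] in failed and result == 0:
                flagged.append(args[0])
            failed.discard(args[0])
    return flagged

TRACE = """
create_table(1, "table1", "Q", "Q") --> 0
begin_transaction(1) --> 0
begin_transaction(2) --> 0
insert(1, 1, 111, 111) --> 0
insert(1, 2, 111, 222) --> -31800
commit_transaction(1, 20) --> 0
commit_transaction(2, 10) --> 0
"""

print(commits_after_rollback(TRACE))  # prints ['2']
```

A checker like this only reports the anomaly in a trace that has already been recorded; it does not exercise WiredTiger itself.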

      In a multithreaded context, it is also possible to see the following outcome (assuming each transaction is performed by one of 2 threads under minimal synchronization):

      create_table(1, "table1", "Q", "Q") --> 0
      begin_transaction(1) --> 0
      begin_transaction(2) --> 0
      insert(1, 1, 111, 111) --> 0
      insert(1, 2, 111, 222) --> 0
      commit_transaction(1, 20) --> 0
      commit_transaction(2, 10) --> !crash! 

      In this case, 2 transactions compete to write the same key and both appear to succeed. On commit, the one with the lower timestamp crashes with an assertion failure because the other transaction's update is present (and has a higher timestamp).
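
As a rough illustration of what makes this trace suspicious, the sketch below scans a recorded trace for a successful insert on a key that another still-open transaction has already written. It is a toy checker under stated assumptions, not WiredTiger code: the argument order of insert is assumed to be (table, transaction, key, value), the helper name overlapping_writes is hypothetical, and the log order of a multithreaded run may not match the real interleaving.

```python
import re

# Matches lines such as: commit_transaction(2, 10) --> !crash!
LINE = re.compile(r"^\s*(\w+)\((.*)\)\s*-->\s*(\S+)\s*$")

def overlapping_writes(trace):
    """Return (table, key, first_txn, second_txn) for each successful insert
    on a key that a different uncommitted transaction has already written."""
    pending = {}  # (table, key) -> id of the open transaction that wrote it
    hits = []
    for line in trace.splitlines():
        m = LINE.match(line)
        if not m:
            continue
        op, raw_args, result = m.groups()
        args = [a.strip() for a in raw_args.split(",")]
        if op == "insert" and result == "0":
            # assumption: arguments are (table, transaction, key, value)
            table, txn, key = args[0], args[1], args[2]
            owner = pending.get((table, key))
            if owner is not None and owner != txn:
                hits.append((table, key, owner, txn))
            else:
                pending[(table, key)] = txn
        elif op == "commit_transaction":
            # the transaction is finished, so its writes are no longer pending
            pending = {k: v for k, v in pending.items() if v != args[0]}
    return hits

TRACE = """
create_table(1, "table1", "Q", "Q") --> 0
begin_transaction(1) --> 0
begin_transaction(2) --> 0
insert(1, 1, 111, 111) --> 0
insert(1, 2, 111, 222) --> 0
commit_transaction(1, 20) --> 0
commit_transaction(2, 10) --> !crash!
"""

print(overlapping_writes(TRACE))  # prints [('1', '111', '1', '2')]
```

On the single-threaded trace, where the second insert returns -31800, the same function returns an empty list.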

      Unfortunately, we can only reproduce this failure in Antithesis (normal randomized workloads do not find it), and it seems to be possible only under a non-linearizable concurrent interleaving of the 2 transactions.

              Assignee: [DO NOT USE] Backlog - Storage Engines Team
              Reporter: Finn Hackett (Inactive)
              Votes: 0
              Watchers: 6
