  1. Core Server
  2. SERVER-91797

retryable_write_error_labels.js needs better concurrency control for commitTransaction

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • 8.1.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • None
    • Service Arch
    • Fully Compatible
    • ALL
    • v8.0
    • Programmability 2024-07-08, Programmability 2024-07-22, Programmability 2024-08-05
    • 0

      The test expects that, in a separate thread, it can run an update command against a mongos. For this to work, the main thread must not shut down the mongos before that thread runs the update.

      The concurrency control designed into this test uses the hangBeforeCommitingTxn failpoint on the shard primary to mediate this race. The failpoint is set before the new thread is spawned, and the main thread waits for the failpoint to be hit before it shuts down the server. The new thread is meant to hang at the failpoint when it runs commitTransaction after it completes the update. The commit is expected to fail: it should hang until the mongos is shut down, and the test asserts that the commit operation returns a shutdown error.

      This breaks if any other commitTransaction runs on the shard primary after the test sets the failpoint but before the update command in the second thread completes. In that case, the main thread sees the failpoint hit and can begin shutting down the server before the update command completes, so the assertion that the update succeeded fails.
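
The ordering problem described above can be illustrated with a small standalone model. This is only a sketch under stated assumptions: `runSchedule` and the event names `unrelatedCommit`, `testUpdate`, and `testCommit` are hypothetical and are not part of the jstest or the server. The model also assumes the main thread reacts immediately once the failpoint has been hit, whereas in the real test it only *can* begin shutting down at that moment.

```javascript
// Toy model of the race. None of these names are real jstest or server APIs.
// A schedule is the order in which events reach the cluster after the
// failpoint has been set and the second thread has been spawned.
function runSchedule(events) {
  let failPointHits = 0;   // how many commits have reached the failpoint
  let firstHitBy = null;   // which commit reached the failpoint first
  let mongosUp = true;     // whether the main thread has shut down mongos
  let updateResult = null; // outcome of the second thread's update

  const hitFailPoint = (who) => {
    failPointHits += 1;
    if (firstHitBy === null) firstHitBy = who;
  };

  for (const event of events) {
    if (event === "unrelatedCommit") {
      // Any commitTransaction on the shard primary trips the failpoint,
      // not only the one issued by the test's second thread.
      hitFailPoint("unrelated");
    } else if (event === "testUpdate") {
      // The update goes through mongos, so it needs mongos to be up.
      updateResult = mongosUp ? "ok" : "shutdownError";
    } else if (event === "testCommit") {
      // The second thread only commits after its update succeeded.
      if (updateResult === "ok") hitFailPoint("test");
    } else {
      throw new Error("unknown event: " + event);
    }

    // Main thread: shut down mongos once the failpoint has been hit at all.
    // It cannot tell which commit is hanging there.
    if (mongosUp && failPointHits > 0) mongosUp = false;
  }

  return { updateResult, firstHitBy };
}

// Intended ordering: the test's own commit is the first to hit the failpoint.
const intended = runSchedule(["testUpdate", "testCommit"]);
console.log(JSON.stringify(intended));
// prints {"updateResult":"ok","firstHitBy":"test"}

// Racy ordering: another commit hits the failpoint before the update runs.
const racy = runSchedule(["unrelatedCommit", "testUpdate", "testCommit"]);
console.log(JSON.stringify(racy));
// prints {"updateResult":"shutdownError","firstHitBy":"unrelated"}
```

In the second schedule the signal the main thread waits on is satisfied by a commit the test does not own, which is why the update can observe a mongos that is already shutting down.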

      This affects the test case "commitTransaction should return mongos shutdown error with RetryableWriteError label".

            Assignee:
            alex.li@mongodb.com Alex Li
            Reporter:
            george.wangensteen@mongodb.com George Wangensteen (Inactive)
            Votes:
            0
            Watchers:
            1

              Created:
              Updated:
              Resolved: