Core Server / SERVER-59108

Resolve race with transaction operation not killed after step down

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • None
    • Fix Version/s: 4.4.11, 4.2.18, 5.0.4, 5.1.0-rc0
    • None
    • None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Backport Requested: v5.0, v4.4, v4.2
    • Sprint: Repl 2021-09-06, Repl 2021-09-20, Repl 2021-10-04, Repl 2021-10-18
    • 70

    Description

      In SERVER-50486, we added a flag on the opCtx of transaction operations to ensure that these operations are interrupted on step down. After setting the flag, we check that we are still the primary: the commandCanRunHere function returns true if the node can accept non-local writes.

      In the stepDown code path, we first start acquiring the RSTL, and during that acquisition the killOps thread kills the opCtx of any command that has the flag set. Only after the RSTL is acquired do we update whether we can accept non-local writes. As a result, the following interleaving seems possible:

      1. In the user thread t1, we add a user command to the _clients vector in ServiceContext. However, we have not yet reached ExecCommandDatabase::_initiateCommand() and set the flag.
      2. In the stepDown thread t2, we attempt to acquire the RSTL and loop through all commands. Since the flag is not yet set for the command in t1, it is not killed.
      3. In t1, we now set the flag and check whether we can still service non-local writes. Since we still can, the command proceeds.
      4. In t2, we acquire the RSTL and record that we can no longer service non-local writes.

      The transaction operation in t1 therefore keeps running after the step down without ever being killed.
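
The four steps above can be replayed as a small, deterministic model. This is only an illustrative sketch in Python, not the server's code: `Operation`, `Node`, `kill_flagged_ops`, and `run_interleaving` are hypothetical names invented here, and the two threads are modeled by running their steps in the racy order on a single thread.

```python
from dataclasses import dataclass, field


@dataclass
class Operation:
    """Toy stand-in for an opCtx; not the server's real type."""
    name: str
    interrupt_on_step_down: bool = False  # models the flag from SERVER-50486
    killed: bool = False


@dataclass
class Node:
    """Toy stand-in for the replication state of one node."""
    can_accept_non_local_writes: bool = True
    clients: list = field(default_factory=list)  # models the _clients vector

    def kill_flagged_ops(self):
        # Step-down side: kill only operations whose flag is already set.
        for op in self.clients:
            if op.interrupt_on_step_down:
                op.killed = True


def run_interleaving():
    node = Node()
    log = []

    # 1. t1: the command is registered, but its flag is not set yet.
    op = Operation("txn-op")
    node.clients.append(op)

    # 2. t2: while trying to acquire the RSTL, stepDown scans all operations.
    node.kill_flagged_ops()
    log.append(f"after kill pass: killed={op.killed}")

    # 3. t1: set the flag, then check that the node still accepts writes.
    op.interrupt_on_step_down = True
    proceeds = node.can_accept_non_local_writes and not op.killed
    log.append(f"command proceeds: {proceeds}")

    # 4. t2: RSTL acquired; the node stops accepting non-local writes.
    node.can_accept_non_local_writes = False

    # The operation is still running on a node that has stepped down.
    still_running = proceeds and not op.killed
    return still_running, node.can_accept_non_local_writes, log


if __name__ == "__main__":
    still_running, accepts_writes, log = run_interleaving()
    print("\n".join(log))
    print(f"still running: {still_running}, accepts writes: {accepts_writes}")
```

In this model the window lies between registering the operation (step 1) and setting its flag (step 3): if the flag assignment ran before the kill pass in step 2, `kill_flagged_ops` would mark the operation as killed and it would not proceed.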

            People

              vesselina.ratcheva@mongodb.com Vesselina Ratcheva
              xuerui.fa@mongodb.com Xuerui Fa