Core Server: SERVER-59108

Resolve race with transaction operation not killed after step down

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 4.4.11, 4.2.18, 5.0.4, 5.1.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Backport Requested: v5.0, v4.4, v4.2
    • Sprint: Repl 2021-09-06, Repl 2021-09-20, Repl 2021-10-04, Repl 2021-10-18
    • Linked BF Score: 70

      In SERVER-50486, we added a flag on the opCtx of transaction operations so that these operations are interrupted on step down. After setting the flag, we check that we are still primary: the commandCanRunHere function returns true if we can accept non-local writes.
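      The user-thread side of this mechanism can be sketched as a toy model: set the flag first, then check whether the node still accepts non-local writes. This is not server code. OpCtx, ReplState, always_interrupt_at_step_down, command_can_run_here, initiate_command, and NotWritablePrimary are hypothetical names chosen for illustration; only the ordering (flag, then commandCanRunHere-style check) comes from the description.

```python
from dataclasses import dataclass


@dataclass
class OpCtx:
    """Hypothetical stand-in for a transaction operation's opCtx."""
    name: str
    # Hypothetical name for the flag added in SERVER-50486.
    always_interrupt_at_step_down: bool = False


@dataclass
class ReplState:
    """Hypothetical model of the node's replication state."""
    can_accept_non_local_writes: bool = True


class NotWritablePrimary(Exception):
    """Raised when the command may not run on this node."""


def command_can_run_here(repl: ReplState) -> bool:
    # Models commandCanRunHere(): true if non-local writes are accepted.
    return repl.can_accept_non_local_writes


def initiate_command(op: OpCtx, repl: ReplState) -> None:
    # Set the flag first so any stepdown kill pass that runs afterwards sees it,
    # then confirm the node is still primary.
    op.always_interrupt_at_step_down = True
    if not command_can_run_here(repl):
        raise NotWritablePrimary(op.name)


# On a primary, the command proceeds and is now flagged for interruption.
primary = ReplState()
op = OpCtx("txn-op")
initiate_command(op, primary)

# On a node that no longer accepts non-local writes, the command is rejected.
stepped_down = ReplState(can_accept_non_local_writes=False)
try:
    initiate_command(OpCtx("txn-op-2"), stepped_down)
    rejected = False
except NotWritablePrimary:
    rejected = True
```

      The check alone only protects commands that reach it after the node stops accepting writes; commands that pass it earlier rely on the flag being visible to the stepdown kill pass.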

      In the stepDown code path, we first acquire the RSTL, which is where we run the killOps thread to kill the opCtx of any command that has the flag set. Only then do we update whether we can accept non-local writes. As a result, the following interleaving seems possible:

      1. In the user thread t1, the client for a user command is added to _clients in ServiceContext. However, we have not yet reached ExecCommandDatabase::_initiateCommand() and set the flag.
      2. In the stepDown thread t2, we attempt to acquire the RSTL and loop through all commands. Since the flag is not yet set for the command in t1, that command is not killed.
      3. In t1, we now set the flag and check whether we can still service non-local writes. Since we still can, the command proceeds.
      4. In t2, we acquire the RSTL and record that we can no longer service non-local writes. The command from t1 keeps running even though the node has stepped down, because it was never killed.
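      The four steps above can be replayed in order on a single thread to show why the operation survives. This is a toy model, not server code: OpCtx, Node, always_interrupt_at_step_down, kill_flagged_ops, and replay are hypothetical names, the kill pass stands in for the killOps work done while acquiring the RSTL, and the writability flag stands in for what commandCanRunHere consults. The flip_first parameter is a hypothetical reordering of t2's two actions, included only for comparison; it is not necessarily how this ticket was fixed.

```python
from dataclasses import dataclass, field


@dataclass
class OpCtx:
    name: str
    always_interrupt_at_step_down: bool = False  # hypothetical flag name
    killed: bool = False


@dataclass
class Node:
    can_accept_non_local_writes: bool = True
    clients: list = field(default_factory=list)  # stands in for ServiceContext's clients


def kill_flagged_ops(node: Node) -> None:
    # Models the kill pass run while acquiring the RSTL: only flagged ops are killed.
    for op in node.clients:
        if op.always_interrupt_at_step_down:
            op.killed = True


def replay(flip_first: bool = False) -> bool:
    """Replay steps 1-4 in order; return True if the op keeps running after stepdown."""
    node = Node()
    op = OpCtx("txn-op")

    # Step 1 (t1): client registered, flag not set yet.
    node.clients.append(op)

    # Step 2 (t2): stepdown starts. flip_first is a hypothetical reordering.
    if flip_first:
        node.can_accept_non_local_writes = False
    kill_flagged_ops(node)

    # Step 3 (t1): set the flag, then run the commandCanRunHere-style check.
    op.always_interrupt_at_step_down = True
    proceeds = node.can_accept_non_local_writes

    # Step 4 (t2): stop accepting non-local writes (already False if flip_first).
    node.can_accept_non_local_writes = False

    return proceeds and not op.killed


print(replay())                 # True: the op was never killed and passed its check
print(replay(flip_first=True))  # False: the check in step 3 rejects the op
```

      With the order described in the ticket, replay() returns True. With flip_first=True, the step-3 check already sees that writes are not accepted, so the operation is rejected. In this sequentially consistent model, the swapped order closes the window for every interleaving, since either t1's check observes t2's update or t2's kill pass observes t1's flag; real server code would additionally need locking or memory ordering to get that guarantee.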

            Vesselina Ratcheva (Inactive)
            Xuerui Fa