  Core Server / SERVER-40487

Stop running the RstlKillOpthread when a node is no longer primary

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • Fix Version/s: None
    • Affects Version/s: None
    • Component/s: Replication
    • Labels:
    • Replication
    • Sprint: Repl 2019-04-22, Repl 2019-05-06, Repl 2019-05-20

      Currently, when two concurrent stepdowns are triggered (either a conditional stepdown combined with an unconditional stepdown, or two conditional stepdowns), the stepdown thread can kill transaction operations that are being processed by secondary oplog application.

      Consider the scenario below, assuming that node A is in the PRIMARY state.
      1) A user executes the replSetStepDown command (thread X).
      2) Thread X is at this line.
      3) Node A then learns via a heartbeat that a new term has begun, so it steps down through the unconditional stepdown code path.
      4) Node A is now in the SECONDARY state.
      5) Node A's oplog application tries to apply a prepare/commit oplog entry, which requires secondary oplog application to check out the session. Let's assume that oplog application thread Y is trying to apply a commit oplog entry and is at this line.
      6) A read operation comes in (thread Z) and acquires the RSTL lock in IX mode and the global lock in IS mode. It is then blocked by thread Y due to a prepare conflict (a conflict at the document lock).
      7) Thread X resumes and enqueues a request for the RSTL lock in X mode. The request is not granted immediately because it is blocked by the read thread (thread Z).
      8) Thread X starts the "RstlKillOpthread". The RstlKillOpthread then marks thread Y (which belongs to secondary oplog application) as killed as part of killSessionsAbortUnpreparedTransactions.
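
The interleaving above can be replayed in a small toy model. The sketch below is illustrative only and is not the server's implementation: the `Node` and `Operation` classes, the `kill_ops_*` methods, and `run_scenario` are hypothetical names made up for this example. It assumes the fix suggested by the ticket title takes the form of a check of the node's state before the kill-op thread marks any operation as killed.

```python
from dataclasses import dataclass
from enum import Enum


class MemberState(Enum):
    PRIMARY = "PRIMARY"
    SECONDARY = "SECONDARY"


@dataclass
class Operation:
    # Hypothetical stand-in for an operation that has a session checked out.
    name: str
    killed: bool = False


class Node:
    # Hypothetical toy model of node A; it tracks only the state and live operations.
    def __init__(self):
        self.state = MemberState.PRIMARY
        self.operations = []

    def kill_ops_unconditionally(self):
        # Models step 8 as described: every operation is marked killed,
        # with no re-check of whether the node is still primary.
        for op in self.operations:
            op.killed = True

    def kill_ops_if_still_primary(self):
        # Assumed guard (not taken from the server code): do nothing
        # once the node has already left the PRIMARY state.
        if self.state is not MemberState.PRIMARY:
            return
        for op in self.operations:
            op.killed = True


def run_scenario(use_guard):
    """Replay steps 1-8 and report whether applier thread Y ends up killed."""
    node = Node()                                   # node A starts as primary
    # Steps 1-2: thread X has entered replSetStepDown but is paused.
    node.state = MemberState.SECONDARY              # steps 3-4: unconditional stepdown
    thread_y = Operation("thread Y: apply commit oplog entry")
    node.operations.append(thread_y)                # step 5: applier checks out the session
    thread_z = Operation("thread Z: read")
    node.operations.append(thread_z)                # step 6: read blocks on the prepare conflict
    # Steps 7-8: thread X resumes and starts the kill-op thread.
    if use_guard:
        node.kill_ops_if_still_primary()
    else:
        node.kill_ops_unconditionally()
    return thread_y.killed


if __name__ == "__main__":
    print("without guard, thread Y killed:", run_scenario(use_guard=False))
    print("with guard, thread Y killed:", run_scenario(use_guard=True))
```

In this model the unguarded path kills the oplog application thread, while the guarded path leaves it alone because node A is already SECONDARY by the time thread X resumes. The real server would also need to handle the window between the state check and the kill, which this sketch does not model.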

            Assignee: backlog-server-repl [DO NOT USE] Backlog - Replication Team
            Reporter: suganthi.mani@mongodb.com Suganthi Mani