  1. Core Server
  2. SERVER-75288

Investigate whether the stepdown killop thread should kill operations that hold the RSTL

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Replication
    • Repl 2023-04-17, Repl 2023-05-01, Repl 2023-05-15
    • 135

      The stepdown killop thread currently kills operations that took the global lock in a mode conflicting with writes. It does not kill operations that hold the RSTL, because at the time we added the killop thread, reads held the RSTL (this was safe because long-running reads would periodically yield). Leaving those operations alone gave a better user experience, because otherwise readers would have had to handle interruption during failovers.

      Since the introduction of lock-free reads, many reads no longer take the RSTL. So we should be able to start killing operations that hold the RSTL on stepdown.

      This has the benefit of preventing future deadlocks in situations where a thread takes the global lock in IS mode, implicitly taking the RSTL as well, and is then blocked waiting on a DB S mode lock that conflicts with a prepared transaction. The prepared transaction would be blocked from committing if the node was trying to step down, and the stepdown could not acquire the RSTL because the reader thread already holds it.
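
The cycle described above can be sketched as a toy wait-for graph. This is only an illustration under stated assumptions, not server code: the thread names (`reader`, `prepared_txn`, `stepdown`) and the helper `find_cycle` are hypothetical, and each thread is assumed to wait on at most one other thread.

```python
from typing import Dict, List, Optional


def find_cycle(waits_for: Dict[str, str]) -> Optional[List[str]]:
    """Return one cycle in a wait-for graph, or None if there is none.

    Assumes each thread waits on at most one other thread (toy model).
    """
    for start in waits_for:
        seen: List[str] = []
        node = start
        # Follow the chain of waiters until it ends or repeats.
        while node in waits_for and node not in seen:
            seen.append(node)
            node = waits_for[node]
        if node in seen:
            return seen[seen.index(node):]
    return None


# Hypothetical names modeling the scenario in the description:
#   reader       holds the RSTL, waits on a DB S lock blocked by the prepared txn
#   prepared_txn cannot commit while the node is trying to step down
#   stepdown     cannot acquire the RSTL while the reader holds it
waits_for = {
    "reader": "prepared_txn",
    "prepared_txn": "stepdown",
    "stepdown": "reader",
}
deadlock = find_cycle(waits_for)

# Killing the reader (the proposed behavior) removes it from the graph
# and releases its RSTL, so the edge stepdown -> reader disappears too.
after_kill = {
    waiter: holder
    for waiter, holder in waits_for.items()
    if waiter != "reader" and holder != "reader"
}
resolved = find_cycle(after_kill)

print(deadlock)
print(resolved)
```

In this model, interrupting the RSTL-holding reader is what breaks the cycle: stepdown can then take the RSTL, and the prepared transaction is no longer stuck behind a stepdown that cannot make progress.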

      This work might also fix deadlocks of this nature that are already possible but that we haven't noticed yet. However, I'm not yet sure what complications or side effects this change would introduce.

            backlog-server-repl [DO NOT USE] Backlog - Replication Team
            samy.lanka@mongodb.com Samyukta Lanka