Demonstrate that cache pressure doesn't prevent stepdown from running


    • Storage Engines, Storage Engines - Server Integration
    • SESI - 2025-08-05
    • 3
    • TBD

      During a replica set stepdown, all non-lock-free user operations are interrupted. This is because the stepdown process requires the Replica Set Transition Lock (RSTL) in exclusive mode, whereas non-lock-free user operations acquire it in intent mode. There are exceptions for certain internal operations, such as the compact command, which bypass the RSTL. These operations do not require knowledge of the replica set state and do not replicate data, as they perform only local changes.

      While the stepdown is in progress, the node is effectively offline: no reads (other than lock-free reads), no writes, and no oplog application can occur. The exclusive RSTL request is placed at the front of the lock queue, and all intent lock requests queue behind it.
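
      The queueing behavior described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyRSTL` and its methods are hypothetical names invented for illustration and are not the server's lock manager. It shows an exclusive request jumping to the front of the queue, current intent holders being marked for interruption, and later intent requests waiting behind the exclusive request.

```python
from collections import deque


class ToyRSTL:
    """Toy model of a lock with intent and exclusive modes (not MongoDB code)."""

    def __init__(self):
        self.holders = set()    # operations holding the lock in intent mode
        self.exclusive = None   # name of the exclusive holder, if any
        self.queue = deque()    # pending (name, mode) requests
        self.interrupted = []   # intent holders that were asked to stop

    def acquire_intent(self, op):
        # Intent is granted only when nothing exclusive holds or waits.
        if self.exclusive is None and not self.queue:
            self.holders.add(op)
            return True
        self.queue.append((op, "intent"))
        return False

    def request_exclusive(self, op):
        # The exclusive request goes to the front of the queue and
        # every current intent holder is marked as interrupted.
        self.queue.appendleft((op, "exclusive"))
        self.interrupted.extend(sorted(self.holders))
        self._grant()

    def release_intent(self, op):
        self.holders.discard(op)
        self._grant()

    def release_exclusive(self):
        self.exclusive = None
        self._grant()

    def _grant(self):
        # Grant requests from the front of the queue while modes are compatible.
        while self.queue and self.exclusive is None:
            op, mode = self.queue[0]
            if mode == "exclusive":
                if self.holders:
                    return  # must wait for intent holders to drain
                self.exclusive = op
            else:
                self.holders.add(op)
            self.queue.popleft()


lock = ToyRSTL()
lock.acquire_intent("write-1")
lock.acquire_intent("read-1")
lock.request_exclusive("stepdown")      # interrupts both holders, then waits
late = lock.acquire_intent("write-2")   # queued behind the stepdown request
lock.release_intent("write-1")
lock.release_intent("read-1")           # last holder drains; stepdown is granted
```

      In this model the exclusive request is granted only once every interrupted intent holder has released the lock, so a holder that is slow to release delays the whole transition.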

      Production failures have been attributed to user operations that were taking part in cache eviction and could not be interrupted promptly. Because the stepdown must wait for those operations, it takes a long time to complete.
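
      A rough way to reason about why a slow interrupt response lengthens stepdown is to model an operation that polls for interruption only at intervals. The function below is a hypothetical sketch, not server code: `steps_until_interrupted`, its parameters, and the idea of one "step" standing in for a unit of eviction work are all assumptions made for illustration.

```python
def steps_until_interrupted(interrupt_at, check_every, total_steps):
    """Return the step at which a toy operation notices an interrupt.

    The operation performs `total_steps` units of work and polls for
    interruption only after every `check_every` units (hypothetical model).
    """
    for step in range(1, total_steps + 1):
        # One unit of work (for example, an eviction pass) would happen here.
        if step % check_every == 0 and step >= interrupt_at:
            return step
    return total_steps  # never noticed the interrupt; ran to completion


# The interrupt is raised at step 3 in both cases.
prompt = steps_until_interrupted(interrupt_at=3, check_every=1, total_steps=100)
slow = steps_until_interrupted(interrupt_at=3, check_every=50, total_steps=100)
delay = slow - prompt  # extra steps the stepdown would wait in this model
```

      In this sketch, the operation that checks after every step stops at the step where the interrupt is raised, while the one that checks every 50 steps keeps its lock until step 50.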

            Assignee:
            Unassigned
            Reporter:
            Gregory Wlodarek
            Votes:
            0
            Watchers:
            6
