Core Server / SERVER-57545

Stepping down while stepping up with a transaction prepared results in a broken node


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Operating System:
      ALL
    • Backport Requested:
      v5.0, v4.4, v4.2
    • Sprint:
      Repl 2021-06-28, Repl 2021-07-12, Repl 2021-07-26

      Description

      A stepdown can start, because some other node has stepped up to primary, while we are still holding the RSTL (Replication State Transition Lock) from our own step-up attempt. If this happens while we have a prepared transaction, we will uassert when trying to check out a session to restore the prepared transaction's locks.

      https://github.com/mongodb/mongo/blob/b9c4dc61d38edd4ae1c4953dbc646fac633d78d0/src/mongo/db/session_catalog_mongod.cpp#L271
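      A minimal C++ sketch of the failing check-out, under a simplified model: `NodeState`, `AssertionError`, and `checkOutSessionToRestoreLocks()` are hypothetical names, not MongoDB's actual session catalog API. It only illustrates that, like a uassert, the check-out reports refusal by throwing rather than by returning an error code.

```cpp
#include <stdexcept>

// Hypothetical stand-in for the exception a failed uassert raises.
struct AssertionError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical, simplified view of the node during the race described above.
struct NodeState {
    bool holdingRstlForStepUp = false;    // step-up attempt still holds the RSTL
    bool stepDownInProgress = false;      // another node's step-up triggered a stepdown
    bool hasPreparedTransaction = false;  // a transaction is in the prepared state
};

// Sketch of checking out a session to restore a prepared transaction's locks.
// Like a uassert, it throws when the check-out is refused because a stepdown
// has started, instead of returning an error to the caller.
void checkOutSessionToRestoreLocks(const NodeState& node) {
    if (!node.hasPreparedTransaction) {
        return;  // nothing to restore
    }
    if (node.stepDownInProgress) {
        throw AssertionError("cannot check out session: stepdown in progress");
    }
    // ... reacquire the prepared transaction's locks here (omitted) ...
}
```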

      The uassert will cause us to exit signalDrainComplete() without actually signalling that the drain is complete. At that point the oplog applier (and thus replication) will be stuck.
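      The hang can be illustrated with a hypothetical `signalDrainCompleteSketch()` in which signalling is the last step. All names here are illustrative assumptions, not the real replication coordinator code. An exception thrown by the lock-restore step skips the signal, so anything waiting on `drainSignalled` never wakes up.

```cpp
#include <stdexcept>

// Hypothetical state the oplog applier waits on.
struct DrainState {
    bool drainSignalled = false;
};

// Hypothetical stand-in for restoring prepared transactions' locks; throws
// when the session check-out is refused, as the uassert does during a stepdown.
void restorePreparedTransactionLocks(bool checkoutRefused) {
    if (checkoutRefused) {
        throw std::runtime_error("session check-out refused");
    }
}

// Sketch of the problematic shape: the signal is the last step, so an
// exception thrown earlier skips it and leaves drainSignalled == false.
void signalDrainCompleteSketch(DrainState& state, bool checkoutRefused) {
    restorePreparedTransactionLocks(checkoutRefused);
    state.drainSignalled = true;  // never reached if the call above throws
}
```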

      In addition to fixing this, we should probably mark signalDrainComplete() as "noexcept" so we crash instead of hanging if anything similar happens.
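      The `noexcept` suggestion can be sketched the same way, again with hypothetical names rather than the real signalDrainComplete() signature. In C++, an exception that escapes a `noexcept` function causes `std::terminate()` to be called, so a repeat of this kind of bug would crash the node instead of leaving the applier stuck. A `static_assert` can check the guarantee at compile time.

```cpp
#include <stdexcept>
#include <utility>

// Hypothetical state the oplog applier waits on.
struct DrainState {
    bool drainSignalled = false;
};

// Hypothetical lock-restore step that may throw, as in the bug above.
void restorePreparedTransactionLocks(bool checkoutRefused) {
    if (checkoutRefused) {
        throw std::runtime_error("session check-out refused");
    }
}

// Marked noexcept: if restorePreparedTransactionLocks() throws, the exception
// cannot propagate out of this function, so std::terminate() is called and
// the process crashes loudly instead of silently leaving replication hung.
void signalDrainCompleteNoexcept(DrainState& state, bool checkoutRefused) noexcept {
    restorePreparedTransactionLocks(checkoutRefused);
    state.drainSignalled = true;
}

// Compile-time check that the function is declared non-throwing.
static_assert(noexcept(signalDrainCompleteNoexcept(std::declval<DrainState&>(), false)),
              "signalDrainComplete must not throw");
```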

              People

              Assignee:
              vesselina.ratcheva Vesselina Ratcheva
              Reporter:
              matthew.russotto Matthew Russotto
              Participants:
              Votes:
              0
              Watchers:
              8

                Dates

                Created:
                Updated:
                Resolved: