  1. Core Server
  2. SERVER-57545

Stepping down while stepping up with a transaction prepared results in a broken node

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: None
    • ALL
    • v5.0, v4.4, v4.2
    • Repl 2021-06-28, Repl 2021-07-12, Repl 2021-07-26

      A stepdown can start, because some other primary is stepping up, while we are still holding the RSTL (replication state transition lock) from a step-up attempt. If this happens while we have a transaction prepared, we will uassert when trying to check out a session to restore the prepared transaction's locks.

      https://github.com/mongodb/mongo/blob/b9c4dc61d38edd4ae1c4953dbc646fac633d78d0/src/mongo/db/session_catalog_mongod.cpp#L271

      The uassert will cause us to exit signalDrainComplete() without actually signalling that the drain is complete. At that point the oplog applier (and thus replication) will be stuck.

      In addition to fixing this, we should probably mark signalDrainComplete() as "noexcept" so we crash instead of hanging if anything similar happens.
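
The difference between hanging and crashing can be illustrated with a minimal sketch. Everything here is a hypothetical stand-in, not the server code: `Applier`, `restorePreparedTxnLocks()`, and both `signalDrainComplete*` functions are invented for illustration, and a `std::runtime_error` takes the place of the uassert. The sketch also assumes that some caller swallows the exception, which is what would leave the flag unset.

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>

// Hypothetical stand-in for the oplog applier state (not the real MongoDB type).
struct Applier {
    bool drainComplete = false;
};

// Hypothetical stand-in for the session checkout that uasserts in this ticket.
inline void restorePreparedTxnLocks(bool steppingDown) {
    if (steppingDown) {
        throw std::runtime_error("cannot check out session during stepdown");
    }
}

// Throwing variant: if the checkout throws, the final assignment is skipped.
inline void signalDrainCompleteThrowing(Applier& applier, bool steppingDown) {
    restorePreparedTxnLocks(steppingDown);
    applier.drainComplete = true;
}

// noexcept variant: an exception escaping this function calls std::terminate,
// so the process crashes instead of continuing with the flag unset.
inline void signalDrainCompleteNoexcept(Applier& applier, bool steppingDown) noexcept {
    restorePreparedTxnLocks(steppingDown);
    applier.drainComplete = true;
}

// Returns true when the exception was swallowed (assumed caller behaviour)
// and the drain was never signalled, i.e. the "stuck" state.
inline bool leavesApplierStuck() {
    Applier applier;
    try {
        signalDrainCompleteThrowing(applier, /*steppingDown=*/true);
    } catch (const std::exception&) {
        // Swallowed by the caller; nothing retries the signal.
    }
    return !applier.drainComplete;
}

// The compiler can confirm that the second variant is declared non-throwing.
static_assert(noexcept(signalDrainCompleteNoexcept(std::declval<Applier&>(), true)),
              "noexcept variant must be declared noexcept");
```

Calling `signalDrainCompleteNoexcept(applier, true)` in this sketch would terminate the process at the throw, which is the "crash instead of hang" behaviour proposed above. It does not fix the underlying race.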

        1. repro.SERVER-54545
          10 kB
          Matthew Russotto

            Assignee:
            vesselina.ratcheva@mongodb.com Vesselina Ratcheva (Inactive)
            Reporter:
            matthew.russotto@mongodb.com Matthew Russotto
            Votes:
            0
            Watchers:
            8
