SERVER-58184 (Core Server)

Checkpoint thread causes assertions when raced with recovering prepared transactions on startup

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 4.4.6
    • Fix Version/s: 5.0.3, 4.4.9, 5.1.0-rc0
    • Component/s: None
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • Backport Requested: v5.0, v4.4, v4.2
    • Sprint: Repl 2021-07-12, Repl 2021-07-26, Execution Team 2021-08-09, Execution Team 2021-08-23
    • Linked BF Score: 120

      Description

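      The checkpoint thread reads at the stable timestamp to evaluate the amount of oplog necessary for rollback. If a checkpoint runs while prepared transactions are being reconstructed, either during server startup or after a rollback, its read timestamp can be later than the prepare timestamp of a transaction being reconstructed. WiredTiger requires a prepare timestamp to be greater than the latest active read timestamp, so the prepare can fail with an assertion like this: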

      WT_SESSION.prepare_transaction: __txn_assert_after_reads, 516: prepare timestamp (1623940678, 408) must be greater than the latest active read timestamp (1623940808, 102) : Invalid argument
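
The rule behind this assertion can be illustrated with a small toy model. This is only a sketch of the ordering check implied by the error message, not WiredTiger code: the function name `check_prepare_timestamp` is hypothetical, and timestamps are assumed to be (seconds, increment) pairs that compare lexicographically.

```python
from typing import Iterable, Tuple

# Assumption: a timestamp is a (seconds, increment) pair, ordered lexicographically.
Timestamp = Tuple[int, int]


def check_prepare_timestamp(
    prepare_ts: Timestamp, active_read_ts: Iterable[Timestamp]
) -> None:
    """Raise ValueError unless prepare_ts is greater than every active read timestamp.

    Hypothetical helper that mirrors the wording of the assertion message only.
    """
    readers = list(active_read_ts)
    if not readers:
        # No active readers, so nothing constrains the prepare timestamp.
        return
    latest_read = max(readers)
    if prepare_ts <= latest_read:
        raise ValueError(
            f"prepare timestamp {prepare_ts} must be greater than "
            f"the latest active read timestamp {latest_read}"
        )
```

With the values from the log line, `check_prepare_timestamp((1623940678, 408), [(1623940808, 102)])` raises, because the checkpoint's read timestamp is later than the prepare timestamp. The same call with no active readers succeeds.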
      

      We have only reproduced this problem on the code coverage builder, which is extremely slow; no users have seen it. It has also only been reproduced on 4.4, but it seems like it should affect every version from 4.2 to 5.1.

      A workaround may be to take a global X lock while reconstructing prepared transactions to conflict with the checkpoint thread.
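
The idea behind this workaround can be sketched with a toy model. Everything here is an assumption made for illustration: `ToyEngine`, `checkpoint`, `reconstruct_prepared`, and `run_race` are hypothetical names, and a single `threading.Lock` stands in for the conflict between the global X lock and the checkpoint thread. It is not MongoDB's lock manager or the actual fix.

```python
import threading


class ToyEngine:
    """Toy stand-in for a storage engine; not MongoDB or WiredTiger code."""

    def __init__(self):
        # Assumption: one mutex models "global X lock conflicts with checkpoint".
        self.global_lock = threading.Lock()
        self.active_read_ts = []  # read timestamps of in-progress checkpoints
        self.prepared = []        # prepare timestamps reconstructed so far

    def checkpoint(self, stable_ts):
        # The checkpoint holds the lock for as long as its read is active.
        with self.global_lock:
            self.active_read_ts.append(stable_ts)
            try:
                pass  # a real checkpoint would read the oplog here
            finally:
                self.active_read_ts.remove(stable_ts)

    def reconstruct_prepared(self, prepare_timestamps):
        # Holding the same lock means no checkpoint read can be active here.
        with self.global_lock:
            for ts in prepare_timestamps:
                if any(ts <= read_ts for read_ts in self.active_read_ts):
                    raise ValueError("prepare timestamp not after active read")
                self.prepared.append(ts)


def run_race(rounds=200):
    """Run checkpoints in a background thread while reconstructing transactions."""
    engine = ToyEngine()
    stop = threading.Event()

    def checkpoint_loop():
        while not stop.is_set():
            engine.checkpoint((1623940808, 102))

    worker = threading.Thread(target=checkpoint_loop)
    worker.start()
    try:
        for _ in range(rounds):
            engine.reconstruct_prepared([(1623940678, 408)])
    finally:
        stop.set()
        worker.join()
    return len(engine.prepared)
```

Because both paths take the same lock, `reconstruct_prepared` never observes an active checkpoint read, so the ordering check cannot fail even though the checkpoint timestamp is later than the prepare timestamp. The trade-off in this model is that checkpoints are blocked for the whole reconstruction.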

              People

              Assignee: Pavithra Vetriselvan (pavithra.vetriselvan)
              Reporter: Louis Williams (louis.williams)
              Votes: 0
              Watchers: 11
