  WiredTiger / WT-9463

Fix a race opening checkpoint cursors

    • Type: Bug
    • Resolution: Fixed
    • Priority: Minor - P4
    • Fix Version/s: WT11.0.0, 6.1.0-rc0
    • Affects Version/s: None
    • Component/s: None
    • Labels:

      Even after WT-9367 there's still another race that can occur when opening a checkpoint cursor while a checkpoint is finishing. The steps are ("<-" is the checkpoint, "->" is the thread opening the cursor):

      1. <- ds checkpoint completes
      2. read ds metadata ->
      3. read hs metadata ->
      4. <- hs checkpoint completes
      5. <- write stable
      6. <- write oldest
      7. <- write snapshot
      8. read snapshot ->
      9. read stable ->
      10. read oldest ->

      ...which is, I'm afraid, much like the WT-9367 issue except for the history store tree instead of the stable timestamp.

      For those following along at home, the reason this is hard is that there are five things we have to read, all atomically, and there isn't anything we can usefully/correctly lock to get at them all at once... plus any or all of them besides the snapshot might be skipped over and not actually updated by the running checkpoint. Without further input, the above scenario can't be distinguished from a correct run where the history store checkpoint was skipped.
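      To make the ambiguity concrete, here is a minimal toy model of the ten steps above. It is only a sketch: it assumes each of the five items can be represented by the wall time of the checkpoint that last wrote it, and the names (OLD, NEW, replay_race, skipped_hs_run, newer_than_snapshot) are hypothetical, not WiredTiger functions.

```python
# Toy model (an assumption, not WiredTiger code): every item is reduced to the
# wall time of the checkpoint that last wrote it.
OLD, NEW = 100, 200  # hypothetical wall times of the previous and running checkpoint


def replay_race():
    """Replay the ten steps from the ticket in order; return (reader view, final state)."""
    meta = {"ds": OLD, "hs": OLD, "stable": OLD, "oldest": OLD, "snapshot": OLD}
    seen = {}
    meta["ds"] = NEW                      # 1. <- ds checkpoint completes
    seen["ds"] = meta["ds"]               # 2. read ds metadata ->
    seen["hs"] = meta["hs"]               # 3. read hs metadata ->
    meta["hs"] = NEW                      # 4. <- hs checkpoint completes
    meta["stable"] = NEW                  # 5. <- write stable
    meta["oldest"] = NEW                  # 6. <- write oldest
    meta["snapshot"] = NEW                # 7. <- write snapshot
    seen["snapshot"] = meta["snapshot"]   # 8. read snapshot ->
    seen["stable"] = meta["stable"]       # 9. read stable ->
    seen["oldest"] = meta["oldest"]       # 10. read oldest ->
    return seen, meta


def skipped_hs_run():
    """A correct run: the checkpoint finished before any read and skipped hs."""
    meta = {"ds": NEW, "hs": OLD, "stable": NEW, "oldest": NEW, "snapshot": NEW}
    return dict(meta), meta


def newer_than_snapshot(seen):
    """The existing style of check: retry if any item is newer than the snapshot."""
    return any(seen[k] > seen["snapshot"] for k in ("ds", "hs", "stable", "oldest"))
```

      In this model the reader's view from replay_race() is identical to the view from skipped_hs_run(), and newer_than_snapshot() is false for both, even though in the racing case the history store the reader picked up is stale.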

      I think the solution is to read the snapshot twice (first and last, around everything else) and retry if the checkpoint wall time associated with it isn't the same both times, while keeping the current logic that checks whether any of the elements are newer than the snapshot. That way, if a concurrently running checkpoint updates some of the items we read but not the snapshot, we'll see they're newer and retry; and if it also updates the snapshot, the two snapshot times won't match. So if there is such a checkpoint (one that didn't finish and update the snapshot before we started), it can't update any of the items before we read them without triggering a retry.
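      The proposed read order can be sketched roughly as follows. This is an illustration under the same toy assumption (each item is just a checkpoint wall time), not the actual patch: read_consistent, RacingStore, and the read(name) callback are hypothetical names, and the checkpoint is simulated by injecting its writes after a fixed number of reads. It exercises only the one interleaving from this ticket and says nothing about correctness in general.

```python
OLD, NEW = 100, 200  # hypothetical wall times of the previous and running checkpoint
ITEMS = ("ds", "hs", "stable", "oldest")


def read_consistent(read, max_tries=10):
    """Read the snapshot first and last, around everything else, and retry on doubt.

    read(name) is assumed to return the checkpoint wall time stored for name.
    """
    for _ in range(max_tries):
        first = read("snapshot")
        seen = {name: read(name) for name in ITEMS}
        last = read("snapshot")
        if first != last:
            continue  # a checkpoint updated the snapshot while we were reading
        if any(t > last for t in seen.values()):
            continue  # a checkpoint updated an item but not (yet) the snapshot
        seen["snapshot"] = last
        return seen
    raise RuntimeError("no consistent view after %d tries" % max_tries)


class RacingStore:
    """Simulated metadata where the checkpoint finishes right after hs is read."""

    def __init__(self):
        self.meta = dict.fromkeys(ITEMS + ("snapshot",), OLD)
        self.meta["ds"] = NEW  # ds checkpoint completed before the reader started
        self.reads = 0

    def read(self, name):
        self.reads += 1
        value = self.meta[name]
        if self.reads == 3:  # third read is hs on the first pass
            for key in ("hs", "stable", "oldest", "snapshot"):
                self.meta[key] = NEW  # hs completes; stable, oldest, snapshot written
        return value
```

      With RacingStore, the first pass sees the snapshot time change between its first and last read and retries; the second pass then reads a view in which all five items come from the same checkpoint.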

      However, I'm not yet completely convinced this is correct; the previous couple of versions also had plausible correctness arguments that turned out to contain holes.

      Unfortunately, the only way these problems manifest is with rare mismatches in format-mirror, so testing doesn't produce large amounts of confidence either...

            keith.bostic@mongodb.com Keith Bostic (Inactive)
            dholland+wt@sauclovia.org David Holland