Uploaded image for project: 'WiredTiger'
  1. WiredTiger
  2. WT-5530

Reconciliation shouldn't reset WT_UPDATE types within an update chain.

    • Type: Icon: Technical Debt Technical Debt
    • Resolution: Won't Fix
    • Priority: Icon: Major - P3 Major - P3
    • None
    • Affects Version/s: None
    • Component/s: None
    • Labels:
      None
    • 5
    • Storage Engines 2020-02-24

      The rec_append_orig_value() function resets the WT_UPDATE.type field of an existing, linked-in update structure from WT_UPDATE_BIRTHMARK to WT_UPDATE_STANDARD.

      This is dangerous, because code that does something like this (from wt_txn_read()):

          830     *updp = NULL;
          831     for (skipped_birthmark = false; upd != NULL; upd = upd->next) {
          832         /* Skip reserved place-holders, they're never visible. */
          833         if (upd->type != WT_UPDATE_RESERVE) {
          834             upd_visible = __wt_txn_upd_visible_type(session, upd);
          835             if (upd_visible == WT_VISIBLE_TRUE)
          836                 break;
          837             if (upd_visible == WT_VISIBLE_PREPARE)
          838                 return (WT_PREPARE_CONFLICT);
          839         }
          840         /* An invisible birthmark is equivalent to a tombstone. */
          841         if (upd->type == WT_UPDATE_BIRTHMARK)
          842             skipped_birthmark = true;
          843     }
          844
          845     if (upd == NULL && skipped_birthmark)
          846         upd = &tombstone;
          847
          848     *updp = upd == NULL || upd->type == WT_UPDATE_BIRTHMARK ? NULL : upd;
      

      Can get into trouble because if upd.type is re-read from memory at line 848, it can race with reconciliation.

      See PR #5173 for more discussion, specifically from alexander.gorrod:

      The reason we [reset the update type from WT_UPDATE_BIRTHMARK to WT_UPDATE_STANDARD] that is that we had a lot of trouble ensuring that we were handling tombstones correctly with the lookaside file. I got to the point of enforcing that a page being written to lookaside doesn't have any tombstones left. That decision has had fallout (probably including this), on the other hand, it means we can be confident that the lookaside file doesn't end up with two tombstones for a key - which is bad as it generally means that one of them is pointing at a value that is the wrong version.

            Assignee:
            sulabh.mahajan@mongodb.com Sulabh Mahajan
            Reporter:
            keith.bostic@mongodb.com Keith Bostic (Inactive)
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

              Created:
              Updated:
              Resolved: