WiredTiger / WT-4868

Aggregate btree write gen from leaf pages in salvage

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • WT10.0.0, 4.4.0-rc0, 4.7.0
    • Affects Version/s: None
    • Component/s: None
    • Labels: None
    • 5
    • Storage Engines 2020-03-09

      Salvage and import have a problem with unstable entries.

      We avoid reading unstable entries from previous wiredtiger_open connections by comparing write-generations on pages to the starting write generation of the database connection. In summary, if we read a page with a write generation older than the starting write generation of the database connection, unstable entries on the page are ignored.
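
      The rule above can be sketched as follows. This is a minimal illustration, not WiredTiger code: the `Page` type, its fields, and `visible_entries` are hypothetical names, and treating "older than" as a strict less-than comparison is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical stand-in for an on-disk page (not a WiredTiger type)."""
    write_gen: int
    stable_entries: list = field(default_factory=list)
    unstable_entries: list = field(default_factory=list)


def visible_entries(page, conn_start_write_gen):
    """Return the entries a reader would trust on this page.

    Assumption: a page is "older" when its write generation is strictly
    less than the connection's starting write generation.
    """
    if page.write_gen < conn_start_write_gen:
        # Page was written by a previous connection: ignore unstable entries.
        return list(page.stable_entries)
    # Page was written by the current connection: unstable entries are usable.
    return list(page.stable_entries) + list(page.unstable_entries)


old_page = Page(write_gen=5, stable_entries=["a"], unstable_entries=["b"])
new_page = Page(write_gen=12, stable_entries=["a"], unstable_entries=["b"])
print(visible_entries(old_page, conn_start_write_gen=10))
print(visible_entries(new_page, conn_start_write_gen=10))
```

      The import and salvage cases described below are both situations where a page from a previous connection takes the second branch by mistake.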

      If a file is imported from a database and has leaf pages with unstable entries on pages with a write generation newer than the import database's starting write generation, we'll potentially read the values (or get very confused).

      If a file is salvaged, we have a similar problem. Salvage doesn't rewrite leaf pages, so unstable entries remain on the pages after salvage. We would potentially use those unstable entries because those leaf pages' write generations can be newer than the current database's starting write generation, which doesn't make sense after a salvage.

      Further, we don't aggregate timestamp information during salvage, which means a verify of the salvaged file can fail: salvage creates internal pages without timestamp information that reference leaf pages with timestamp information, and verify rejects that mismatch.
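
      To illustrate the mismatch, here is a rough sketch of what aggregating timestamp information from leaf pages into a parent could look like. Everything in it is an assumption made for illustration: the `TimeRange` type, its two fields, and the "parent must cover its children" check are hypothetical and are not taken from the WiredTiger source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimeRange:
    """Hypothetical per-page timestamp summary (not a WiredTiger type)."""
    oldest_start_ts: int
    newest_stop_ts: int


def aggregate(children):
    """Build a parent summary that spans every child summary."""
    return TimeRange(
        oldest_start_ts=min(c.oldest_start_ts for c in children),
        newest_stop_ts=max(c.newest_stop_ts for c in children),
    )


def parent_covers_children(parent, children):
    """Assumed verify-style check: the parent range must contain each child."""
    return all(
        parent.oldest_start_ts <= c.oldest_start_ts
        and parent.newest_stop_ts >= c.newest_stop_ts
        for c in children
    )


leaves = [TimeRange(10, 30), TimeRange(5, 20)]

# An internal page created with no timestamp information (modeled as zeros).
empty_parent = TimeRange(0, 0)
print(parent_covers_children(empty_parent, leaves))

# An internal page whose summary was aggregated from its leaves.
print(parent_covers_children(aggregate(leaves), leaves))
```

      Under these assumptions the first check fails and the second passes, which matches the failure mode described above.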

      A restart can fix both of these problems:

      In the case of import, we have a valid write generation for the file from its scanned checkpoint information. Writing the imported file's checkpoint information into the current database's metadata moves the current database's write generation past the imported file's, so a restart will cause us to ignore all unstable items in the imported file.

      In the case of salvage, it's simpler: by definition, restarting a database causes us to ignore any unstable entries in any file in the database.

      A fix that doesn't require a restart might be to set a "base write generation" entry in a file's metadata during both import and salvage. In the case of import, we'd set the "base write generation" to the maximum write generation of the imported file; in the case of salvage, we'd set it to the maximum write generation of any leaf page we retained during salvage.

      Subsequently, when the salvaged or imported file is opened, we'd set the per-file write generation to the maximum of the connection's starting write generation and the file's "base write generation". That ensures we only write pages after the maximum write generation of the import or salvage, and we ignore all unstable values written before the import or salvage.
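
      The proposal in the two paragraphs above amounts to two `max` computations. A minimal sketch, with hypothetical function names and made-up write generation values; how a page whose write generation equals the base is treated is not specified here and is left out:

```python
def salvage_base_write_gen(retained_leaf_write_gens):
    """Base write generation after salvage: the largest write generation
    of any leaf page retained. Returning 0 for an empty file is an
    assumption of this sketch."""
    return max(retained_leaf_write_gens, default=0)


def file_write_gen_on_open(conn_start_write_gen, base_write_gen):
    """Per-file write generation when the salvaged or imported file is opened."""
    return max(conn_start_write_gen, base_write_gen)


# Salvage retained leaf pages written at generations 40, 57 and 63.
base = salvage_base_write_gen([40, 57, 63])

# The connection started at generation 20, so without the base value the
# retained pages would all look newer than the connection.
print(base, file_write_gen_on_open(conn_start_write_gen=20, base_write_gen=base))

# If the connection is already past the base, the connection value wins.
print(file_write_gen_on_open(conn_start_write_gen=100, base_write_gen=base))
```

      For import, the same `file_write_gen_on_open` step would apply, with the base taken from the imported file's maximum write generation instead of from a scan of retained leaf pages.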

      We'd also have to change verify to ignore unstable entries in advance of the file's base write generation, so salvaged files verify correctly.

      cc: alexander.gorrod, michael.cahill

            Assignee:
            chenhao.qu@mongodb.com Chenhao Qu
            Reporter:
            keith.bostic@mongodb.com Keith Bostic (Inactive)
