Make sure we don't issue any reads in disagg before last_materialized_lsn is set.


    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Block Manager
    • Storage Engines, Storage Engines - Persistence

      WT-15589 was intended to ensure that we never read ahead of the materialization frontier. However, fully enabling that check causes multiple test failures, with errors such as:

      Attempting to read page with LSN 4 ahead of the materialization frontier at LSN 0
      

      These errors appear in layered and other Disagg tests when an existing Disagg database is reopened, even though a valid checkpoint LSN is passed in the configuration.

      The goal of this ticket is to ensure that the initialization order is correct: specifically, that last_materialized_lsn (the materialization frontier) is initialized before any pages are read from storage. No tests should need to be rewritten.
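The intended ordering can be sketched in C as a minimal model. This is not WiredTiger code: struct disagg_conn, read_page(), open_from_checkpoint() and DISAGG_LSN_NONE are hypothetical stand-ins. The sketch assumes that (a) an unset frontier has the value 0, as the "frontier at LSN 0" message suggests, (b) a page is "ahead" of the frontier when its LSN is strictly greater, and (c) the frontier can be seeded from the checkpoint LSN passed in the configuration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical: assumed "unset" frontier value, matching "frontier at LSN 0". */
#define DISAGG_LSN_NONE 0

/* Hypothetical stand-in for the connection state that holds the frontier. */
struct disagg_conn {
    uint64_t last_materialized_lsn; /* materialization frontier */
};

/* Target form of the check: no special case for an unset frontier. */
static int
check_lsn_frontier(const struct disagg_conn *conn, uint64_t page_lsn)
{
    if (page_lsn > conn->last_materialized_lsn)
        return (EINVAL); /* page is ahead of the materialization frontier */
    return (0);
}

/* Hypothetical page read: validate the page LSN before using the page. */
static int
read_page(const struct disagg_conn *conn, uint64_t page_lsn)
{
    return (check_lsn_frontier(conn, page_lsn));
}

/*
 * Hypothetical reopen path in the intended order: seed the frontier from the
 * configured checkpoint LSN first, and only then read the root page.
 */
static int
open_from_checkpoint(struct disagg_conn *conn, uint64_t ckpt_lsn, uint64_t root_page_lsn)
{
    assert(ckpt_lsn != DISAGG_LSN_NONE); /* a reopen must supply a real checkpoint LSN */
    conn->last_materialized_lsn = ckpt_lsn; /* 1. initialize the frontier */
    return (read_page(conn, root_page_lsn)); /* 2. then read from storage */
}
```

In this model, calling read_page() on a fresh connection before open_from_checkpoint() fails with EINVAL (22 on Linux and macOS), the same pattern as the "failed with: 22" log line, while going through open_from_checkpoint() lets the same read succeed.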

      Steps to reproduce:

      1. Remove or comment out the following line in the __block_disagg_check_lsn_frontier() function:

      last_materialized_lsn != WT_DISAGG_LSN_NONE
      

      2. Run layered or Disagg tests, for example:

      $ test/suite/run.py test_layered14
      

      You'll see errors like:

      [1760390298:667279][20864:0x20040e140], test_layered14.test_layered14.test_layered_random_cursor(palm), file:WiredTigerShared.wt_stable, checkpoint-pick-up-shared: [WT_VERB_DEFAULT][ERROR]: int __block_disagg_check_lsn_frontier(WT_SESSION_IMPL *, uint64_t), 69: Attempting to read page with LSN 4 ahead of the materialization frontier at LSN 0
      [1760390298:667646][20864:0x20040e140], test_layered14.test_layered14.test_layered_random_cursor(palm), file:WiredTigerShared.wt_stable, checkpoint-pick-up-shared: [WT_VERB_DEFAULT][ERROR]: int __wti_btree_tree_open(WT_SESSION_IMPL *, const uint8_t *, size_t), 761: unable to read root page from file:WiredTigerShared.wt_stable: Invalid argument
      [1760390298:667760][20864:0x20040e140], test_layered14.test_layered14.test_layered_random_cursor(palm), connection: [WT_VERB_LAYERED][ERROR]: Disagg pick up checkpoint for meta_lsn =5, failed with: 22
      

      Once the initialization order is fixed, the line "last_materialized_lsn != WT_DISAGG_LSN_NONE" should no longer be needed.
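To show why the guard hides the ordering problem, here is a hypothetical reconstruction of the check with and without it. The function names check_guarded() and check_unguarded() are illustrative and are not the actual body of __block_disagg_check_lsn_frontier(); the sketch assumes an unset frontier is 0 and that "ahead" means strictly greater.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical: assumed "unset" frontier value, matching "frontier at LSN 0". */
#define DISAGG_LSN_NONE 0

/* Current behavior: the guard skips the check while the frontier is unset. */
static int
check_guarded(uint64_t last_materialized_lsn, uint64_t page_lsn)
{
    if (last_materialized_lsn != DISAGG_LSN_NONE && page_lsn > last_materialized_lsn)
        return (EINVAL);
    return (0);
}

/* Reproduction step 1: the same check with the guard removed. */
static int
check_unguarded(uint64_t last_materialized_lsn, uint64_t page_lsn)
{
    if (page_lsn > last_materialized_lsn)
        return (EINVAL);
    return (0);
}
```

With the guard, a read issued before the frontier is initialized (frontier 0, page LSN 4) silently passes; without it, the same read fails, which is what the reproduction steps expose. Once the frontier is always set to a real LSN before any read, the two variants agree on every read, so the guard becomes redundant.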

            Assignee:
            [DO NOT USE] Backlog - Storage Engines Team
            Reporter:
            Yury Ershov
            Votes:
            0
            Watchers:
            1
