page_log_lock is declared but never initialized or used, leaving pagelogqh list unprotected

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Concurrency
    • Storage Engines - Foundations

      Issue Summary

      conn->page_log_lock (WT_SPINLOCK, declared in src/include/connection.h) is never initialized and never used. Meanwhile, several functions that read or modify the pagelogqh TAILQ list do so without holding any consistent lock. This is a latent data race on the page-log list.

      Context

      • page_log_lock is declared at line 932 of connection.h alongside encryptor_lock, which correctly follows the init/use/destroy pattern.
      • encryptor_lock is initialized in conn_handle.c (__wt_spin_init), destroyed on close, and taken around every encryptorqh walk/mutation.
      • page_log_lock has none of this wiring:
        • No __wt_spin_init in conn/conn_handle.c.
        • No __wt_spin_destroy on close.
        • No __wt_spin_lock / __wt_spin_unlock on page_log_lock anywhere in the tree.
      • Current state of pagelogqh locking is inconsistent:
        Function                                   Lock taken
        __conn_add_page_log (conn_api.c)           api_lock (should be page_log_lock)
        __conn_get_page_log (conn_api.c)           none
        __wti_conn_remove_page_log (conn_api.c)    none
        __wt_schema_open_page_log (schema_open.c)  none

      Concurrent calls to add_page_log, get_page_log, or remove_page_log can therefore race on the list head.

      Proposed Solution

      Wire page_log_lock to match the encryptor_lock pattern:

      1. Initialize: add __wt_spin_init(session, &conn->page_log_lock, "page log") in conn/conn_handle.c next to the encryptor_lock init.
      2. Destroy: add __wt_spin_destroy(session, &conn->page_log_lock) in the same file's cleanup path.
      3. __conn_add_page_log: replace api_lock with page_log_lock around the TAILQ insert.
      4. __conn_get_page_log: take page_log_lock around the TAILQ_FOREACH walk.
      5. __wti_conn_remove_page_log: take page_log_lock (or verify it is always called single-threaded at shutdown and document that).
      6. __wt_schema_open_page_log: take page_log_lock around its TAILQ_FOREACH walk.
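
      The wiring above can be sketched outside the WiredTiger tree as follows. This is a minimal standalone model, not the actual patch: demo_conn, demo_page_log and every demo_* function are hypothetical stand-ins, and a pthread mutex stands in for WT_SPINLOCK. It only illustrates the intended discipline, in which one lock is initialized with the connection, taken around every pagelogqh insert, walk and removal, and destroyed on close.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

/* Hypothetical stand-in for a named page-log list entry. */
struct demo_page_log {
    char *name;
    TAILQ_ENTRY(demo_page_log) q;
};

/* Hypothetical stand-in for the connection; the mutex models WT_SPINLOCK. */
struct demo_conn {
    pthread_mutex_t page_log_lock;
    TAILQ_HEAD(demo_pagelog_qh, demo_page_log) pagelogqh;
};

/* Steps 1 and 2: the lock is created and destroyed with the connection. */
int demo_conn_init(struct demo_conn *conn)
{
    TAILQ_INIT(&conn->pagelogqh);
    return pthread_mutex_init(&conn->page_log_lock, NULL);
}

int demo_conn_destroy(struct demo_conn *conn)
{
    return pthread_mutex_destroy(&conn->page_log_lock);
}

/* Step 3: allocate outside the lock, hold it only around the insert. */
int demo_add_page_log(struct demo_conn *conn, const char *name)
{
    size_t len = strlen(name) + 1;
    struct demo_page_log *npl = malloc(sizeof(*npl));

    if (npl == NULL)
        return ENOMEM;
    if ((npl->name = malloc(len)) == NULL) {
        free(npl);
        return ENOMEM;
    }
    memcpy(npl->name, name, len);

    pthread_mutex_lock(&conn->page_log_lock);
    TAILQ_INSERT_TAIL(&conn->pagelogqh, npl, q);
    pthread_mutex_unlock(&conn->page_log_lock);
    return 0;
}

/* Steps 4 and 6: hold the lock for the whole TAILQ_FOREACH walk. */
struct demo_page_log *demo_get_page_log(struct demo_conn *conn, const char *name)
{
    struct demo_page_log *npl;

    pthread_mutex_lock(&conn->page_log_lock);
    TAILQ_FOREACH(npl, &conn->pagelogqh, q)
        if (strcmp(npl->name, name) == 0)
            break;
    pthread_mutex_unlock(&conn->page_log_lock);
    return npl; /* NULL if the walk reached the end without a match */
}

/* Step 5: drain the list under the same lock. */
void demo_remove_page_logs(struct demo_conn *conn)
{
    struct demo_page_log *npl;

    pthread_mutex_lock(&conn->page_log_lock);
    while ((npl = TAILQ_FIRST(&conn->pagelogqh)) != NULL) {
        TAILQ_REMOVE(&conn->pagelogqh, npl, q);
        free(npl->name);
        free(npl);
    }
    assert(TAILQ_EMPTY(&conn->pagelogqh));
    pthread_mutex_unlock(&conn->page_log_lock);
}
```

      In this model the pointer returned by demo_get_page_log stays valid after the unlock only if entries are never freed while readers are active, which is why step 5 asks to either lock removal or document that it runs single-threaded at shutdown.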

      Definition of Done

      • page_log_lock is initialized, used, and destroyed.
      • All pagelogqh accesses hold page_log_lock.
      • No api_lock used for page-log list operations.
      • TSAN clean on a basic disaggregated-storage test run.
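
      As a rough illustration of the last criterion, the sketch below is a standalone stress harness, not a WiredTiger test: stress_entry, stress_qh, stress_lock and stress_run are hypothetical names, and a pthread mutex again stands in for the spinlock. Two threads append to a shared TAILQ while a third walks it. Compiled with -fsanitize=thread (supported by GCC and Clang), this access pattern would be expected to run without data-race reports because every list access holds the lock; removing the lock calls from either thread function is the kind of mistake TSAN is meant to flag.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/queue.h>

#define ADDS_PER_THREAD 1000
#define WALKS 100

/* Hypothetical list entry and shared list, protected by stress_lock. */
struct stress_entry {
    int id;
    TAILQ_ENTRY(stress_entry) q;
};
static TAILQ_HEAD(stress_qh_head, stress_entry) stress_qh = TAILQ_HEAD_INITIALIZER(stress_qh);
static pthread_mutex_t stress_lock = PTHREAD_MUTEX_INITIALIZER;

/* Appends ADDS_PER_THREAD entries, taking the lock around each insert. */
static void *stress_adder(void *arg)
{
    int base = *(int *)arg;

    for (int i = 0; i < ADDS_PER_THREAD; i++) {
        struct stress_entry *e = malloc(sizeof(*e));
        if (e == NULL)
            break;
        e->id = base + i;
        pthread_mutex_lock(&stress_lock);
        TAILQ_INSERT_TAIL(&stress_qh, e, q);
        pthread_mutex_unlock(&stress_lock);
    }
    return NULL;
}

/* Repeatedly walks the list, holding the lock for each full walk. */
static void *stress_walker(void *arg)
{
    (void)arg;
    for (int i = 0; i < WALKS; i++) {
        struct stress_entry *e;
        long seen = 0;

        pthread_mutex_lock(&stress_lock);
        TAILQ_FOREACH(e, &stress_qh, q)
            seen++;
        pthread_mutex_unlock(&stress_lock);
        (void)seen;
    }
    return NULL;
}

/* Runs two adders and one walker, then drains the list and returns its length. */
int stress_run(void)
{
    pthread_t threads[3];
    int bases[2] = {0, ADDS_PER_THREAD};
    struct stress_entry *e;
    int rc, count = 0;

    rc = pthread_create(&threads[0], NULL, stress_adder, &bases[0]);
    assert(rc == 0);
    rc = pthread_create(&threads[1], NULL, stress_adder, &bases[1]);
    assert(rc == 0);
    rc = pthread_create(&threads[2], NULL, stress_walker, NULL);
    assert(rc == 0);
    (void)rc;

    for (int i = 0; i < 3; i++)
        pthread_join(threads[i], NULL);

    /* All threads have exited, so the drain is single-threaded. */
    while ((e = TAILQ_FIRST(&stress_qh)) != NULL) {
        TAILQ_REMOVE(&stress_qh, e, q);
        free(e);
        count++;
    }
    return count;
}
```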

            Assignee:
            [DO NOT USE] Backlog - Storage Engines Team
            Reporter:
            Etienne Petrel
            Votes:
            0
            Watchers:
            3

              Created:
              Updated: