Type: Bug
Resolution: Unresolved
Priority: Major - P3
Affects Version/s: None
Component/s: Concurrency
Storage Engines - Foundations
Issue Summary
conn->page_log_lock (a WT_SPINLOCK declared in src/include/connection.h) is never initialized and never used, and the functions that read or modify the pagelogqh TAILQ do not consistently hold any lock while doing so. This is a latent data race on the page-log list.
Context
- page_log_lock is declared at line 932 of connection.h alongside encryptor_lock, which correctly follows the init/use/destroy pattern.
- encryptor_lock is initialized in conn_handle.c (__wt_spin_init), destroyed on close, and taken around every encryptorqh walk/mutation.
- page_log_lock has none of this wiring:
  - No __wt_spin_init in conn/conn_handle.c.
  - No __wt_spin_destroy on close.
  - No __wt_spin_lock / __wt_spin_unlock call on it anywhere in the tree.
- Current state of pagelogqh locking is inconsistent (function, file: lock taken):
  - __conn_add_page_log (conn_api.c): api_lock (should be page_log_lock)
  - __conn_get_page_log (conn_api.c): none
  - __wti_conn_remove_page_log (conn_api.c): none
  - __wt_schema_open_page_log (schema_open.c): none
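Because api_lock covers only the insert and every other access takes no lock, concurrent calls to add_page_log, get_page_log, or remove_page_log (and the schema-open walk) can race on the list head and entry links.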
Proposed Solution
Wire page_log_lock to match the encryptor_lock pattern:
- Initialize: add __wt_spin_init(session, &conn->page_log_lock, "page log") in conn/conn_handle.c next to the encryptor_lock init.
- Destroy: add __wt_spin_destroy(session, &conn->page_log_lock) in the same file's cleanup path.
- __conn_add_page_log: replace api_lock with page_log_lock around the TAILQ insert.
- __conn_get_page_log: take page_log_lock around the TAILQ_FOREACH walk.
- __wti_conn_remove_page_log: take page_log_lock (or verify it is always called single-threaded at shutdown and document that).
- __wt_schema_open_page_log: take page_log_lock around its TAILQ_FOREACH walk.
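A minimal standalone sketch of the wiring above, written in portable C rather than WiredTiger code: pthread_mutex_t stands in for WT_SPINLOCK, and struct conn, struct page_log_entry, and the conn_* functions are hypothetical names, not the real conn_api.c signatures. It shows the rule the fix enforces: one lock covers every insert, walk, and removal, and that lock is initialized before first use and destroyed at teardown.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

/* Hypothetical registered page log; not the real WiredTiger structure. */
struct page_log_entry {
    char *name;
    TAILQ_ENTRY(page_log_entry) q;
};

/* Hypothetical connection. pthread_mutex_t stands in for WT_SPINLOCK. */
struct conn {
    pthread_mutex_t page_log_lock;
    TAILQ_HEAD(page_log_qh, page_log_entry) pagelogqh;
};

/* Init: the analogue of __wt_spin_init next to the encryptor_lock init. */
int conn_init(struct conn *conn)
{
    TAILQ_INIT(&conn->pagelogqh);
    return pthread_mutex_init(&conn->page_log_lock, NULL);
}

/* Insert under page_log_lock (not api_lock). */
int conn_add_page_log(struct conn *conn, const char *name)
{
    struct page_log_entry *e;
    size_t len;

    assert(name != NULL);
    len = strlen(name) + 1;
    /* Allocate outside the lock to keep the critical section short. */
    if ((e = calloc(1, sizeof(*e))) == NULL)
        return ENOMEM;
    if ((e->name = malloc(len)) == NULL) {
        free(e);
        return ENOMEM;
    }
    memcpy(e->name, name, len);

    pthread_mutex_lock(&conn->page_log_lock);
    TAILQ_INSERT_TAIL(&conn->pagelogqh, e, q);
    pthread_mutex_unlock(&conn->page_log_lock);
    return 0;
}

/* Lookup: the TAILQ_FOREACH walk runs entirely under page_log_lock. */
struct page_log_entry *conn_get_page_log(struct conn *conn, const char *name)
{
    struct page_log_entry *e, *found = NULL;

    pthread_mutex_lock(&conn->page_log_lock);
    TAILQ_FOREACH(e, &conn->pagelogqh, q)
        if (strcmp(e->name, name) == 0) {
            found = e;
            break;
        }
    pthread_mutex_unlock(&conn->page_log_lock);
    return found;
}

/* Removal: unlink and free every entry under page_log_lock. */
void conn_remove_page_logs(struct conn *conn)
{
    struct page_log_entry *e;

    pthread_mutex_lock(&conn->page_log_lock);
    while ((e = TAILQ_FIRST(&conn->pagelogqh)) != NULL) {
        TAILQ_REMOVE(&conn->pagelogqh, e, q);
        free(e->name);
        free(e);
    }
    pthread_mutex_unlock(&conn->page_log_lock);
}

/* Destroy: the analogue of __wt_spin_destroy in the cleanup path. */
void conn_destroy(struct conn *conn)
{
    conn_remove_page_logs(conn);
    pthread_mutex_destroy(&conn->page_log_lock);
}
```

Note that conn_get_page_log hands back an entry after dropping the lock. In this sketch that is only safe under the assumption that removal happens once no other thread can still be using an entry, which is the same question the remove item raises about shutdown-only, single-threaded calls.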
Definition of Done
- page_log_lock is initialized, used, and destroyed.
- All pagelogqh accesses hold page_log_lock.
- No api_lock used for page-log list operations.
- TSAN clean on a basic disaggregated-storage test run.
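For the TSAN item, a small standalone stress model can exercise the locking discipline before running the real disaggregated-storage test. This is a sketch, not part of the WiredTiger test suite: the list, page_log_lock (a pthread_mutex_t), NTHREADS, NOPS, worker, and run_stress are all hypothetical. Writers insert and readers walk under the same lock; teardown runs unlocked only after pthread_join, mirroring the shutdown-only option for __wti_conn_remove_page_log.

```c
#include <pthread.h>
#include <stdlib.h>
#include <sys/queue.h>

#define NTHREADS 4
#define NOPS 500

/* Hypothetical minimal list; pthread_mutex_t stands in for WT_SPINLOCK. */
struct entry {
    int id;
    TAILQ_ENTRY(entry) q;
};
static TAILQ_HEAD(, entry) pagelogqh = TAILQ_HEAD_INITIALIZER(pagelogqh);
static pthread_mutex_t page_log_lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    int base = *(const int *)arg;

    for (int i = 0; i < NOPS; i++) {
        struct entry *e = malloc(sizeof(*e)), *p;

        if (e == NULL)
            abort();
        e->id = base * NOPS + i;

        /* Writer: insert while holding page_log_lock. */
        pthread_mutex_lock(&page_log_lock);
        TAILQ_INSERT_TAIL(&pagelogqh, e, q);
        pthread_mutex_unlock(&page_log_lock);

        /* Reader: walk while holding the same lock. */
        pthread_mutex_lock(&page_log_lock);
        TAILQ_FOREACH(p, &pagelogqh, q)
            if (p->id == base * NOPS + i)
                break;
        pthread_mutex_unlock(&page_log_lock);

        /* Nothing is removed while workers run, so the entry must be found. */
        if (p == NULL)
            abort();
    }
    return NULL;
}

/* Runs the workers, then frees the list and returns how many entries it held. */
int run_stress(void)
{
    pthread_t tids[NTHREADS];
    int bases[NTHREADS], count = 0;
    struct entry *e;

    for (int t = 0; t < NTHREADS; t++) {
        bases[t] = t;
        if (pthread_create(&tids[t], NULL, worker, &bases[t]) != 0)
            abort();
    }
    for (int t = 0; t < NTHREADS; t++)
        pthread_join(tids[t], NULL);

    /* Single-threaded teardown: pthread_join orders all worker writes first. */
    while ((e = TAILQ_FIRST(&pagelogqh)) != NULL) {
        TAILQ_REMOVE(&pagelogqh, e, q);
        free(e);
        count++;
    }
    return count;
}
```

Call run_stress() from a main and build with `cc -fsanitize=thread -g -pthread`. Deleting either lock/unlock pair in worker should typically make TSAN report a data race on the list links, which is the class of bug this ticket describes.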