- Type: Bug
- Resolution: Fixed
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Metadata
- Storage Engines - Foundations
- None
- 5
- 9
Problem
This issue appeared after WT-15527, which changed checkpoint pickup to use the checkpointed shared metadata instead of the live shared metadata.
During the statlog walk (statlog is the component that collects statistics), a shared metadata checkpoint handle may be selected. Before the handle is used, the sweep server may mark it dead because it had been idle for n seconds. When statlog later tries to use the handle, it attempts to reopen it. Reopening requires the checkpoint lock, which the session does not hold at that point, so the following deadlock assertion fails:
WT_ASSERT_ALWAYS(session,
  !FLD_ISSET(session->lock_flags, WT_SESSION_LOCKED_SCHEMA) ||
    FLD_ISSET(session->lock_flags, WT_SESSION_LOCKED_CHECKPOINT),
  "deadlock");
The failed assertion aborts the process.
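The asserted condition can be read as "a session that holds the schema lock must also hold the checkpoint lock". The sketch below restates that predicate as a standalone function. The flag values and the name sketch_lock_order_ok are hypothetical and chosen only for illustration; they are not WiredTiger's real definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values only; the real WT_SESSION_LOCKED_* values live in WiredTiger. */
#define SKETCH_LOCKED_CHECKPOINT 0x1u
#define SKETCH_LOCKED_SCHEMA 0x2u

/*
 * Hypothetical helper mirroring the asserted condition:
 * schema lock NOT held, OR checkpoint lock held.
 */
static bool
sketch_lock_order_ok(uint32_t lock_flags)
{
    return ((lock_flags & SKETCH_LOCKED_SCHEMA) == 0 ||
      (lock_flags & SKETCH_LOCKED_CHECKPOINT) != 0);
}
```

Under these assumptions the only failing combination is "schema lock held, checkpoint lock not held", which is the state the statlog walk is in when it tries to reopen the dead handle.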
Root cause
Although the statlog walk includes a filter to skip dead dhandles (marked by the sweep server), there is a race condition between the filter check and the actual use of the handle.
A handle can pass the filter as “alive,” but then be marked dead by the sweep server before it is used. As a result, __conn_btree_apply_internal encounters a dead handle and attempts to reopen it.
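The window between the filter check and the use of the handle can be illustrated with a deterministic, single-threaded simulation. Everything here is a hypothetical stand-in (struct sketch_dhandle and the sketch_* functions are not WiredTiger code); the sweep step is called inline at the point where the real sweep server could run concurrently.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a data handle; not WiredTiger's WT_DATA_HANDLE. */
struct sketch_dhandle {
    bool dead;   /* set by the simulated sweep server */
    int reopens; /* number of times the use path had to reopen the handle */
};

/* Step 1: the walk's filter skips handles that are already dead. */
static bool
sketch_filter_accepts(const struct sketch_dhandle *dh)
{
    return (!dh->dead);
}

/* Simulated sweep server: marks an idle handle dead. */
static void
sketch_sweep_mark_dead(struct sketch_dhandle *dh)
{
    dh->dead = true;
}

/* Step 2: using the handle; a dead handle must be reopened first. */
static void
sketch_use(struct sketch_dhandle *dh)
{
    if (dh->dead) {
        dh->reopens++;
        dh->dead = false;
    }
}

/* Plays the problematic interleaving; returns true if the use path hit a dead handle. */
static bool
sketch_race_interleaving(void)
{
    struct sketch_dhandle dh = {false, 0};

    if (!sketch_filter_accepts(&dh))
        return (false);          /* handle skipped, nothing to reopen */
    sketch_sweep_mark_dead(&dh); /* sweep runs inside the check-to-use window */
    sketch_use(&dh);             /* use path now sees a dead handle */
    return (dh.reopens == 1);
}
```

The filter result is stale by the time the handle is used, so a filter alone cannot guarantee that the use path never sees a dead handle.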
Reopening the handle requires re-reading it into the cache, which in turn requires both the schema and checkpoint locks. However, only the schema lock is held in this path, leading to the assertion failure.
On a follower, this situation is handled by detecting the state and acquiring the necessary locks. On the leader, this additional locking does not occur, which exposes the issue.
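The ticket does not show how the follower path acquires the locks, so the sketch below is only one hypothetical shape of the difference between the two paths. It assumes the checkpoint lock is taken before the schema lock, which is the ordering the assertion implies; the flag values, struct sketch_session, and the sketch_* functions are all illustrative names, not WiredTiger APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit values only; not WiredTiger's real flag definitions. */
#define SKETCH_LOCKED_CHECKPOINT 0x1u
#define SKETCH_LOCKED_SCHEMA 0x2u

/* Hypothetical session that only tracks which locks it holds. */
struct sketch_session {
    uint32_t lock_flags;
};

/* Per the ticket, reopening a dead handle needs both locks. */
static bool
sketch_reopen_allowed(const struct sketch_session *s)
{
    uint32_t need = SKETCH_LOCKED_CHECKPOINT | SKETCH_LOCKED_SCHEMA;

    return ((s->lock_flags & need) == need);
}

/* Path described for the leader: only the schema lock is held. */
static bool
sketch_walk_schema_only(void)
{
    struct sketch_session s = {0};

    s.lock_flags |= SKETCH_LOCKED_SCHEMA;
    return (sketch_reopen_allowed(&s));
}

/* Assumed alternative: checkpoint lock first, then schema lock. */
static bool
sketch_walk_both_locks(void)
{
    struct sketch_session s = {0};

    s.lock_flags |= SKETCH_LOCKED_CHECKPOINT;
    s.lock_flags |= SKETCH_LOCKED_SCHEMA;
    return (sketch_reopen_allowed(&s));
}
```

In this model the schema-only path cannot reopen the handle, while the path that holds both locks can, which matches the leader/follower difference described above.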