- Type: Bug
- Resolution: Fixed
- Priority: Critical - P2
- Affects Version/s: None
- Component/s: Disagg CI-blocker, Sweep Server
- Security Level: Public (Available to anyone on the web)
- Storage Engines - Foundations
Issue Summary
We suspect data loss can occur when the sweep server sweeps a layered dhandle that is supposed to be drained during step up. This was identified during the investigation in WT-16703.
Context
- The root cause is sweeping layered dhandles during concurrent draining.
- Multiple solutions were discussed:
  - Short term:
    - Scanning the entire metadata and draining each table.
    - Preventing the sweep server from sweeping layered dhandles.
  - Long term:
    - Tie the lifetime of a layered dhandle to its ingest dhandle.
- The immediate next step is to implement the following short-term solution: the sweep server will not sweep layered dhandles. This approach was chosen due to its low performance impact and minimal code changes.
Proposed Solution
- Implement the change so that the sweep server never sweeps layered dhandles.
- Consider creating a ticket for the long-term solution to tie dhandle lifetimes.
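The short-term change can be illustrated with a toy model of a sweep pass. This is only a sketch under stated assumptions: `toy_dhandle`, `handle_kind`, `toy_sweep_eligible`, and `toy_sweep_pass` are hypothetical names invented for illustration and are not WiredTiger's actual structures or functions. The point it shows is that the layered check comes first, so a layered dhandle is skipped even when it is idle and unreferenced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical handle kinds; not WiredTiger's real type enumeration. */
typedef enum { HANDLE_BTREE, HANDLE_LAYERED } handle_kind;

/* Hypothetical, simplified stand-in for a data handle. */
typedef struct {
    handle_kind kind;
    bool open;        /* Handle still holds resources. */
    unsigned in_use;  /* Number of active references. */
    long idle_secs;   /* Seconds since the handle was last used. */
} toy_dhandle;

/* Decide whether a sweep pass may close this handle. */
static bool
toy_sweep_eligible(const toy_dhandle *dh, long idle_limit_secs)
{
    /* Assumed rule from this ticket: layered handles are never swept. */
    if (dh->kind == HANDLE_LAYERED)
        return false;
    return dh->open && dh->in_use == 0 && dh->idle_secs >= idle_limit_secs;
}

/* Run one sweep pass over an array of handles; return how many were closed. */
static size_t
toy_sweep_pass(toy_dhandle *handles, size_t n, long idle_limit_secs)
{
    size_t closed = 0;

    assert(handles != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!toy_sweep_eligible(&handles[i], idle_limit_secs))
            continue;
        handles[i].open = false; /* "Close" the idle, unreferenced handle. */
        closed++;
    }
    return closed;
}
```

In this model, a pass over one idle btree handle, one idle layered handle, and one referenced btree handle closes only the first. The layered handle stays open, so any draining that depends on it is not interrupted by the sweep.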
This ticket was generated by AI from a Slack thread.