-
Type:
Bug
-
Resolution: Fixed
-
Priority:
Major - P3
-
Affects Version/s: None
-
Component/s: Live Restore
-
None
-
Environment:GCC 13, Ubuntu 24.04
-
Storage Engines - Persistence
-
88.94
-
None
-
1
__live_restore_fh_read(src/live_restore/live_restore_fs.c) contains a fast path at line 760 that is intended to return immediately when the destination file is fully migrated (WTI_DEST_COMPLETE), bypassing the file lock and the slow path entirely.
The fast path dispatches the read with WT_RET rather than a return. The WT_RET macro only returns from the calling function when the inner expression evaluates to a non-zero (error) value. On success, (which is the normal case) execution falls through and continues into the slow path body at line 763, where __wt_readlock is called. The slow path then re-evaluates WTI_DEST_COMPLETE via __live_restore_can_service_read, returns true and issues a second redundant read from the destination before releasing the lock.
Note: The bug is not a corner case like it fires on every read to every file handle once migration finishes, which is the normal operating state. No data loss or crash occurs, but the fast-path contract is violated, every affected read performs twice the destination I/O, and reads unnecessarily contend with concurrent write locks that the fast path is designed to bypass entirely.
For ReProduce:
A Catch2 regression test in the test/catch2/live_restore/api/test_live_restore_fh_read_fast_path.cpp file (tag is [live_restore_fh_read_fast_path]) demonstrates the defect mechanically.
I intentionally did not include the test code in the PR; we can discuss whether the test code is appropriate for the PR. I have attached the test code. However, I have implemented the fix in the PR.
Purposed Fix:
change at line 761 of src/live_restore/live_restore_fs.c, replace WT_RET with return.