Details
Description
I'm occasionally getting a read error when reading the object in cache-bucket.
[1631819751:993716][13985:0x7f3ba055f700], tiered:shadow, WT_CURSOR.insert: int __posix_file_read(WT_FILE_HANDLE *, WT_SESSION *, wt_off_t, size_t, void *), 460: ./cache-bucket/shadow-0000000002.wtobj: handle-read: pread: failed to read 32768 bytes at offset 362000384
The strange thing is that the same thread, looking at the same file, first got the (correct) much larger file size and then a smaller one. After the abort caused by the error, listing the database directory for shadow-0000000002.wtobj shows:
WT_TEST.tiered-abort/bucket:
total 2383652
-rw-r--r-- 1 sue adm 363339776 Sep 16 19:15 shadow-0000000002.wtobj

WT_TEST.tiered-abort/cache-bucket:
total 1421516
-r--r--r-- 1 sue adm 363339776 Sep 16 19:15 shadow-0000000002.wtobj
I added debugging to both local_flush and local_flush_finish to print the size of the source file:
FLUSH: get size for shadow-0000000002.wtobj dest ./bucket/shadow-0000000002.wtobj
FLUSH: Copy shadow-0000000002.wtobj (363339776) to ./bucket/shadow-0000000002.wtobj
Checkpoint 3 complete at stable 1325376.
FLUSH_FINISH: Rename shadow-0000000002.wtobj (132055040) to ./cache-bucket/shadow-0000000002.wtobj
Flush tier 3 completed.
[1631819751:993716][13985:0x7f3ba055f700], tiered:shadow, WT_CURSOR.insert: int __posix_file_read(WT_FILE_HANDLE *, WT_SESSION *, wt_off_t, size_t, void *), 460: ./cache-bucket/shadow-0000000002.wtobj: handle-read: pread: failed to read 32768 bytes at offset 362000384 size 132055040
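As a quick check of the numbers in the messages above, the failing read (32768 bytes at offset 362000384) ends at byte 362033152. That fits inside the 363339776-byte size seen at FLUSH time but lies well past the 132055040-byte size reported at FLUSH_FINISH. A minimal sketch of that bounds check follows; read_fits is a hypothetical helper written for illustration, not a WiredTiger function.

```c
#include <stdint.h>

/*
 * read_fits --
 *     Hypothetical helper (not part of WiredTiger): return 1 if a read of
 *     len bytes starting at offset lies entirely within a file of the given
 *     size, 0 otherwise.
 */
static int
read_fits(int64_t offset, int64_t len, int64_t size)
{
    if (offset < 0 || len < 0 || size < 0)
        return (0);
    /* Written as a subtraction so that offset + len cannot overflow. */
    return (offset <= size && len <= size - offset);
}
```

With the values from the log, read_fits(362000384, 32768, 363339776) returns 1 and read_fits(362000384, 32768, 132055040) returns 0, so the read is only invalid against the smaller size.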
local_flush_finish just renames the local database source file into the cache directory. The ls -l output above shows the larger size for the file there.
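One way to narrow this down further might be to log the size both by path name and through an already-open file descriptor, immediately before and after the rename, to see which view reports the smaller value. The sketch below uses only POSIX stat and fstat; size_by_path and size_by_fd are hypothetical helper names, and this is not the debugging code that produced the output above.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * size_by_path --
 *     Hypothetical helper: size in bytes of the file currently named by
 *     path, or -1 if stat fails (for example, after the file was renamed).
 */
static off_t
size_by_path(const char *path)
{
    struct stat sb;

    return (stat(path, &sb) == 0 ? sb.st_size : (off_t)-1);
}

/*
 * size_by_fd --
 *     Hypothetical helper: size in bytes of the file behind an open
 *     descriptor, or -1 if fstat fails. The descriptor keeps referring to
 *     the same file across a rename.
 */
static off_t
size_by_fd(int fd)
{
    struct stat sb;

    return (fstat(fd, &sb) == 0 ? sb.st_size : (off_t)-1);
}
```

If the two calls disagree for the same object, that would suggest the smaller size comes from a different file or a stale handle rather than from the renamed file itself.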
I can pretty reliably reproduce this kind of failure with test_tiered_abort -T 12 -t 10.