WiredTiger / WT-8108

Use temporary files and rename in local store


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: WT10.0.1, 5.0.4, 4.4.10, 5.1.0-rc0
    • Component/s: None
    • Labels:
      None

      Description

      I'm occasionally getting a read error when WiredTiger tries to read the object in cache-bucket.

      [1631819751:993716][13985:0x7f3ba055f700], tiered:shadow, WT_CURSOR.insert: int __posix_file_read(WT_FILE_HANDLE *, WT_SESSION *, wt_off_t, size_t, void *), 460: ./cache-bucket/shadow-0000000002.wtobj: handle-read: pread: failed to read 32768 bytes at offset 362000384
      

      The strange thing is that the same thread, looking at the same file, first got the (correct) much larger file size and then a smaller one. After the abort caused by the error, looking in the database directory for shadow-0000000002.wtobj, we see:

      WT_TEST.tiered-abort/bucket:
      total 2383652
      -rw-r--r-- 1 sue adm 363339776 Sep 16 19:15 shadow-0000000002.wtobj
       
      WT_TEST.tiered-abort/cache-bucket:
      total 1421516
      -r--r--r-- 1 sue adm 363339776 Sep 16 19:15 shadow-0000000002.wtobj
      

      I added debugging in both local_flush and local_flush_finish to print the file size of the source file:

      FLUSH: get size for shadow-0000000002.wtobj dest ./bucket/shadow-0000000002.wtobj
      FLUSH: Copy shadow-0000000002.wtobj (363339776) to ./bucket/shadow-0000000002.wtobj
      Checkpoint 3 complete at stable 1325376.
      FLUSH_FINISH: Rename shadow-0000000002.wtobj (132055040) to ./cache-bucket/shadow-0000000002.wtobj
      Flush tier 3 completed.
      [1631819751:993716][13985:0x7f3ba055f700], tiered:shadow, WT_CURSOR.insert: int __posix_file_read(WT_FILE_HANDLE *, WT_SESSION *, wt_off_t, size_t, void *), 460: ./cache-bucket/shadow-0000000002.wtobj: handle-read: pread: failed to read 32768 bytes at offset 362000384 size 132055040
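
In the log above, the failing read asks for 32768 bytes at offset 362000384 while the reported size is 132055040, so the offset is past the end of the file as that handle sees it. On POSIX systems a pread at or beyond end of file returns 0 bytes rather than an error code. The sketch below shows that behaviour together with the kind of size logging described above. The helper name read_at_offset is hypothetical and is not WiredTiger code; how WiredTiger turns a short read into the "failed to read" message is not shown here.

```c
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * read_at_offset: hypothetical helper, not WiredTiger code.
 * Logs the file size reported by fstat, then attempts a single pread.
 * Returns the pread result: bytes read, 0 at or past end of file, -1 on error.
 */
static ssize_t
read_at_offset(const char *path, off_t offset, void *buf, size_t len)
{
    struct stat sb;
    ssize_t nr;
    int fd;

    if ((fd = open(path, O_RDONLY)) < 0)
        return (-1);
    if (fstat(fd, &sb) != 0) {
        (void)close(fd);
        return (-1);
    }
    /* Debug output in the spirit of the FLUSH / FLUSH_FINISH lines. */
    fprintf(stderr, "%s: size %jd, reading %zu bytes at offset %jd\n", path,
      (intmax_t)sb.st_size, len, (intmax_t)offset);

    nr = pread(fd, buf, len, offset);
    (void)close(fd);
    return (nr);
}
```

A return value smaller than the requested length means the file, as seen through that descriptor, is shorter than the caller expected.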
      

      local_flush_finish just renames the local database source file into the cache directory. The ls -l output above shows the larger size for the file in that directory.
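
The issue title proposes using temporary files and rename in the local store. As a minimal sketch of that general technique (not the actual WiredTiger change), the helper below copies a source file to a temporary name, syncs it, and only then renames it to its final name, so a reader opening the final name never sees a partially written file. The function name copy_via_temp, the ".tmp" suffix, and the 4096-byte path buffer are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * copy_via_temp: hypothetical helper, not a WiredTiger function.
 * Copies src into "<dst>.tmp", syncs it, then renames it to dst.
 * Returns 0 on success or an errno value on failure.
 */
static int
copy_via_temp(const char *src, const char *dst)
{
    char tmp[4096], buf[65536];
    ssize_t nr, nw, off;
    int in, out, ret;

    assert(src != NULL && dst != NULL);
    if (snprintf(tmp, sizeof(tmp), "%s.tmp", dst) >= (int)sizeof(tmp))
        return (ENAMETOOLONG);
    if ((in = open(src, O_RDONLY)) < 0)
        return (errno);
    if ((out = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644)) < 0) {
        ret = errno;
        (void)close(in);
        return (ret);
    }

    ret = 0;
    while ((nr = read(in, buf, sizeof(buf))) > 0) {
        /* A single write may be short, so loop until the chunk is written. */
        for (off = 0; off < nr; off += nw) {
            nw = write(out, buf + off, (size_t)(nr - off));
            if (nw < 0) {
                ret = errno;
                goto done;
            }
        }
    }
    if (nr < 0)
        ret = errno;
    else if (fsync(out) != 0)
        ret = errno;

done:
    (void)close(in);
    if (close(out) != 0 && ret == 0)
        ret = errno;
    /* Publish the complete file under its final name, or clean up on failure. */
    if (ret == 0 && rename(tmp, dst) != 0)
        ret = errno;
    if (ret != 0)
        (void)unlink(tmp);
    return (ret);
}
```

Because rename replaces the destination name atomically within one file system, the final name refers either to the previous file or to the complete new one. The sketch does not retry on EINTR and does not sync the containing directory, both of which a production version would likely need.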

      I can pretty reliably reproduce this style of failure with test_tiered_abort -T 12 -t 10

            People

            Assignee:
            donald.anderson Donald Anderson
            Reporter:
            sue.loverso Susan LoVerso