WiredTiger / WT-9599

Acquire the hot backup lock to call fallocate in the block manager

    • 8
    • v6.0, v5.0, v4.4, v4.2

      While testing VLCS fast-truncate I hit an unrelated problem (it occurs on develop) that I had seen once or twice before but never repeatably. I now have a format config that reproduces it reasonably reliably: roughly 1 in 20 runs of format.sh.

      The failure is:
         [1658151892:867378][24960:0x770bd35ed800], t: [WT_VERB_DEFAULT][ERROR]: __posix_file_read, 426: /old/y/objects/dhwt/RUNDIR.61/T00001.wt: handle-read: pread: failed to read 7168 bytes at offset 223670272: WT_ERROR: non-specific WiredTiger error
      Note that getting WT_ERROR from this (rather than a system errno) means that the error was "unexpected EOF".

      It happens in format's backup thread, which copies files manually using wt_copy_and_sync. That function fetches the size of the file and then copies that many bytes. The failure occurs because the file shrinks during the copy, so the copy tries to read past EOF. So far the shrinkage has always been small (512 bytes, 4K, and so on; 7K in the failure above).
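      To make the failure mode concrete, here is a minimal sketch of a size-then-copy loop racing with a truncate. It is not the WiredTiger code: `copy_fixed_size` and `demo` are hypothetical names, the 4096-byte chunk size is arbitrary, and it assumes a POSIX system where `os.pread` is available. The shrink is done inline rather than from a second thread so the outcome is deterministic.

```python
import os
import tempfile


def copy_fixed_size(src_fd, size, chunk=4096):
    """Read exactly `size` bytes from src_fd, the way a size-then-copy loop would."""
    data = bytearray()
    offset = 0
    while offset < size:
        want = min(chunk, size - offset)
        buf = os.pread(src_fd, want, offset)
        if len(buf) < want:
            # Short read: the file ended before the size recorded earlier.
            raise EOFError(
                f"failed to read {want} bytes at offset {offset}: unexpected EOF"
            )
        data += buf
        offset += want
    return bytes(data)


def demo():
    with tempfile.TemporaryFile() as f:
        f.write(b"x" * 16384)
        f.flush()
        fd = f.fileno()
        size = os.fstat(fd).st_size    # step 1: record the file size
        os.ftruncate(fd, size - 512)   # something else shrinks the file
        try:
            copy_fixed_size(fd, size)  # step 2: copy "size" bytes
        except EOFError as e:
            return str(e)
    return None
```

      The copy fails on the last chunk, because only the tail of the file is missing. That matches the pattern in the report, where the read that fails is a small one near the end of the file.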

      This must be going on in low-level code that I don't know at all, so probably someone else should look into it rather than me.

      I will upload the config. Note that it's a develop-format config, not a mirror-format config.

      It also creates three VLCS tables because that's what I was testing, but the failure also occurs when copying the history store, so it isn't VLCS-specific. Besides, whatever is going on almost certainly happens at a lower level than the differences between the btree types.
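      The ticket title suggests serializing file truncation against backup with a lock. A rough sketch of that idea follows, under loud assumptions: `backup_lock` is a hypothetical stand-in for the hot backup lock, `try_truncate` and `copy_under_lock` are invented for illustration, and skipping the truncate when the lock is busy is just one possible policy, not necessarily what WiredTiger does. POSIX `os.pread` is assumed again.

```python
import os
import tempfile
import threading

# Hypothetical stand-in for the hot backup lock.
backup_lock = threading.Lock()


def try_truncate(fd, new_size):
    """Shrink the file only if no backup copy currently holds the lock."""
    if not backup_lock.acquire(blocking=False):
        return False  # a copy is in progress; skip the truncate for now
    try:
        os.ftruncate(fd, new_size)
        return True
    finally:
        backup_lock.release()


def copy_under_lock(fd, shrink_during_copy):
    """Record the size and read that many bytes while holding the lock."""
    with backup_lock:
        size = os.fstat(fd).st_size
        shrink_during_copy(fd, size)  # simulates a truncate attempt mid-copy
        data = os.pread(fd, size, 0)
    return size, len(data)


def demo():
    with tempfile.TemporaryFile() as f:
        f.write(b"x" * 8192)
        f.flush()
        fd = f.fileno()
        attempts = []
        size, copied = copy_under_lock(
            fd, lambda fd, size: attempts.append(try_truncate(fd, size - 512))
        )
        truncated_later = try_truncate(fd, size - 512)
        return size, copied, attempts[0], truncated_later
```

      In this sketch the truncate attempted during the copy is refused, so the copy reads the full recorded size, and the same truncate succeeds once the copy has released the lock.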

            Assignee:
            sue.loverso@mongodb.com Susan LoVerso
            Reporter:
            dholland+wt@sauclovia.org David Holland
