  1. WiredTiger
  2. WT-9926

A crash during startup from backup can lose metadata

    • 5
    • Storage Engines - 2022-10-03
    • v6.1, v6.0, v5.0, v4.4, v4.2

      This appears to be the cause of the problem reported in HELP-37618. If there is a WiredTiger.backup file in the WT home directory, the WiredTiger.wt file is removed and WiredTiger.backup is used to repopulate WiredTiger.wt. However, the metadata contents are not actually written to WiredTiger.wt until after WiredTiger.backup is removed. If there is a crash after WiredTiger.backup is removed and before WiredTiger.wt is written and flushed (fsync-ed), the next restart will start with an empty metadata file, thus losing track of all existing data.
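      The ordering hazard described above can be illustrated with a minimal sketch of a crash-safe sequence: rebuild the target file, flush it, and only then remove the backup. This is not WiredTiger's code. The function names are hypothetical, the byte-for-byte copy is a stand-in for loading the backup's metadata entries into WiredTiger.wt, and the directory fsync assumes a POSIX system.

```python
import os


def fsync_dir(path):
    """Flush directory entries (file creations/removals) to disk. POSIX only."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def restore_from_backup(home, backup_name="WiredTiger.backup",
                        target_name="WiredTiger.wt"):
    """Hypothetical helper: repopulate the target from the backup file.

    The backup is removed only after the target's contents and its
    directory entry are durable, so a crash at any earlier point leaves
    the backup in place and the restore can simply be repeated.
    """
    backup = os.path.join(home, backup_name)
    target = os.path.join(home, target_name)
    if not os.path.exists(backup):
        return False  # nothing to restore

    with open(backup, "rb") as src:
        data = src.read()

    # Step 1: write the new target (stand-in for the real metadata load).
    with open(target, "wb") as dst:
        dst.write(data)
        dst.flush()
        os.fsync(dst.fileno())  # Step 2: contents are on disk
    fsync_dir(home)             # Step 3: the new directory entry is on disk

    # Step 4: only now is it safe to drop the backup.
    os.unlink(backup)
    fsync_dir(home)
    return True
```

      With this ordering, the window in which neither file holds the metadata does not exist: until step 4 the backup is still present.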

      The test case I have proves a weaker condition, as it doesn't actually create a backup. Instead, it opens a backup cursor, does a directory copy of the WT home, and starts a new connection on the copied directory. That directory "should" look the same as a backup directory, at least as far as the startup process goes. The relevant strace output from the startup is here:

       $ grep -n '.>>>' strace_open.txt
      28:>>>> stat("COPYDIR/WiredTiger.backup", {st_mode=S_IFREG|0664, st_size=89532, ...}) = 0
      32:>>>> stat("COPYDIR/WiredTiger.wt", {st_mode=S_IFREG|0664, st_size=229376, ...}) = 0
      33:>>>> unlink("COPYDIR/WiredTiger.wt")         = 0
      34:>>>> stat("COPYDIR/WiredTiger.turtle", {st_mode=S_IFREG|0664, st_size=1485, ...}) = 0
      35:>>>> unlink("COPYDIR/WiredTiger.turtle")     = 0
      36:>>>>> openat(AT_FDCWD, "COPYDIR/WiredTiger.wt", O_RDWR|O_CREAT|O_EXCL|O_NOATIME|O_CLOEXEC, 0666) = 8
      40:>>>> pwrite64(8, "A\330\1\0\1\0\0\0\330\10#\267\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 0) = 4096
      41:>>>> fdatasync(8)                            = 0
      42:>>>> close(8)                                = 0
      48:>>>> openat(AT_FDCWD, "COPYDIR/WiredTiger.wt", O_RDWR|O_NOATIME|O_CLOEXEC) = 8
      49:>>>> fstat(8, {st_mode=S_IFREG|0664, st_size=4096, ...}) = 0
      50:>>>> pread64(8, "A\330\1\0\1\0\0\0\330\10#\267\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096, 0) = 4096
      52:>>>> fstat(8, {st_mode=S_IFREG|0664, st_size=4096, ...}) = 0
      53:>>>> ftruncate(8, 4096)                      = 0
      57:>>>> stat("COPYDIR/WiredTiger.backup", {st_mode=S_IFREG|0664, st_size=89532, ...}) = 0
      58:>>>> openat(AT_FDCWD, "COPYDIR/WiredTiger.backup", O_RDWR|O_CLOEXEC) = 9
      59:>>>> fstat(9, {st_mode=S_IFREG|0664, st_size=89532, ...}) = 0
      60:>>>> pread64(9, "colgroup:test_backup.3\napp_metad"..., 8192, 0) = 8192
      384:>>>> stat("COPYDIR/WiredTiger.backup", {st_mode=S_IFREG|0664, st_size=89532, ...}) = 0
      385:>>>> unlink("COPYDIR/WiredTiger.backup")     = 0
      530:>>>> pwrite64(8, "\0\0\0\0\0\0\0\0\2\0\0\0\0\0\0\0\rl\0\0Z\0\0\0\7\4\0\1\0p\0\0"..., 28672, 4096) = 28672
      531:>>>> pwrite64(8, "\0\0\0\0\0\0\0\0\3\0\0\0\0\0\0\0\372o\0\0004\0\0\0\7\4\0\1\0p\0\0"..., 28672, 32768) = 28672
      532:>>>> pwrite64(8, "\0\0\0\0\0\0\0\0\4\0\0\0\0\0\0\0\341?\0\0:\0\0\0\7\4\0\1\0@\0\0"..., 16384, 61440) = 16384
      533:>>>> pwrite64(8, "\0\0\0\0\0\0\0\0\5\0\0\0\0\0\0\0\10F\0\0.\0\0\0\7\4\0\1\0P\0\0"..., 20480, 77824) = 20480
      535:>>>> pwrite64(8, "\0\0\0\0\0\0\0\0\6\0\0\0\0\0\0\0\246\0\0\0\10\0\0\0\6 \0\1\0\20\0\0"..., 4096, 98304) = 4096
      536:>>>> pwrite64(8, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0004\0\0\0\f\0\0\0\1\0\0\1\0\20\0\0"..., 4096, 102400) = 4096
      537:>>>> pwrite64(8, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0.\0\0\0\6\0\0\0\1\0\0\1\0\20\0\0"..., 4096, 106496) = 4096
      538:>>>> fdatasync(8)                            = 0

       

      I'll attach the test program and entire strace output below.

      Completion Criteria

      In addition to fixing the bug (we're suggesting moving the removal of WiredTiger.backup to later in the process), I think there should be some ad hoc testing to make sure it's doing the right thing. First, we should generate the strace as above and verify that the writes to WiredTiger.wt get to disk before the backup file is removed. Second, we should run the startup from backup in the debugger, breaking and/or single-stepping at various points up until the WiredTiger.wt is removed. At each of these breakpoints, we should copy the WiredTiger directory to a unique saved directory, so that we end up with a set of WT directories. For each one, we should start up WT and make sure we didn't lose any data. Within the scope of this ticket, I think this kind of testing is the best we can hope for. Writing a good automated test for this is more involved and will be done in WT-9932.
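      The first check (verifying the order of operations in the strace) could be scripted roughly as follows. This is a sketch under assumptions, not an existing tool: the function name is hypothetical, the line format is the one in the grep output above (an optional `NN:` prefix and `>` markers before each call), and a write at a nonzero offset is treated as metadata content because in this trace the only write at offset 0 is the initial 4096-byte block.

```python
import os
import re

# Matches e.g. '41:>>>> fdatasync(8)   = 0' -> ('fdatasync', '8', '0')
CALL = re.compile(r'^\s*(?:\d+:)?>*\s*(\w+)\((.*)\)\s*=\s*(-?\d+)')
QUOTED = re.compile(r'"([^"]*)"')


def _path_arg(args):
    """Return the basename of the first quoted argument, if any."""
    m = QUOTED.search(args)
    return os.path.basename(m.group(1)) if m else None


def backup_removed_after_sync(lines, target="WiredTiger.wt",
                              backup="WiredTiger.backup"):
    """True if content writes to target were synced before backup's unlink."""
    fds = set()            # descriptors currently open on the target
    dirty = False          # any unsynced write to the target
    pending_data = False   # unsynced write at a nonzero offset
    synced_data = False    # some nonzero-offset write has been synced
    for line in lines:
        m = CALL.match(line)
        if m is None or int(m.group(3)) < 0:
            continue       # not a syscall line, or the call failed
        name, args, ret = m.group(1), m.group(2), int(m.group(3))
        if name == "openat" and _path_arg(args) == target:
            fds.add(ret)
        elif name == "close" and args.strip().isdigit():
            fds.discard(int(args))
        elif name == "pwrite64" and int(args.split(",", 1)[0]) in fds:
            dirty = True
            if int(args.rsplit(",", 1)[1]) > 0:   # last argument is the offset
                pending_data = True
        elif name in ("fdatasync", "fsync") and int(args) in fds:
            synced_data = synced_data or pending_data
            dirty = pending_data = False
        elif name == "unlink" and _path_arg(args) == backup:
            return synced_data and not dirty
    raise ValueError("no unlink of %s found in trace" % backup)
```

      It could be run as `backup_removed_after_sync(open("strace_open.txt"))`. On the trace excerpted above it would be expected to report the unsafe ordering, since the content writes at lines 530-537 come after the unlink at line 385.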

        1. strace_open.txt
          47 kB
        2. test_backup29.py
          2 kB

            Assignee:
            sue.loverso@mongodb.com Susan LoVerso
            Reporter:
            donald.anderson@mongodb.com Donald Anderson