  1. WiredTiger
  2. WT-3612

Improve documentation of durability with backup cursors

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • 3.6.0-rc0, WT3.0.0
    • Affects Version/s: None
    • Component/s: None
    • None
    • Storage 2017-10-23

      Backup cursors without logging enabled can cause the loss of committed data.

      The WiredTiger backup documentation (http://source.wiredtiger.com/2.9.3/backup.html) says:

      Additionally, if a crash occurs during the period the backup cursor is open and logging is disabled, then the system will be restored to the most recent checkpoint prior to the opening of the backup cursor, even if later database checkpoints were created.

      The problem is that we create the WiredTiger.wt.backup file and expect the application/user to copy it to the backup directory, where it is used as the initial WiredTiger.wt file on startup. To get the application/user to copy that file, we create it in the original directory and list it in the files returned by the backup cursor. If the system crashes while that file exists in the original directory, the original database will also use it as a replacement WiredTiger.wt file on restart, discarding any checkpoints completed after that file was created.

      If logging is not configured, you can lose committed transactions if the process using the original directory crashes, because completed checkpoints may be ignored on startup.

      If logging is configured, it's not a problem because we'll roll forward all of the changes since the WiredTiger.wt.backup file was created (and we disable log file archiving while a backup cursor is open).
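The two cases above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyDatabase` and all of its methods are hypothetical stand-ins written for this ticket, not WiredTiger code, and the "log" is just a list of puts replayed on restart.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyDatabase:
    """Toy model of a database directory; hypothetical, not WiredTiger code."""
    logging: bool
    data: dict = field(default_factory=dict)       # committed state in memory
    metadata: dict = field(default_factory=dict)   # stands in for WiredTiger.wt
    backup_metadata: Optional[dict] = None         # stands in for WiredTiger.wt.backup
    log: list = field(default_factory=list)        # stands in for the log files

    def commit(self, key, value):
        self.data[key] = value
        if self.logging:
            self.log.append((key, value))

    def checkpoint(self):
        # A completed checkpoint makes the current state durable in the metadata.
        self.metadata = dict(self.data)

    def open_backup_cursor(self):
        # The backup file captures the most recent checkpoint at open time.
        self.backup_metadata = dict(self.metadata)

    def crash_and_restart(self):
        # On startup, an existing backup file replaces the metadata file.
        if self.backup_metadata is not None:
            start = self.backup_metadata
        else:
            start = self.metadata
        recovered = dict(start)
        # Log archiving is disabled while the cursor is open, so the whole log
        # is still available; replaying puts is idempotent in this toy model.
        for key, value in self.log:
            recovered[key] = value
        return recovered


def run_scenario(logging):
    db = ToyDatabase(logging=logging)
    db.commit("a", 1)
    db.checkpoint()
    db.open_backup_cursor()
    db.commit("b", 2)
    db.checkpoint()  # completed after the backup cursor was opened
    return db.crash_and_restart()


print(run_scenario(logging=False))  # {'a': 1}
print(run_scenario(logging=True))   # {'a': 1, 'b': 2}
```

Without logging, the second checkpoint is discarded and the committed key "b" is lost; with logging, replaying the log from the older checkpoint restores it.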

      (However, if logging is configured and the application uses named checkpoints, a crash with an open backup cursor could cause an application's named checkpoint to be lost, which is a detectable failure.)

      The trick is distinguishing the original directory from the copy: if the application copies only those files we return via the backup cursor, then we can create an ignore-the-backup file in the original and leave it out of the returned list. Because that file is never copied, we can easily distinguish between the original and the copy, and ignore the backup file in the original.

      It's harder in the case of applications that ignore any list of files provided by the backup cursor, instead simply copying the database directory.
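A minimal sketch of the ignore-the-backup idea, and of why whole-directory copies defeat it. The marker name `WiredTiger.backup.ignore`, the `table.wt` file, and both helper functions are assumptions made up for illustration; only the two `WiredTiger.wt*` file names come from the description above.

```python
import shutil
import tempfile
from pathlib import Path

METADATA = "WiredTiger.wt"
BACKUP_METADATA = "WiredTiger.wt.backup"
IGNORE_MARKER = "WiredTiger.backup.ignore"  # hypothetical name, not a real WiredTiger file


def open_backup(source):
    """Create the backup metadata and the marker; return the files to copy."""
    (source / BACKUP_METADATA).write_text("metadata at backup time")
    (source / IGNORE_MARKER).write_text("")
    # The marker is deliberately left out of the list handed to the application.
    return [BACKUP_METADATA, "table.wt"]


def metadata_used_on_startup(directory):
    """Return the name of the metadata file a startup would trust."""
    backup = directory / BACKUP_METADATA
    if backup.exists() and not (directory / IGNORE_MARKER).exists():
        return BACKUP_METADATA
    return METADATA


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "db"
        source.mkdir()
        (source / METADATA).write_text("current metadata")
        (source / "table.wt").write_text("data")
        listed = open_backup(source)

        # Application that copies only the files returned by the backup cursor.
        by_list = Path(tmp) / "copy_by_list"
        by_list.mkdir()
        for name in listed:
            shutil.copy(source / name, by_list / name)

        # Application that ignores the list and copies the whole directory.
        whole_dir = Path(tmp) / "copy_whole_dir"
        shutil.copytree(source, whole_dir)

        return {
            "original": metadata_used_on_startup(source),
            "copy_by_list": metadata_used_on_startup(by_list),
            "copy_whole_dir": metadata_used_on_startup(whole_dir),
        }


print(demo())
```

In this sketch the original ignores the backup file and the list-based copy uses it, as intended. The whole-directory copy also carries the marker, so it is indistinguishable from the original and would wrongly ignore the backup file.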

      Maybe:

      • we could block checkpoints while a backup cursor is open if logging isn't configured, but I expect that's not acceptable for MongoDB because backup cursors are open for too long.
      • we could require applications doing their own backups (ignoring the list of files returned via the backup cursor) to explicitly remove the WiredTiger.wt file from the copied directory, so we can detect when the WiredTiger.wt.backup file needs to be used, but I expect that's not acceptable for MongoDB, where users would have to modify their backup scripts.
      • we could require applications to explicitly acknowledge the risk of loss, by adding a configuration to the backup cursor open that stops us from blocking checkpoints for the life of the backup cursor.
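The first and third options could combine roughly as follows. This is a sketch only: `ToyConnection` and the `accept_data_loss_risk` flag are hypothetical names invented here, not existing WiredTiger configuration, and a real implementation would presumably make the checkpoint wait rather than raise.

```python
class ToyConnection:
    """Toy model of checkpoint blocking; hypothetical, not WiredTiger code."""

    def __init__(self, logging):
        self.logging = logging
        self.checkpoints_allowed = True

    def open_backup_cursor(self, accept_data_loss_risk=False):
        # With logging, checkpoints are safe because recovery rolls forward.
        # Without logging, they run only if the application accepted the risk.
        self.checkpoints_allowed = self.logging or accept_data_loss_risk

    def close_backup_cursor(self):
        self.checkpoints_allowed = True

    def checkpoint(self):
        if not self.checkpoints_allowed:
            # Raising keeps the sketch simple; blocking would wait instead.
            raise RuntimeError("checkpoint blocked while backup cursor is open")
        return "checkpoint completed"


conn = ToyConnection(logging=False)
conn.open_backup_cursor()
try:
    conn.checkpoint()
except RuntimeError as err:
    print(err)

conn.close_backup_cursor()
conn.open_backup_cursor(accept_data_loss_risk=True)
print(conn.checkpoint())
```

The default stays safe, and an application such as MongoDB that cannot tolerate blocked checkpoints would have to opt in to the risk explicitly.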

            Assignee:
            keith.bostic@mongodb.com Keith Bostic (Inactive)
            Reporter:
            keith.bostic@mongodb.com Keith Bostic (Inactive)