WiredTiger / WT-6072

Allow concurrent usage of getting incremental backup and log data.

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Backup
    • Storage Engines

      Today, with incremental, sharded backups in MongoDB, the storage layer's usage of an incremental cursor always follows this pattern, with "extending" happening only after all backup cursor + incremental information has been exhausted:

      | Parent backup cursor | Incremental reader | Extend reader         |
      |----------------------+--------------------+-----------------------|
      | Open                 |                    |                       |
      | next -> File "A"     |                    |                       |
      |                      | Open dup on "A"    |                       |
      |                      | next -> Block 1    |                       |
      |                      | Close dup on "A"   |                       |
      | next -> File "B"     |                    |                       |
      |                      | Open dup on "B"    |                       |
      |                      | next -> Block 2    |                       |
      |                      | Close dup on "B"   |                       |
      | <time passes>        |                    |                       |
      |                      |                    | Open dup "target=log" |
      |                      |                    | next -> Log file 4    |
      |                      |                    | Close                 |
      | Close                |                    |                       |
      
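The serialized pattern above maps onto WT's backup API roughly as in the following sketch (error handling, actual file copying, and the backup IDs are elided or illustrative, not taken from the ticket; it links against libwiredtiger, so it is not runnable standalone):

```c
/* Sketch: fully-serialized incremental backup, extend only at the end.
 * "ID1"/"ID2" are illustrative incremental-backup identifiers. */
#include <stdio.h>
#include <wiredtiger.h>

static void
serialized_incremental_backup(WT_SESSION *session)
{
    WT_CURSOR *backup, *dup;
    const char *filename;
    uint64_t offset, size, type; /* type: WT_BACKUP_RANGE or WT_BACKUP_FILE */
    char cfg[256];

    /* Parent backup cursor, configured for incremental backup. */
    session->open_cursor(session, "backup:", NULL,
        "incremental=(src_id=\"ID1\",this_id=\"ID2\")", &backup);

    /* Exhaust all file and block information first ... */
    while (backup->next(backup) == 0) {
        backup->get_key(backup, &filename);

        /* Duplicate on this file to walk its changed ranges. */
        snprintf(cfg, sizeof(cfg), "incremental=(file=\"%s\")", filename);
        session->open_cursor(session, NULL, backup, cfg, &dup);
        while (dup->next(dup) == 0) {
            dup->get_key(dup, &offset, &size, &type);
            /* ... copy [offset, offset + size) of filename ... */
        }
        dup->close(dup);
    }

    /* ... and only then "extend": a log-target duplicate returns the log
     * files written since the backup's checkpoint. */
    session->open_cursor(session, NULL, backup, "target=(\"log:\")", &dup);
    while (dup->next(dup) == 0)
        dup->get_key(dup, &filename); /* e.g. a new log file */
    dup->close(dup);

    backup->close(backup);
}
```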

      MongoDB behaves this way because it exhausts the cursor and buffers all incremental data in memory up front, on the "openBackupCursor" call.

      In an effort to reduce working set size, we'd like to stream data from WT's underlying backup cursor. However, we lose the simplicity of "always exhausting the cursor" (for incremental data). With SERVER-46076 we plan to allow the following cases (which sue.loverso believes work, but will double-check):

      | Parent backup cursor | Incremental reader | Extend reader         |
      |----------------------+--------------------+-----------------------|
      | Open                 |                    |                       |
      | next -> File "A"     |                    |                       |
      |                      | Open dup on "A"    |                       |
      |                      | next -> Block 1    |                       |
      |                      | Close dup on "A"   |                       |
      |                      |                    | Open dup "target=log" |
      |                      |                    | next -> Log file 4    |
      |                      |                    | Close                 |
      | next -> File "B"     |                    |                       |
      |                      | Open dup on "B"    |                       |
      |                      | next -> Block 2    |                       |
      |                      | Close dup on "B"   |                       |
      | Close                |                    |                       |
      

      However, doing that requires some concurrency control by MongoDB. A client attempting to extend the backup cursor (that is, to get the new log files between the checkpoint time and the cluster time for the backup) has no relation to which blocks are being copied. MongoDB will enforce that only one duplicate cursor is open at a time, and that any existing duplicate cursor is exhausted before it is closed.

      A sequence we'd like to have supported (pardon my reusing the "duplicate" cursor terminology; I did so for simplicity, not for accuracy to the current implementation):

      | Parent backup cursor | Incremental reader | Extend reader         |
      |----------------------+--------------------+-----------------------|
      | Open                 |                    |                       |
      | next -> File "A"     |                    |                       |
      |                      | Open dup on "A"    |                       |
      |                      | next -> Block 1    |                       |
      |                      |                    | Open dup "target=log" |
      |                      | Close dup on "A"   |                       |
      | next -> File "B"     |                    |                       |
      |                      | Open dup on "B"    |                       |
      |                      | next -> Block 2    |                       |
      |                      |                    | next -> Log file 4    |
      |                      |                    | Close                 |
      |                      | Close dup on "B"   |                       |
      | Close                |                    |                       |
      
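Assuming the same API sketch as earlier, the desired sequence amounts to WT tolerating a log-target duplicate that is opened while a per-file incremental duplicate is still open, and that outlives it. This is a hypothetical usage, sketched only to mirror the table above; error handling is elided and the "A"/"B" file names are the table's placeholders:

```c
/* Sketch: the interleaving this ticket asks WT to support.
 * Not valid if WT rejects concurrently open duplicate cursors. */
#include <wiredtiger.h>

static void
interleaved_backup(WT_SESSION *session, WT_CURSOR *backup)
{
    WT_CURSOR *incr, *logs;
    const char *name;
    uint64_t offset, size, type;

    backup->next(backup);                 /* -> file "A" */
    session->open_cursor(session, NULL, backup,
        "incremental=(file=\"A\")", &incr);
    incr->next(incr);                     /* -> block 1 */
    incr->get_key(incr, &offset, &size, &type);

    /* Extend reader opens while the incremental dup is still open. */
    session->open_cursor(session, NULL, backup, "target=(\"log:\")", &logs);

    incr->close(incr);                    /* close dup on "A" */

    backup->next(backup);                 /* -> file "B" */
    session->open_cursor(session, NULL, backup,
        "incremental=(file=\"B\")", &incr);
    incr->next(incr);                     /* -> block 2 */

    /* The log dup advances while the "B" incremental dup remains open. */
    logs->next(logs);                     /* -> log file 4 */
    logs->get_key(logs, &name);
    logs->close(logs);

    incr->close(incr);                    /* close dup on "B" */
}
```

Note that a WT_SESSION is not thread-safe, so if the incremental reader and the extend reader were separate threads rather than one interleaved caller, WT would also need to define which session each duplicate may be opened on; the sketch deliberately leaves that question to this ticket.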

            Assignee: backlog-server-storage-engines [DO NOT USE] Backlog - Storage Engines Team
            Reporter: daniel.gottlieb@mongodb.com Daniel Gottlieb (Inactive)