
Type: Improvement
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Backup
Labels:
None

Assigned Teams:

Storage Engines, Storage Engines - Persistence
Sprint:
None
Story Points:
None

Today, with incremental sharded backups in MongoDB, usage of the storage engine's incremental backup cursor always follows this pattern, with "extending" happening only after all backup cursor and incremental information has been exhausted:

| Parent backup cursor | Incremental reader | Extend reader         |
|----------------------+--------------------+-----------------------|
| Open                 |                    |                       |
| next -> File "A"     |                    |                       |
|                      | Open dup on "A"    |                       |
|                      | next -> Block 1    |                       |
|                      | Close dup on "A"   |                       |
| next -> File "B"     |                    |                       |
|                      | Open dup on "B"    |                       |
|                      | next -> Block 2    |                       |
|                      | Close dup on "B"   |                       |
| <time passes>        |                    |                       |
|                      |                    | Open dup "target=log" |
|                      |                    | next -> Log file 4    |
|                      |                    | Close                 |
| Close                |                    |                       |

MongoDB behaves this way because it drains all incremental data into memory up front during the "openBackupCursor" call.
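To make the memory cost of the current behavior concrete, here is a minimal Python model of "drain everything up front". It is not MongoDB or WiredTiger code: the `Manifest` type (file name mapped to changed block ids) and the function `open_backup_cursor_eager` are hypothetical, and each inner loop only stands in for a duplicate cursor on one file.

```python
from typing import Dict, List, Tuple

# Hypothetical model, not MongoDB or WiredTiger code: each backup file maps
# to the block ids an incremental duplicate cursor would report for it.
Manifest = Dict[str, List[int]]


def open_backup_cursor_eager(manifest: Manifest) -> List[Tuple[str, int]]:
    """Drain every file and every changed block before returning anything."""
    drained: List[Tuple[str, int]] = []
    for file_name, blocks in manifest.items():  # parent cursor: next -> File
        for block in blocks:                    # dup cursor on file: next -> Block
            drained.append((file_name, block))
        # the dup cursor on file_name would be closed here
    # every (file, block) pair is now held in memory at once
    return drained


result = open_backup_cursor_eager({"A": [1], "B": [2]})
print(result)  # [('A', 1), ('B', 2)]
```

Because the whole list is built before the caller sees any of it, the working set grows with the total number of changed blocks across all files, which is what streaming aims to avoid.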

To reduce working set size, we'd like to stream data from WT's underlying backup cursor instead. However, doing so loses the simplicity of always exhausting the cursor (for incremental data) up front. With ~~SERVER-46076~~ we plan to allow the following sequence (sue.loverso believes this works, but will double check):

| Parent backup cursor | Incremental reader | Extend reader         |
|----------------------+--------------------+-----------------------|
| Open                 |                    |                       |
| next -> File "A"     |                    |                       |
|                      | Open dup on "A"    |                       |
|                      | next -> Block 1    |                       |
|                      | Close dup on "A"   |                       |
|                      |                    | Open dup "target=log" |
|                      |                    | next -> Log file 4    |
|                      |                    | Close                 |
| next -> File "B"     |                    |                       |
|                      | Open dup on "B"    |                       |
|                      | next -> Block 2    |                       |
|                      | Close dup on "B"   |                       |
| Close                |                    |                       |
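The streaming sequence above can be sketched with Python generators, assuming the same hypothetical file-to-blocks model: `incremental_events` produces the parent and incremental-reader events lazily, and `interleave_extend_after` (also hypothetical) runs the extend reader only after the duplicate cursor on a chosen file has been closed, matching the table where no two duplicate cursors overlap.

```python
from typing import Dict, Iterator, List

# Hypothetical model, not WiredTiger code: file name -> changed block ids.
Manifest = Dict[str, List[int]]


def incremental_events(manifest: Manifest) -> Iterator[str]:
    """Lazily yield parent/dup cursor events; only one file is in flight."""
    for file_name, blocks in manifest.items():
        yield f"next -> File {file_name}"
        yield f"open dup {file_name}"
        for block in blocks:
            yield f"next -> Block {block}"
        yield f"close dup {file_name}"


def extend_events(log_files: List[int]) -> Iterator[str]:
    """Events for the extend reader's "target=log" duplicate cursor."""
    yield "open dup target=log"
    for log in log_files:
        yield f"next -> Log file {log}"
    yield "close dup target=log"


def interleave_extend_after(manifest: Manifest, after_file: str,
                            log_files: List[int]) -> List[str]:
    """Run the extend reader only once the dup on `after_file` is closed."""
    trace: List[str] = []
    for event in incremental_events(manifest):
        trace.append(event)
        if event == f"close dup {after_file}":
            trace.extend(extend_events(log_files))
    return trace


for event in interleave_extend_after({"A": [1], "B": [2]}, "A", [4]):
    print(event)
```

The extend events land between "close dup A" and "next -> File B", and the generator never materializes file B's blocks until the parent cursor actually advances to it.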

However, supporting that requires some concurrency control in MongoDB. A client extending the backup cursor (fetching the new log files between the checkpoint time and the backup's cluster time) has no relation to which blocks are being copied. MongoDB will enforce that only one duplicate cursor is open at a time, and that any existing duplicate cursor is exhausted before it is closed.
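One way to picture the two rules just described is a small guard object. This is a minimal sketch, not MongoDB's implementation: `DupCursorGuard` and `DupCursor` are hypothetical names, the cursor walks a fixed in-memory list, and "enforce exhaustion" is modeled as refusing to close a cursor that still has items (draining it on close would be another possible reading).

```python
import threading
from typing import List, Optional


class DupCursor:
    """Hypothetical stand-in for a duplicate backup cursor over a fixed list."""

    def __init__(self, guard: "DupCursorGuard", name: str, items: List[int]) -> None:
        self.guard = guard
        self.name = name
        self._items = list(items)
        self._pos = 0

    def next(self) -> Optional[int]:
        # Return the next item, or None once the cursor is exhausted.
        if self._pos >= len(self._items):
            return None
        item = self._items[self._pos]
        self._pos += 1
        return item

    def exhausted(self) -> bool:
        return self._pos >= len(self._items)

    def close(self) -> None:
        # Rule 2: a duplicate cursor may only be closed once it is exhausted.
        if not self.exhausted():
            raise RuntimeError(f"dup cursor {self.name!r} closed before exhaustion")
        self.guard._release(self)


class DupCursorGuard:
    """Rule 1: at most one duplicate cursor open at a time, across all readers."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._open: Optional[DupCursor] = None

    def open_dup(self, name: str, items: List[int]) -> DupCursor:
        with self._lock:
            if self._open is not None:
                raise RuntimeError(f"dup cursor {self._open.name!r} is still open")
            self._open = DupCursor(self, name, items)
            return self._open

    def _release(self, dup: DupCursor) -> None:
        with self._lock:
            if self._open is dup:
                self._open = None


guard = DupCursorGuard()
dup_a = guard.open_dup("A", [1])
try:
    guard.open_dup("target=log", [4])  # extend reader arrives while "A" is open
except RuntimeError as err:
    print(err)  # dup cursor 'A' is still open
print(dup_a.next())  # 1
dup_a.close()        # allowed: "A" is exhausted after its only block
log_dup = guard.open_dup("target=log", [4])
print(log_dup.next())  # 4
```

Under these rules the extend reader in the next table could not open its cursor while the duplicate on "A" is still open.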

A sequence we'd like to support (pardon the reuse of the "duplicate" cursor terminology; I chose it for simplicity, not for accuracy with respect to the current implementation):

| Parent backup cursor | Incremental reader | Extend reader         |
|----------------------+--------------------+-----------------------|
| Open                 |                    |                       |
| next -> File "A"     |                    |                       |
|                      | Open dup on "A"    |                       |
|                      | next -> Block 1    |                       |
|                      |                    | Open dup "target=log" |
|                      | Close dup on "A"   |                       |
| next -> File "B"     |                    |                       |
|                      | Open dup on "B"    |                       |
|                      | next -> Block 2    |                       |
|                      |                    | next -> Log file 4    |
|                      |                    | Close                 |
|                      | Close dup on "B"   |                       |
| Close                |                    |                       |
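To compare this desired sequence with the earlier ones, the tables can be encoded as event traces and checked against a policy. This is a hypothetical sketch: `first_violation` and the `one_per_reader` flag are illustrative names, and the assumption that the desired behavior means "each reader may hold its own duplicate cursor" is an interpretation of the table, not a stated design. Parent-cursor steps and exhaustion are not modeled.

```python
from typing import Iterable, Set, Tuple

Event = Tuple[str, str]  # (reader, action); reader is "incremental" or "extend"


def first_violation(trace: Iterable[Event], one_per_reader: bool) -> int:
    """Return the index of the first event that breaks the policy, or -1.

    one_per_reader=False: only one duplicate cursor may be open at all.
    one_per_reader=True:  each reader may hold its own duplicate cursor.
    """
    open_readers: Set[str] = set()
    for i, (reader, action) in enumerate(trace):
        if action == "open":
            blocked = reader in open_readers if one_per_reader else bool(open_readers)
            if blocked:
                return i
            open_readers.add(reader)
        elif action in ("next", "close"):
            if reader not in open_readers:  # using or closing an unopened cursor
                return i
            if action == "close":
                open_readers.remove(reader)
    return -1


# The desired sequence from the table above, reader by reader.
DESIRED = [("incremental", "open"), ("incremental", "next"), ("extend", "open"),
           ("incremental", "close"), ("incremental", "open"), ("incremental", "next"),
           ("extend", "next"), ("extend", "close"), ("incremental", "close")]

print(first_violation(DESIRED, one_per_reader=False))  # 2: extend opens while "A" is open
print(first_violation(DESIRED, one_per_reader=True))   # -1: allowed
```

Under the single-duplicate rule the desired trace fails at the extend reader's open, which is exactly the case the issue asks to support.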

Assignee: [DO NOT USE] Backlog - Storage Engines Team
Reporter: Daniel Gottlieb (Inactive)
Votes: 0
Watchers: 4

Created: Apr 27 2020 08:29:18 PM UTC
Updated: Mar 21 2025 12:28:31 AM UTC
