[SERVER-44410] Change backup cursor results to include a filesize Created: 04/Nov/19  Updated: 29/Oct/23  Resolved: 29/Jan/20

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: 4.2.4, 4.3.4

Type: Improvement Priority: Major - P3
Reporter: Daniel Gottlieb (Inactive) Assignee: Gregory Wlodarek
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
depends on SERVER-45481 Change the backup API to return the b... Closed
Backwards Compatibility: Fully Compatible
Backport Requested:
v4.2
Sprint: Execution Team 2020-01-27, Execution Team 2020-02-10, Execution Team 2019-12-30
Participants:

 Description   

Consider the case where the only change to a WT file between incremental backups is a truncation of its length.

WT's API with duplicate cursors will neatly express this case to its caller (the server), but the current MongoDB API does not preserve that information when passing results on to its own caller (the network client). Without a change, a backup client could never safely shrink a file it was incrementally backing up.

When WT returns the files to copy on the top-level backup cursor, it will also return a file length that the backup application can safely truncate the file to. Opening a duplicate backup cursor for a file may or may not return results. The proposed algorithm for MongoDB:

  1. MongoDB reads a <filename>, <filesize> pair from WT.
  2. MongoDB opens a "duplicate" cursor keyed by the <filename>.
  3. If the duplicate cursor returns EOF, MongoDB will add a document to the backup cursor stream with the following contents:

    { filename: <filename>,
      new_filesize: <filesize>,
      offset: 0,
      length: 0 }
    

  4. If the duplicate cursor returns specific blocks to copy, each with an <offset>, <length> pair, each document added to the backup cursor stream should be of the format:

    { filename: <filename>,
      new_filesize: <filesize>, // duplicated just like <filename>
      offset: <offset>,
      length: <length> }
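
The four steps above can be sketched roughly as follows. This is only an illustration of the proposed document stream, not the server's implementation: `files` stands in for the top-level WT backup cursor, `changed_blocks` stands in for the per-file duplicate cursors, and both names (and the sample file names) are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

Block = Tuple[int, int]  # (offset, length) of a changed block


def backup_cursor_documents(
    files: Iterable[Tuple[str, int]],
    changed_blocks: Dict[str, List[Block]],
) -> Iterator[dict]:
    """Yield one document per changed block, or a single zero-length
    document when a file has no changed blocks.

    Both arguments are hypothetical stand-ins for WT cursors.
    """
    for filename, filesize in files:                # step 1: <filename>, <filesize> pair
        blocks = changed_blocks.get(filename, [])   # step 2: "duplicate" cursor for the file
        if not blocks:                              # step 3: duplicate cursor returned EOF
            yield {"filename": filename, "new_filesize": filesize,
                   "offset": 0, "length": 0}
            continue
        for offset, length in blocks:               # step 4: one document per block
            yield {"filename": filename, "new_filesize": filesize,
                   "offset": offset, "length": length}


# Example: one file with no changed blocks, one file with two changed blocks.
docs = list(backup_cursor_documents(
    [("collection-1.wt", 4096), ("index-2.wt", 8192)],
    {"index-2.wt": [(0, 4096), (4096, 1024)]},
))
```

Because every document carries `new_filesize`, a client that sees only the zero-length document for a file still learns the length it may truncate that file to.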
    



 Comments   
Comment by Githook User [ 25/Feb/20 ]

Author:

{'name': 'Gregory Wlodarek', 'username': 'GWlodarek', 'email': 'gregory.wlodarek@mongodb.com'}

Message: SERVER-44410 Change backup cursor results to include a file size

(cherry picked from commit d1fe1746711948441c7a366059e58afbd6b05bd8)
Branch: v4.2
https://github.com/mongodb/mongo/commit/6d2375017e9ba507b577baa93be3b4bf767e1f18

Comment by Githook User [ 29/Jan/20 ]

Author:

{'username': 'GWlodarek', 'name': 'Gregory Wlodarek', 'email': 'gregory.wlodarek@mongodb.com'}

Message: SERVER-44410 Change backup cursor results to include a file size
Branch: master
https://github.com/mongodb/mongo/commit/d1fe1746711948441c7a366059e58afbd6b05bd8

Comment by Gregory Wlodarek [ 24/Jan/20 ]

After discussing this with Sue, we concluded that the work described here is not feasible, because the technical design has changed since this was written up.

 
The main reason for always including the fileSize was to let the backup application know when it is safe to truncate a file, so that it can be more space-efficient. Today, however, the fileSize can only be retrieved when the incremental cursor returns the WT_BACKUP_FILE type, which means the whole file must be copied.

What we’re hoping for sometime in the future is a way to also get the fileSize when:

1. There were no changes to the file's blocks, but the file shrank, so it is safe for the backup application to truncate the file.
2. There are changes to the file's blocks that need to be copied, but the file also shrank, so it is safe for the backup application to truncate the file.

 

As a temporary workaround, it is OK to query the filesystem directly for the file's size today. Sometime down the road, we should switch to WiredTiger's functionality once it supports our use cases.
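
As a rough illustration of the workaround, the size can be read from the filesystem and attached to the result for that file. The function name and the document shape below are assumptions made for this sketch, not the server's actual output format.

```python
import os
import tempfile


def backup_document_with_size(path: str) -> dict:
    """Build a result document whose size comes from the filesystem.

    The field names are illustrative only.
    """
    return {"filename": path, "fileSize": os.path.getsize(path)}


# Usage: write a 1024-byte file, report its size, then clean up.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\0" * 1024)
    demo_path = f.name

demo_doc = backup_document_with_size(demo_path)
os.remove(demo_path)
```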
