[SERVER-58446] Allow checkpoints taken while backup cursor is open to be used during startup recovery
Created: 12/Jul/21  Updated: 29/Oct/23  Resolved: 03/Mar/22

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: None
Fix Version/s: 6.0.0-rc0

Type: Improvement
Priority: Major - P3
Reporter: Eric Milkie
Assignee: Daniel Gomez Ferro
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to WT-8904 Allow checkpoints taken while backup ... Closed
Backwards Compatibility: Fully Compatible
Sprint: Execution Team 2022-02-21, Execution Team 2022-03-07
Participants:
Case:

 Description   

Today, checkpoints taken while a WT backup cursor is open are not used for startup recovery. When a backup cursor is opened, WT makes the on-disk file layout look like a restored backup so that the files can simply be copied. Part of this is writing a WiredTiger.backup file, which overrides the active WiredTiger.wt file and hides any checkpoints created after the backup cursor was opened. After the file copies are complete and the backup cursor is closed, WT deletes the WiredTiger.backup file, and the new checkpoints are once again available to startup recovery.
If the server undergoes an unclean shutdown while the backup cursor is open, the subsequent startup recovery begins with the last checkpoint completed before the backup cursor was opened and replays the write-ahead log forward from that point. This can take a very long time if the backup cursor had been open for a long time on a busy system before the unclean shutdown.

To fix this, we could implement the following algorithm:
1. Just prior to opening the backup cursor, MongoDB logic would write a new file in the dbpath (or perhaps rewrite the storage.bson file) as a flag that the backup cursor is open.
2. Just after closing the backup cursor, MongoDB logic would delete this new file or wipe out the flag in storage.bson.
3. At startup time, prior to calling wiredtiger_open(), MongoDB would detect the flag by looking for the file or looking in storage.bson. If the flag is detected, this indicates that MongoDB had an unclean shutdown with a backup cursor open. MongoDB would thus delete the existing WiredTiger.backup file (if it exists), and then clear the flag. This would allow WiredTiger startup recovery to see the newest checkpoint written prior to the unclean shutdown.
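
The three steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the server's actual implementation: the marker file name `backup_cursor_open.marker` and all three function names are hypothetical, and the sketch uses the "new file in the dbpath" variant rather than a flag in storage.bson. Only the WiredTiger.backup file name comes from the description.

```python
import os
from pathlib import Path

# Hypothetical marker file name; the ticket does not specify one.
MARKER_NAME = "backup_cursor_open.marker"
# File name taken from the ticket description.
WT_BACKUP_NAME = "WiredTiger.backup"


def mark_backup_cursor_open(dbpath: Path) -> None:
    """Step 1: persist the flag just before the backup cursor is opened."""
    with open(dbpath / MARKER_NAME, "w") as f:
        f.flush()
        # Flush the (empty) marker to disk. A real implementation would
        # likely also need to sync the containing directory; omitted here.
        os.fsync(f.fileno())


def mark_backup_cursor_closed(dbpath: Path) -> None:
    """Step 2: clear the flag just after the backup cursor is closed."""
    (dbpath / MARKER_NAME).unlink(missing_ok=True)


def prepare_for_startup(dbpath: Path) -> bool:
    """Step 3: run before wiredtiger_open().

    Returns True if a leftover flag was found, meaning the previous
    process shut down uncleanly with a backup cursor open.
    """
    marker = dbpath / MARKER_NAME
    if not marker.exists():
        return False
    # Remove WiredTiger.backup (if it exists) so that startup recovery
    # can see the newest checkpoint, then clear the flag.
    (dbpath / WT_BACKUP_NAME).unlink(missing_ok=True)
    marker.unlink()
    return True
```

In this sketch the flag is cleared only after WiredTiger.backup has been removed, so a crash in the middle of step 3 leaves the flag in place and the cleanup is simply retried on the next startup.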



 Comments   
Comment by Githook User [ 03/Mar/22 ]

Author: Daniel Gómez Ferro (daniel.gomezferro@mongodb.com, username: dgomezferro)

Message: SERVER-58446 Remove WiredTiger.backup after unclean shutdown during backup
Branch: master
https://github.com/mongodb/mongo/commit/c5ebc557bc96375b949ceb0bc20fc210f42245bc

Comment by Githook User [ 03/Mar/22 ]

Author: Daniel Gómez Ferro (daniel.gomezferro@mongodb.com, username: dgomezferro)

Message: SERVER-58446 Remove WiredTiger.backup after unclean shutdown during backup
Branch: master
https://github.com/10gen/mongo-enterprise-modules/commit/c91bcc0de41a5bbf85f0ae4b9a629f88c1a207df
