After SERVER-81032, the backup cursor service can report a backup cursor checkpointTimestamp that does not match the checkpoint timestamp at which WT actually opened the backup cursor; that is, the reported checkpointTimestamp can be <= the actual checkpoint timestamp instead of == it.
This happens because committing the checkpoint and updating txn_global.last_ckpt_timestamp (the value reported by getLastStableRecoveryTimestamp()) are not atomic. As a result, we can end up in a scenario like the one below:
1) CKPT thread: WT checkpoint committed for TS(100) with ckptId:100
2) BackupService thread: Opens the _mdb_catalog cursor with read source as kCheckpoint.
- This will open the checkpoint cursor on the latest checkpoint, ckptId:100
3) BackupService thread: Calls getLastStableRecoveryTimestamp() and reads the previous checkpoint's timestamp, say TS(90).
4) CKPT thread: Updates {{txn_global.last_ckpt_timestamp}} to TS(100)
5) BackupService thread: Opens the backup cursor
6) BackupService thread: Verifies whether any checkpoint was taken between step #3 and step #5.
- To do so, it again opens the checkpoint cursor on _mdb_catalog, reads the checkpoint id as ckptId:100, and compares it with the checkpoint id from step #2.
Since the checkpoint ids from step #2 and step #6 are the same, the sanity check in step #6 passes. However, the backup cursor now returns `checkpointTimestamp` as TS(90) (i.e., the step #3 value) instead of the actual checkpoint timestamp at which WT opened the backup cursor, which is TS(100).
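The interleaving above can be replayed deterministically in a small sketch. `ToyEngine` and all of its methods are hypothetical stand-ins invented for illustration, not the WiredTiger or MongoDB API; the only thing modeled is that committing a checkpoint and publishing its timestamp are two separate steps.

```python
from dataclasses import dataclass


@dataclass
class ToyEngine:
    """Hypothetical model of the storage engine state (not a real API)."""
    ckpt_id: int = 90        # id of the latest committed checkpoint
    ckpt_ts: int = 90        # timestamp of the latest committed checkpoint
    published_ts: int = 90   # models txn_global.last_ckpt_timestamp

    def commit_checkpoint(self, ts: int) -> None:
        # The checkpoint becomes visible, but its timestamp is not published yet.
        self.ckpt_id = ts
        self.ckpt_ts = ts

    def publish_timestamp(self) -> None:
        # Separate, later step: this is the non-atomic part.
        self.published_ts = self.ckpt_ts

    def read_catalog_checkpoint_id(self) -> int:
        return self.ckpt_id

    def get_last_stable_recovery_timestamp(self) -> int:
        return self.published_ts

    def open_backup_cursor(self) -> int:
        # Returns the timestamp of the checkpoint the backup is based on.
        return self.ckpt_ts


def replay_interleaving():
    engine = ToyEngine()
    engine.commit_checkpoint(100)                               # step 1
    id_before = engine.read_catalog_checkpoint_id()             # step 2
    reported_ts = engine.get_last_stable_recovery_timestamp()   # step 3
    engine.publish_timestamp()                                  # step 4
    actual_ts = engine.open_backup_cursor()                     # step 5
    id_after = engine.read_catalog_checkpoint_id()              # step 6
    check_passed = id_before == id_after
    return check_passed, reported_ts, actual_ts


check_passed, reported_ts, actual_ts = replay_interleaving()
print(check_passed, reported_ts, actual_ts)
```

In this model the id comparison passes while the reported timestamp (90) lags the timestamp of the checkpoint the backup cursor was opened on (100).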
Before SERVER-81032, because WT holds the checkpoint lock while opening the backup cursor (step #5) and for the entire checkpoint job, calling getLastStableRecoveryTimestamp() at step #6 was guaranteed to return at least TS(100) in the above case. I also think any new checkpoint taken between step #5 and step #6 is uninteresting, so it is OK even if step #6 reads a stale last checkpoint timestamp.
My proposal is to make step #6 go back to the original approach, which is to use `getLastStableRecoveryTimestamp()`.
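A minimal sketch of how a timestamp-based check in step #6 would behave, under two assumptions that are not confirmed by this ticket: that the check compares the value read in step #3 with a second read in step #6, and that the published timestamp is up to date by the time the backup cursor open returns (the checkpoint lock argument above). `ToyEngine` and `open_backup_with_timestamp_check` are hypothetical names, not real server or WT functions.

```python
class ToyEngine:
    """Hypothetical model, not the WiredTiger or MongoDB API."""

    def __init__(self):
        self.ckpt_ts = 90        # timestamp of the latest committed checkpoint
        self.published_ts = 90   # models txn_global.last_ckpt_timestamp

    def commit_checkpoint(self, ts):
        self.ckpt_ts = ts        # committed, but not yet published

    def publish_timestamp(self):
        self.published_ts = self.ckpt_ts

    def get_last_stable_recovery_timestamp(self):
        return self.published_ts


def open_backup_with_timestamp_check(engine, between_reads):
    ts_before = engine.get_last_stable_recovery_timestamp()   # step 3
    between_reads(engine)                                      # steps 4 and 5
    ts_after = engine.get_last_stable_recovery_timestamp()    # step 6 (proposed)
    # Assumed policy: a mismatch means the reported timestamp is not trustworthy.
    return ts_before == ts_after, ts_before, ts_after


# Racy case: checkpoint committed before step 3, timestamp published after it.
racy = ToyEngine()
racy.commit_checkpoint(100)                                    # step 1
racy_result = open_backup_with_timestamp_check(racy, lambda e: e.publish_timestamp())

# Quiet case: no checkpoint activity around the open.
quiet = ToyEngine()
quiet_result = open_backup_with_timestamp_check(quiet, lambda e: None)

print(racy_result, quiet_result)
```

In the racy case the two reads differ (90 vs 100), so the stale value from step #3 is detected rather than reported; in the quiet case the reads agree and the open proceeds.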
Is related to:
- SERVER-81032 Fix checkpoint detection while opening a backup cursor (Closed)
- WT-11709 API to retrieve timestamp of a checkpoint cursor (Closed)
- SERVER-81208 Use checkpoint cursor timestamp when opening backup cursor (In Progress)