[SERVER-85688] Set stable timestamp to end of each oplog batch during startup recovery for restore Created: 24/Jan/24  Updated: 02/Feb/24

Status: Open
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 8.0 Required

Type: Bug Priority: Major - P3
Reporter: Xuerui Fa Assignee: Backlog - Replication Team
Resolution: Unresolved Votes: 0
Labels: repl-shortlist
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-84706 Investigate if setting the oldest tim... In Code Review
is related to SERVER-84706 Investigate if setting the oldest tim... In Code Review
Assigned Teams:
Replication
Operating System: ALL
Participants:

 Description   

This came out of an investigation during SERVER-84706. Today, startup recovery for restore first sets the initial data timestamp to the sentinel value so that we take unstable checkpoints. We also attempt to set the stable timestamp to Timestamp::min(). However, this is essentially a no-op, because a request to set the stable timestamp to a null value is ignored. In addition, it is not safe to move the stable timestamp backwards.

After discussion within RSS, we determined that it should be equally performant and safe to set the stable timestamp during startup recovery for restore and take stable checkpoints. This would allow us to advance the stable timestamp alongside the oldest timestamp, preventing any future occurrences of SERVER-84706. We should do this to resolve this bug.
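
A minimal sketch of the proposed behavior, for illustration only. It is a toy model, not the server code: ToyEngine, recoverForRestore, Ts, and kNullTs are hypothetical names, and timestamps are reduced to plain integers with 0 standing in for the null value. It shows the two properties described above (a null stable timestamp is ignored, and the stable timestamp never moves backwards) and the proposed change of advancing the stable timestamp to the end of each oplog batch, with the oldest timestamp following it.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a timestamp; 0 plays the role of the null value.
using Ts = std::uint64_t;
constexpr Ts kNullTs = 0;

// Toy model of the storage engine's timestamp state (not the real API).
struct ToyEngine {
    Ts stable = kNullTs;
    Ts oldest = kNullTs;

    // Returns true if the stable timestamp changed.
    // A null timestamp is ignored, and the value never moves backwards.
    bool setStableTimestamp(Ts ts) {
        if (ts == kNullTs || ts <= stable) {
            return false;
        }
        stable = ts;
        return true;
    }

    // In this toy model, oldest only moves forward and never passes stable.
    void setOldestTimestamp(Ts ts) {
        if (ts > oldest && ts <= stable) {
            oldest = ts;
        }
    }
};

// Each inner vector is one oplog batch, given as the timestamps of its
// entries in order. After a batch is applied, stable is advanced to the
// timestamp of the batch's last entry, and oldest is advanced alongside it.
inline void recoverForRestore(ToyEngine& engine,
                              const std::vector<std::vector<Ts>>& batches) {
    for (const auto& batch : batches) {
        assert(!batch.empty());
        // (Applying the batch's operations is omitted in this sketch.)
        const Ts endOfBatch = batch.back();
        engine.setStableTimestamp(endOfBatch);
        engine.setOldestTimestamp(endOfBatch);
    }
}
```

With this ordering, the oldest timestamp in the model can never get ahead of the stable timestamp, because stable is always advanced first within the same batch.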

