[SERVER-56345] 4.0 snapshots sometimes cannot be restored Created: 26/Apr/21  Updated: 06/Dec/22  Resolved: 17/Dec/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.0.24
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Matthew Russotto Assignee: Backlog - Storage Execution Team
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Storage Execution
Operating System: ALL

 Description   

If a snapshot of a secondary is taken while the secondary is writing a batch to its oplog, and that snapshot is used in a restore where more than one batch needs to be applied, the restore may fail with a timestamp invariant. This is because the restore process truncates the oplog at the truncate-after timestamp, which pushes the all-committed timestamp up to that point (provided no storage transactions are open). Since the restore procedure requires setting EMRC=false (so that recovery as a standalone can be persisted), at the end of each batch we move the oldest timestamp forward to the all-committed timestamp after a delay. If that advance fires while we are reading the next batch rather than applying one, the oldest timestamp moves ahead of that batch and the invariant trips.

This does not affect 4.2 and later because we do not use EMRC=false for restore there.

Probably the simplest solution is to backport the "takeUnstableCheckpointOnShutdown" parameter to 4.0, perhaps along with SERVER-55766, and change the restore procedure to use that instead of EMRC=false.



 Comments   
Comment by Connie Chen [ 17/Dec/21 ]

Closing this as "Won't Do", as 4.0 is about to reach EOL.

Comment by Daniel Gottlieb (Inactive) [ 28/May/21 ]

matthew.russotto, I agree that backporting takeUnstableCheckpointOnShutdown would fix this issue. We only used enableMajorityReadConcern=false because it seemed like a suitable substitute and (I believe) we were in a pinch for a solution that did not require a server change.

That said, my intuition is that any crashing secondary can be left in a state where there's an oplogTruncateAfter point, and that restarting with eMRC=false is therefore always susceptible to this? Or is the cause specific to a combination of steps in the restore procedure (e.g. inserting extra oplog entries, causing more than one batch to need to be applied)?

Comment by Lingzhi Deng [ 27/May/21 ]

Sending this to the Execution Team first because they worked on this before and may have the expertise, but feel free to assign it back to Repl otherwise. geert.bosch

Comment by Judah Schvimer [ 27/May/21 ]

takeUnstableCheckpointOnShutdown was originally added in SERVER-38255.

Generated at Thu Feb 08 05:39:01 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.