- Type: Bug
- Resolution: Won't Fix
- Priority: Major - P3
- None
- Affects Version/s: 4.0.24
- Component/s: None
- Labels: None
- Storage Execution
- ALL
If a snapshot of a secondary is taken while the secondary is writing a batch to the oplog, and that snapshot is then used in a restore where more than one batch needs to be applied, the restore may fail with a timestamp invariant failure. This is because the restore process truncates the oplog at the truncate-after timestamp, which pushes the all-committed timestamp to that point (if no storage transactions are open). Since the restore procedure requires that we set EMRC=false (enableMajorityReadConcern=false, so that recovery as a standalone can be persisted), at the end of each batch we move the oldest timestamp forward to the all-committed timestamp after a delay. If that move happens while we are reading a batch rather than applying one, the oldest timestamp ends up ahead of the timestamps in the next batch, which triggers the invariant.
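The sequence described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyStorage`, `restore`, and `TimestampInvariantError` are hypothetical names invented for illustration, timestamps are plain integers, and none of this is the server's or the storage engine's actual API. It only models the rule that a write may not be timestamped behind the oldest timestamp.

```python
class TimestampInvariantError(Exception):
    """Raised by the toy model when a write lands behind the oldest timestamp."""


class ToyStorage:
    """Hypothetical stand-in for the storage engine's timestamp bookkeeping."""

    def __init__(self):
        self.oldest = 0
        self.all_committed = 0

    def truncate_oplog_after(self, ts):
        # Assumption from the ticket: with no open storage transactions,
        # truncating pushes all-committed to the truncate-after timestamp.
        self.all_committed = max(self.all_committed, ts)

    def advance_oldest_to_all_committed(self):
        # Models the delayed move performed at the end of a batch
        # when EMRC=false.
        self.oldest = max(self.oldest, self.all_committed)

    def apply(self, ts):
        # Toy version of the invariant: no write behind the oldest timestamp.
        if ts < self.oldest:
            raise TimestampInvariantError(
                f"write at {ts} is behind oldest timestamp {self.oldest}"
            )
        self.all_committed = max(self.all_committed, ts)


def restore(batches, truncate_after, advance_between_batches):
    """Replay oplog batches (lists of integer timestamps) after truncation."""
    storage = ToyStorage()
    storage.truncate_oplog_after(truncate_after)
    for batch in batches:
        for ts in batch:
            storage.apply(ts)
        if advance_between_batches:
            # The delayed move fires before the next batch is applied.
            storage.advance_oldest_to_all_committed()
    return storage


if __name__ == "__main__":
    try:
        restore([[1, 2], [3, 4]], truncate_after=4, advance_between_batches=True)
    except TimestampInvariantError as exc:
        print("restore failed:", exc)
```

In this model a single batch never fails, because the oldest timestamp only moves after the batch has been applied. That matches the observation that the problem needs more than one batch to be applied during the restore.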
This does not affect 4.2 and later because we do not use EMRC=false for restore there.
Probably the simplest solution is to backport the "takeUnstableCheckpointOnShutdown" parameter to 4.0, perhaps along with SERVER-55766, and change the restore procedure to use that parameter instead of EMRC=false.