[SERVER-56345] 4.0 snapshots sometimes cannot be restored
Created: 26/Apr/21  Updated: 06/Dec/22  Resolved: 17/Dec/21
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 4.0.24 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Matthew Russotto | Assignee: | Backlog - Storage Execution Team |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: | |
| Assigned Teams: | Storage Execution |
| Operating System: | ALL |
| Participants: | |
| Description |
If a snapshot of a secondary is taken while the secondary is writing a batch to its oplog, and that snapshot is used as part of a restore in which more than one batch needs to be applied, the restore may fail with a timestamp invariant. This happens because the restore process truncates the oplog at the truncate-after timestamp, which pushes the all-committed timestamp forward to that point (if no storage transactions are open). Since the restore procedure requires setting EMRC=false (to allow recovery as a standalone to be persisted), at the end of each batch we move the oldest timestamp forward to the all-committed timestamp after a delay. If this advance happens while we are reading a batch rather than applying one, we move the oldest timestamp ahead of the start of the next batch and trigger the invariant.

This does not affect 4.2 and later because we do not use EMRC=false for restore there. Probably the simplest solution is to backport the "takeUnstableCheckpointOnShutdown" parameter to 4.0, perhaps along with
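To make the failure mode concrete, here is a minimal C++ sketch of the ordering described above. Every name in it (Timestamp, OplogBatch, StorageEngine, applyBatch, advanceOldestTimestamp) and every timestamp value is an illustrative stand-in invented for this writeup, not the actual 4.0 server code or its interfaces.

```cpp
#include <cassert>
#include <cstdio>

// Illustrative stand-ins; not the real server types.
struct Timestamp {
    unsigned long long val = 0;
};

struct OplogBatch {
    Timestamp firstOpTime;  // first entry in the batch
    Timestamp lastOpTime;   // last entry in the batch
};

struct StorageEngine {
    Timestamp oldestTimestamp;  // reads/writes behind this are illegal
    Timestamp allCommitted;     // the all-committed point

    // With EMRC=false, 4.0 advances the oldest timestamp toward
    // all-committed after a batch completes (after a delay).
    void advanceOldestTimestamp() { oldestTimestamp = allCommitted; }
};

void applyBatch(const StorageEngine& engine, const OplogBatch& batch) {
    // The invariant that fires in this bug: a batch must not begin
    // behind the oldest timestamp.
    assert(engine.oldestTimestamp.val <= batch.firstOpTime.val &&
           "oldest timestamp moved ahead of the next batch");
    std::printf("applied batch [%llu, %llu]\n",
                batch.firstOpTime.val, batch.lastOpTime.val);
}

int main() {
    StorageEngine engine;

    // Restore truncates the oplog at the truncate-after point; with no
    // storage transactions open, all-committed jumps straight there.
    engine.allCommitted = {110};

    OplogBatch batch1{{90}, {100}};   // first batch needing re-application
    OplogBatch batch2{{101}, {110}};  // second batch of the same recovery

    applyBatch(engine, batch1);       // fine: oldest (0) <= 90

    // The delayed advance runs while batch2 is still being *read*,
    // not applied, so oldest jumps all the way to 110 ...
    engine.advanceOldestTimestamp();

    // ... and the next batch starts at 101 < 110: the assert aborts,
    // mirroring the timestamp invariant failure during restore.
    applyBatch(engine, batch2);
    return 0;
}
```

In 4.2 and later the restore procedure does not run with EMRC=false, so this delayed advance of the oldest timestamp never outruns batch application; backporting takeUnstableCheckpointOnShutdown would presumably let the 4.0 restore procedure avoid EMRC=false in the same way.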
| Comments |
| Comment by Connie Chen [ 17/Dec/21 ] |
Closing this as "Won't Do" since 4.0 is about to reach EOL.
| Comment by Daniel Gottlieb (Inactive) [ 28/May/21 ] |
matthew.russotto, I agree that backporting takeUnstableCheckpointOnShutdown would fix this issue. We only used enableMajorityReadConcern=false because it seemed like a suitable substitute and (I believe) we were in a pinch to solve something in a way that did not require a server change. That said, my intuition is that any crashing secondary can be left in a state where there's an oplogTruncateAfter point, and that restarting with eMRC=false is always susceptible to this. Or is the cause specific to a combination of steps in the restore procedure (e.g., inserting extra oplog entries, causing more than one batch to need to be applied)?
| Comment by Lingzhi Deng [ 27/May/21 ] |
Sending this to the Execution Team first because they worked on this before and may have the expertise. Feel free to assign it back to Repl if not. geert.bosch
| Comment by Judah Schvimer [ 27/May/21 ] |
takeUnstableCheckpointOnShutdown was originally added in