[SERVER-66171] Make checkpoint operations persist derived metadata values that have changed since the last checkpoint Created: 03/May/22 Updated: 06/Dec/22 Resolved: 01/Sep/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Dianna Hohensee (Inactive) | Assignee: | Backlog - Storage Execution Team |
| Resolution: | Won't Do | Votes: | 0 |
| Labels: | DM-M2 | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: | Storage Execution |
| Description |
|
This will take some timestamp coordination.

The derived metadata updates cannot have valid-at timestamps earlier than the checkpoint's recovery timestamp, aka the stable_timestamp at which it checkpoints. Additionally, the derived metadata valid-at timestamps must have reached the majority committed point, so that replication oplog recovery is guaranteed to replay up to those timestamps (rather than stopping short and leaving us with an incorrect count/dataSize that we cannot "undo").

Derived metadata values that have not changed since the last checkpoint need not be written out in the checkpoint. We should probably build some kind of registry of namespaces whose values need to be persisted.

We'll need to either prevent the stable_timestamp from moving forward (so that the checkpoint timestamp doesn't move forward) or specify the stable_timestamp at which the checkpoint should be taken. I favor the latter, but have not investigated whether the change would have any WT ramifications, or whether MDB might fail to work with it for some subtle reason.

Clean shutdown should save the latest derived metadata values before taking the shutdown checkpoint. In a silent system, the checkpoint should be entirely clean: all oplog applied and count/dataSize up to date. |
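The description leaves the registry's shape open. Below is a minimal sketch, in Python for brevity, of one literal encoding of the constraints above: only namespaces whose values changed are tracked, and a checkpoint may persist an entry only if its valid-at timestamp is no earlier than the checkpoint's recovery timestamp and no later than the majority committed point. `DerivedMetadata`, `DerivedMetadataRegistry`, `record_change`, `select_for_checkpoint` and `mark_persisted` are hypothetical names, timestamps are plain integers rather than server `Timestamp` values, and this is not how MongoDB implements (or would implement) the feature.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DerivedMetadata:
    """Hypothetical per-namespace derived values (count/dataSize) plus the
    timestamp at which they are valid. Timestamps are plain ints here."""
    count: int
    data_size: int
    valid_at: int


class DerivedMetadataRegistry:
    """Hypothetical registry of namespaces whose derived metadata changed since
    the last checkpoint. Unchanged namespaces are never tracked, so they are
    never rewritten by a checkpoint."""

    def __init__(self) -> None:
        self._dirty: Dict[str, DerivedMetadata] = {}

    def record_change(self, ns: str, count: int, data_size: int, valid_at: int) -> None:
        # Keep only the latest derived values for each namespace.
        self._dirty[ns] = DerivedMetadata(count, data_size, valid_at)

    def select_for_checkpoint(
        self, recovery_ts: int, majority_committed_ts: int
    ) -> Dict[str, DerivedMetadata]:
        # An entry qualifies only if its valid-at timestamp is not earlier than
        # the checkpoint's recovery timestamp and has reached the majority
        # committed point.
        return {
            ns: md
            for ns, md in self._dirty.items()
            if recovery_ts <= md.valid_at <= majority_committed_ts
        }

    def mark_persisted(self, persisted: Dict[str, DerivedMetadata]) -> None:
        # Drop an entry only if it was not replaced by a newer record_change()
        # after it was selected; a newer value stays dirty for the next checkpoint.
        for ns, md in persisted.items():
            if self._dirty.get(ns) is md:
                del self._dirty[ns]

    def dirty_namespaces(self) -> List[str]:
        return sorted(self._dirty)


registry = DerivedMetadataRegistry()
registry.record_change("test.a", count=10, data_size=640, valid_at=95)
registry.record_change("test.b", count=3, data_size=96, valid_at=105)
registry.record_change("test.c", count=7, data_size=448, valid_at=130)

# Checkpoint at recovery timestamp 100 with the majority committed point at 120.
batch = registry.select_for_checkpoint(recovery_ts=100, majority_committed_ts=120)
print(sorted(batch))  # ['test.b']
registry.mark_persisted(batch)
print(registry.dirty_namespaces())  # ['test.a', 'test.c']
```

Here `test.c` is held back because its valid-at (130) is past the majority committed point (120), and `test.a` because its valid-at (95) predates the recovery timestamp (100). Under this literal reading, the `test.a` case is where being able to specify the stable_timestamp at which the checkpoint is taken, rather than letting it advance freely, would matter.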