[SERVER-46678] Preserve durable history across restarts Created: 06/Mar/20 Updated: 29/Oct/23 Resolved: 08/Jan/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Storage |
| Affects Version/s: | None |
| Fix Version/s: | 4.9.0 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Eric Milkie | Assignee: | Daniel Gottlieb (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | PM-234-M3, PM-234-T-data-clone |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Backwards Compatibility: | Fully Compatible |
| Sprint: | Execution Team 2020-04-06, Execution Team 2020-10-05, Execution Team 2020-10-19, Sharding 2020-12-28, Execution Team 2020-11-02, Execution Team 2020-11-16, Sharding 2021-01-11, Sharding 2021-01-25 |
| Participants: |
| Description |
|
Today, on startup, MongoDB sets the oldest timestamp in such a way as to cause WiredTiger to remove all existing durable history in the history file. This ticket is to change that behavior such that the history in the history file is preserved on startup. |
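The effect described above can be illustrated with a toy model. This is a minimal sketch and not WiredTiger's implementation: `ToyVersionStore` and its methods are hypothetical names, and the model assumes that advancing the oldest timestamp immediately discards every version that is no longer needed for reads at or after it.

```python
import bisect


class ToyVersionStore:
    """Hypothetical in-memory model of timestamped history (not WiredTiger)."""

    def __init__(self):
        self.history = {}  # key -> list of (timestamp, value), ascending by timestamp
        self.oldest = 0

    def put(self, key, value, ts):
        # Assumes callers write each key with increasing timestamps.
        self.history.setdefault(key, []).append((ts, value))

    def set_oldest_timestamp(self, ts):
        # The oldest timestamp only moves forward in this model.
        self.oldest = max(self.oldest, ts)
        for key in list(self.history):
            versions = self.history[key]
            idx = bisect.bisect_right([t for t, _ in versions], self.oldest)
            # Keep the version visible at `oldest` plus everything newer.
            self.history[key] = versions[max(idx - 1, 0):]

    def read(self, key, ts):
        if ts < self.oldest:
            raise ValueError("read timestamp is older than the oldest timestamp")
        versions = self.history.get(key, [])
        idx = bisect.bisect_right([t for t, _ in versions], ts)
        return versions[idx - 1][1] if idx else None


store = ToyVersionStore()
store.put("x", "a", 10)
store.put("x", "b", 20)
store.put("x", "c", 30)
```

In this model, calling `store.set_oldest_timestamp(30)` at startup (30 standing in for the recovery timestamp) leaves a single version of `x` and makes a read at timestamp 20 fail, whereas starting with an older legal value would keep the earlier versions readable.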
| Comments |
| Comment by Githook User [ 08/Jan/21 ] |
|
Author: Daniel Gottlieb (dgottlieb, daniel.gottlieb@mongodb.com) Message: |
| Comment by Daniel Gottlieb (Inactive) [ 11/Nov/20 ] |
|
I had a talk with geert.bosch and louis.williams. We believe that by sacrificing precision of the minimum visible timestamp, we have a relatively low-effort, low-risk way of starting up with legal values for the minimum visible timestamp (i.e., no new data needs to be written out, eliminating any upgrade/downgrade work). Today on startup, the oldest and stable timestamps are set to the checkpoint's recovery timestamp (effectively the stable timestamp as of that checkpoint), and none of the collections have a minVisibleTimestamp set. The proposed algorithm would be:
|
| Comment by Eric Milkie [ 18/Aug/20 ] |
|
WiredTiger now preserves history across restarts, but MongoDB sets its oldest timestamp at startup in a way that clears out all preserved history. To avoid clearing out preserved history at startup, MongoDB will need to change how it handles minimum visible timestamps for collections, as Max and Dan have mentioned in the above comment. |
| Comment by Daniel Gottlieb (Inactive) [ 18/Aug/20 ] |
|
max.hirschhorn noticed that because the minimum visible timestamp is not preserved across restarts, it may not be correct for MDB to simply use the oldest timestamp that WT provides. I think he's right: we'll need catalog versioning (or persistence of the minimum visible timestamp/index build completion times). |
| Comment by Vamsi Boyapati [ 07/Apr/20 ] |
|
We had discussed this earlier and it is captured in |
| Comment by Alexander Gorrod [ 06/Apr/20 ] |
|
haribabu.kommi and vamsi.krishna, we talked about introducing a mechanism to facilitate this, but I don't remember how far we got. I think we were going to remember the oldest timestamp serviced by each checkpoint, and find the oldest global checkpoint. Did we do that work? If so, is there a simple way we could expose the oldest available timestamp for reads after a restart? |
| Comment by Daniel Gottlieb (Inactive) [ 03/Apr/20 ] |
|
The last time I looked into this, WT did not track what a legal oldest_timestamp is across restarts. That is, a WT program can set the oldest and stable timestamps to 100, restart, set the oldest timestamp to 50, perform a read at time 50, and possibly be returned wrong data instead of an error. Unless the application writes down the oldest timestamps it has informed WT of, it must reset the oldest_timestamp to the restarted data's recovery timestamp (the stable timestamp at shutdown). A MongoDB-only change to preserve history across restarts would probably be of the form:
On restart, the value read from disk is guaranteed to be a valid oldest_timestamp. The corollary solution in WT would be:
Today, MongoDB updates the oldest timestamp very frequently. I expect MongoDB updates the oldest timestamp much more frequently than WT vacuums history. I suspect the proposed MongoDB algorithm would cause problems due to excessive (albeit small) writes to disk. Alternatively, MongoDB could slow down how often it informs WT of a new oldest timestamp (this reduces the writes MongoDB makes, but limits WT's ability to batch/optimize its vacuuming process). alexander.gorrod, is there an existing WT ticket aimed at protecting users against setting the oldest timestamp across restarts to an illegal value? |
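The steps of the MongoDB-only proposal are not shown in this comment, so the following is only a sketch of one way an application could "write down" the oldest timestamps it hands to the storage engine. It assumes the value is made durable before the engine is informed, so that the value on disk is never older than what the engine has actually been told. `OldestTimestampLog`, `ToyEngine`, `advance_oldest`, and `choose_startup_oldest` are hypothetical names, not MongoDB or WiredTiger APIs.

```python
import json
import os
import tempfile


class OldestTimestampLog:
    """Hypothetical helper that durably records the last oldest timestamp."""

    def __init__(self, path):
        self.path = os.path.abspath(path)

    def record(self, ts):
        # Write to a temporary file, fsync it, then atomically swap it in,
        # so a crash leaves either the old value or the new one on disk.
        fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(self.path))
        with os.fdopen(fd, "w") as f:
            json.dump({"oldest_timestamp": ts}, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.path)

    def load(self, default):
        try:
            with open(self.path) as f:
                return json.load(f)["oldest_timestamp"]
        except FileNotFoundError:
            return default


class ToyEngine:
    """Stand-in for a storage engine; only remembers the oldest timestamp."""

    def __init__(self):
        self.oldest = 0

    def set_oldest_timestamp(self, ts):
        self.oldest = ts


def advance_oldest(log, engine, ts):
    # Assumption: persist first, then inform the engine.
    log.record(ts)
    engine.set_oldest_timestamp(ts)


def choose_startup_oldest(log, recovery_timestamp):
    # Fall back to the recovery timestamp (today's behavior) when nothing was
    # recorded, and never pick a value newer than the recovery timestamp.
    return min(log.load(recovery_timestamp), recovery_timestamp)
```

Every call to `advance_oldest` in this sketch costs one small synced write, which is the overhead the comment worries about when the oldest timestamp is updated very frequently.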