[SERVER-29494] WT validate should wait for explicit checkpoint Created: 07/Jun/17  Updated: 27/Oct/23  Resolved: 09/Jan/18

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: None

Type: Improvement
Priority: Major - P3
Reporter: Geert Bosch
Assignee: Daniel Gottlieb (Inactive)
Resolution: Gone away
Votes: 0
Labels: rollback-optional
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-29891 Roll Back to Checkpoint: Call setStab... Closed
depends on SERVER-29212 Ensure WiredTiger checkpoints are cre... Closed
depends on WT-3387 Add support for a stable timestamp Closed
Duplicate
is duplicated by SERVER-30817 Make full validate block until a new ... Closed
Backwards Compatibility: Fully Compatible
Sprint: Repl 2017-12-18, Repl 2018-01-01, Repl 2018-01-15
Participants:

 Description   

Running verifyTable on a WiredTiger file that is not clean forces a checkpoint. This behavior is problematic because we need to avoid creating a checkpoint that contains data that is not majority confirmed.

Instead, have full validate first take a MODE_S lock on the collection and then call a new method on the WiredTigerCheckpointThread that will trigger and wait for a new majority-confirmed checkpoint.



 Comments   
Comment by Daniel Gottlieb (Inactive) [ 09/Jan/18 ]

This was fixed in WiredTiger instead. Calling verify on a table will take a stable checkpoint of the table.

Comment by Ian Whalen (Inactive) [ 28/Nov/17 ]

We believe we should get this for free given existing WT work in the develop branch. Need to confirm before resolving this ticket.

Comment by Daniel Gottlieb (Inactive) [ 26/Jul/17 ]

WT-3387 will have `verifyTable` obey the stable timestamp. Thus the implementation for this can be to take note of the current time, wait for the "stable timestamp" to reach the saved time, then proceed with validating.
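The "note the time, wait for the stable timestamp, then validate" flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `StableTimestampTracker`, `advanceTo`, `waitUntilStable`, and `validateAfterStable` are hypothetical names invented for this sketch, not MongoDB or WiredTiger APIs, and timestamps are modeled as plain integers.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>

// Hypothetical stand-in for whatever component tracks the stable timestamp.
// None of these names come from the MongoDB or WiredTiger code base.
class StableTimestampTracker {
public:
    // Called whenever the stable timestamp moves forward.
    void advanceTo(std::uint64_t ts) {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            if (ts <= _stable) {
                return;  // the stable timestamp never moves backwards
            }
            _stable = ts;
        }
        _cv.notify_all();
    }

    // Blocks until the stable timestamp is at least `target`.
    void waitUntilStable(std::uint64_t target) {
        std::unique_lock<std::mutex> lk(_mutex);
        _cv.wait(lk, [&] { return _stable >= target; });
    }

    std::uint64_t stable() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _stable;
    }

private:
    mutable std::mutex _mutex;
    std::condition_variable _cv;
    std::uint64_t _stable = 0;
};

// Validate flow from the comment: the caller saves "now", we wait for the
// stable timestamp to reach it, then run the verification callback.
// Returns the stable timestamp observed just before verification started.
template <typename VerifyFn>
std::uint64_t validateAfterStable(StableTimestampTracker& tracker,
                                  std::uint64_t now,
                                  VerifyFn verify) {
    tracker.waitUntilStable(now);
    const std::uint64_t observed = tracker.stable();
    verify();  // stands in for the real verifyTable call
    return observed;
}
```

Note that `waitUntilStable` has no timeout, so this version blocks indefinitely if the stable timestamp stops advancing.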

Comment by Daniel Gottlieb (Inactive) [ 05/Jul/17 ]

In the context of a prefixed collection sharing a WT table with other collections, the MODE_S lock will prevent checkpointing new data specific to that collection, but it may "lock in" data from other collections sharing the same table.

Instead, this will hopefully be solved by the intersection of the "MongoDB timestamps in WiredTiger" and "Recover to a timestamp" projects.

"MongoDB timestamps in WiredTiger" gives every (user-data) transaction a timestamp. "Recover to a timestamp" adds checkpointing at a timestamp. Once those are in place, this ticket becomes: wait for the "current time" to become majority confirmed, then call verifyTable. The checkpoint that gets committed will be majority confirmed, and the verify will cover the data written before the validate came in.

However, this solution can block forever in an unhealthy replica set where the majority confirmed timestamp has stopped moving forward. An alternative is to not wait, but to immediately checkpoint/verify the current majority confirmed data on a validate. This assumes nothing relies on a happens-before relationship with a validate (or more pointedly, MongoDB should not guarantee such a relationship).
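The trade-off between the two approaches can be illustrated with a small sketch. Everything here is an assumption made for illustration: `MajorityPoint`, `ValidatePlan`, `planWithDeadline`, and `planWithoutWaiting` are hypothetical names, timestamps are plain integers, and the bounded wait is a middle ground added for comparison rather than something proposed in the ticket.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical tracker for the majority-confirmed timestamp.
class MajorityPoint {
public:
    void advanceTo(std::uint64_t ts) {
        {
            std::lock_guard<std::mutex> lk(_mutex);
            if (ts <= _ts) {
                return;  // never move backwards
            }
            _ts = ts;
        }
        _cv.notify_all();
    }

    // Returns true if `target` was reached before the timeout expired.
    bool waitFor(std::uint64_t target, std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lk(_mutex);
        return _cv.wait_for(lk, timeout, [&] { return _ts >= target; });
    }

    std::uint64_t current() const {
        std::lock_guard<std::mutex> lk(_mutex);
        return _ts;
    }

private:
    mutable std::mutex _mutex;
    std::condition_variable _cv;
    std::uint64_t _ts = 0;
};

struct ValidatePlan {
    std::uint64_t verifyAt;  // timestamp the verify would run against
    bool reachedTarget;      // whether the requested time became majority confirmed
};

// Bounded wait: gives up after `timeout` and reports where the majority
// point actually is, so an unhealthy replica set cannot block validate forever.
ValidatePlan planWithDeadline(MajorityPoint& mp,
                              std::uint64_t now,
                              std::chrono::milliseconds timeout) {
    const bool reached = mp.waitFor(now, timeout);
    return {mp.current(), reached};
}

// No wait: verify whatever is majority confirmed right now. This gives up
// any happens-before relationship between earlier writes and the validate.
ValidatePlan planWithoutWaiting(MajorityPoint& mp) {
    return {mp.current(), false};
}
```

With a stalled majority point, `planWithDeadline` returns after the timeout with `reachedTarget == false`, leaving the caller to decide whether to fail the validate or verify the older data.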

Generated at Thu Feb 08 04:21:03 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.