[SERVER-30926] Add timestamps to writes to minvalid document Created: 01/Sep/17  Updated: 30/Oct/23  Resolved: 13/Dec/17

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 3.7.1

Type: Task Priority: Major - P3
Reporter: Spencer Brody (Inactive) Assignee: Judah Schvimer
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-30472 ReplicationConsistencyMarkers::writeC... Closed
Backwards Compatibility: Fully Compatible
Sprint: Repl 2017-12-04, Repl 2017-12-18
Participants:

 Description   

Writes to the minValid document need to be timestamped so that they do not persist after a call to recoverToStableTimestamp. During secondary batch application, two writes to the minvalid collection are relevant: one that sets minValid to the end of the batch and another that sets appliedThrough to the end of the batch. Both of those writes should be given the timestamp of the end of the batch, so that if we recover to a time before the batch, the writes will be undone.
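To illustrate the intent, here is a minimal Python sketch of a toy timestamped store. It is not the server's storage API: `ToyTimestampedStore`, its methods, and the use of plain integers as timestamps are all assumptions made for illustration. It only shows that a write stamped with the end-of-batch timestamp disappears when recovering to an earlier point.

```python
class ToyTimestampedStore:
    """Toy model (hypothetical): remembers every write with its timestamp."""

    def __init__(self):
        self._history = []  # (timestamp, key, value) tuples in write order

    def write(self, key, value, timestamp):
        self._history.append((timestamp, key, value))

    def read(self, key):
        # Return the most recently written value that is still in the history.
        for _, k, v in reversed(self._history):
            if k == key:
                return v
        return None

    def recover_to_stable_timestamp(self, stable_ts):
        # Discard every write stamped later than the stable timestamp.
        self._history = [w for w in self._history if w[0] <= stable_ts]


store = ToyTimestampedStore()

# State before the batch: consistent at optime 10.
store.write("minValid", 10, timestamp=10)
store.write("appliedThrough", 10, timestamp=10)

# A batch covering optimes 11..20: both writes carry the end-of-batch timestamp.
store.write("minValid", 20, timestamp=20)
store.write("appliedThrough", 20, timestamp=20)

# Recovering to a point before the end of the batch undoes both writes.
store.recover_to_stable_timestamp(15)
print(store.read("minValid"), store.read("appliedThrough"))
```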



 Comments   
Comment by Githook User [ 13/Dec/17 ]

Author: Judah Schvimer (judahschvimer) <judah@mongodb.com>

Message: SERVER-30926 Add timestamps to writes to minValid document
Branch: master
https://github.com/mongodb/mongo/commit/6796859387ba77a3556ed583a317681a288970e4

Comment by Judah Schvimer [ 05/Dec/17 ]

There are 7 types of writes to the minValid document to consider:
1. minValid updates after we write a batch of oplog entries but before we apply them: We will timestamp these with the minValid time we are writing. We only take stable checkpoints when we are consistent, so the next checkpoint we take will be at this minValid. If we gave the write a timestamp from before the batch and took a stable checkpoint at that timestamp, we would consider that timestamp inconsistent, even though it is consistent.
2. minValid updates during rollback via refetch: These updates should only occur on storage engines that do not support recover to stable timestamp, and thus the timestamp should not matter. We will give them a 0 timestamp and add an invariant that we are using a storage engine that does not support recover to stable timestamp.
3. minValid initialization: This occurs at startup, at initiate, and on secondaries when they receive their first config. We will give these a timestamp of 0 since we want them to be in the first checkpoint, even if the checkpoint is for a timestamp in the past. The minValid document could exist already and this could simply add fields to the minValid document, but we still want the initialization write to go into the next checkpoint since a newly initialized minValid document is always valid.
4. removing the old oplog delete from point: This field is going to be removed in 3.8 in SERVER-30556, so we do not care about the write. We will give it a timestamp of 0 in the meantime.
5. setting appliedThrough: This occurs in many places.

  • The first is when we first establish a sync source. This sets it to the last applied optime, and should get that same timestamp.
  • The next is rollback via refetch which clears appliedThrough so we check the top of the oplog for the appliedThrough. These updates should only occur on storage engines that do not support recover to stable timestamp, and thus the timestamp should not matter. We will give them a 0 timestamp and add an invariant that we are using a storage engine that does not support recover to stable timestamp.
  • The next is SyncTail after we've applied a batch of oplog entries. This write should get the same timestamp as the appliedThrough value it sets, since that is the point the data reflects.
  • It is cleared at shutdown to indicate we're consistent at the top of the oplog. This should get the last applied optime for the timestamp in case we're in the process of taking a checkpoint at an earlier timestamp and do not want that checkpoint to reflect this write.
  • It is also cleared when transitioning to primary to indicate we're consistent at the top of the oplog. This should get the last applied optime for the timestamp so no checkpoints at earlier timestamps get this write.
  • It is set during recovery after each oplog entry is applied. This write can take its timestamp from the oplog entry's optime, as in the SyncTail case above.

6. Setting the initial sync flag at the beginning of initial sync: This will get a 0 timestamp because it will be in no stable checkpoints.
7. Clearing the initial sync flag at the end of initial sync: This will get the last applied optime as the timestamp for clarity, though there cannot be any checkpoints taken before it, so it could be 0 as well.
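
The rules in the list above can be summarized as a small decision function. This is a sketch only, not the code in the commit: the enum members, `choose_timestamp`, its parameters, and the integer timestamps are hypothetical names chosen for illustration.

```python
from enum import Enum, auto


class MinValidWrite(Enum):
    SET_MIN_VALID_FOR_BATCH = auto()                 # item 1
    ROLLBACK_REFETCH_MIN_VALID = auto()              # item 2
    INITIALIZE = auto()                              # item 3
    REMOVE_OLD_OPLOG_DELETE_FROM_POINT = auto()      # item 4
    APPLIED_THROUGH_NEW_SYNC_SOURCE = auto()         # item 5, first bullet
    ROLLBACK_REFETCH_CLEAR_APPLIED_THROUGH = auto()  # item 5, second bullet
    APPLIED_THROUGH_AFTER_BATCH = auto()             # item 5, third bullet
    CLEAR_APPLIED_THROUGH_AT_SHUTDOWN = auto()       # item 5, fourth bullet
    CLEAR_APPLIED_THROUGH_ON_STEP_UP = auto()        # item 5, fifth bullet
    APPLIED_THROUGH_DURING_RECOVERY = auto()         # item 5, sixth bullet
    SET_INITIAL_SYNC_FLAG = auto()                   # item 6
    CLEAR_INITIAL_SYNC_FLAG = auto()                 # item 7


W = MinValidWrite

# Writes expected only on engines without recover-to-stable-timestamp support.
REFETCH_ONLY = {W.ROLLBACK_REFETCH_MIN_VALID,
                W.ROLLBACK_REFETCH_CLEAR_APPLIED_THROUGH}
# Writes that get a 0 timestamp.
USE_ZERO = {W.INITIALIZE, W.REMOVE_OLD_OPLOG_DELETE_FROM_POINT,
            W.SET_INITIAL_SYNC_FLAG}
# Writes stamped with the optime they are writing.
USE_WRITTEN_OPTIME = {W.SET_MIN_VALID_FOR_BATCH, W.APPLIED_THROUGH_AFTER_BATCH,
                      W.APPLIED_THROUGH_DURING_RECOVERY}
# Writes stamped with the last applied optime.
USE_LAST_APPLIED = {W.APPLIED_THROUGH_NEW_SYNC_SOURCE,
                    W.CLEAR_APPLIED_THROUGH_AT_SHUTDOWN,
                    W.CLEAR_APPLIED_THROUGH_ON_STEP_UP,
                    W.CLEAR_INITIAL_SYNC_FLAG}


def choose_timestamp(kind, written_optime=None, last_applied=None,
                     supports_recover_to_stable=False):
    """Return the timestamp (a plain int in this sketch) for a minValid write."""
    if kind in REFETCH_ONLY:
        # Stands in for the invariant described in items 2 and 5.
        if supports_recover_to_stable:
            raise RuntimeError("rollback via refetch write on an engine that "
                               "supports recover to stable timestamp")
        return 0
    if kind in USE_ZERO:
        return 0
    if kind in USE_WRITTEN_OPTIME:
        if written_optime is None:
            raise ValueError("written_optime is required for this write")
        return written_optime
    if kind in USE_LAST_APPLIED:
        if last_applied is None:
            raise ValueError("last_applied is required for this write")
        return last_applied
    raise ValueError(f"unknown write kind: {kind}")
```

For example, `choose_timestamp(W.SET_MIN_VALID_FOR_BATCH, written_optime=20)` returns 20, while `choose_timestamp(W.INITIALIZE)` returns 0.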

CC milkie and daniel.gottlieb

Comment by Judah Schvimer [ 29/Nov/17 ]

These writes do need to be timestamped since the checkpoint thread runs asynchronously and the minValid document could change between the time at which the checkpoint is taken and the time that it is read for the checkpoint.
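
One way to picture this race is the toy model below. It assumes, for illustration only, that a checkpoint at a stable timestamp captures every write whose timestamp is at or below that value, and that an untimestamped write behaves like timestamp 0. `checkpoint_view` and the integer timestamps are hypothetical, not server code.

```python
def checkpoint_view(writes, stable_ts):
    """Return the key/value state a checkpoint at stable_ts would capture.

    Toy assumption: a write is captured if its timestamp is <= stable_ts,
    so an untimestamped write (timestamp 0) is always captured.
    """
    view = {}
    for ts, key, value in writes:  # writes are in the order they happened
        if ts <= stable_ts:
            view[key] = value
    return view


stable_ts = 10  # chosen before the checkpoint thread actually reads the data

# A later batch (ending at optime 20) updates appliedThrough before the
# checkpoint thread gets to read the document.
untimestamped = [(10, "appliedThrough", 10), (0, "appliedThrough", 20)]
timestamped = [(10, "appliedThrough", 10), (20, "appliedThrough", 20)]

# Without a timestamp, the later value leaks into the checkpoint for time 10.
leaked = checkpoint_view(untimestamped, stable_ts)
# With the end-of-batch timestamp, the checkpoint still reflects time 10.
correct = checkpoint_view(timestamped, stable_ts)
print(leaked, correct)
```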
