|
There are 7 types of writes to the minValid document to consider:
1. minValid updates after we write a batch of oplog entries but before we apply them: These we will timestamp with the minValid time we are writing. We only take stable checkpoints when we are consistent, so the next checkpoint we will take is at this minValid. If we gave the write a timestamp from before the batch, and we took a stable checkpoint at that earlier timestamp, the checkpoint would include the new minValid and we would consider that timestamp inconsistent, even though it is consistent.
2. minValid updates during rollback via refetch: These updates should only occur on storage engines that do not support recover to stable timestamp, and thus the timestamp should not matter. We will give them a 0 timestamp and add an invariant that we are using a storage engine that does not support recover to stable timestamp.
3. minValid initialization: This occurs at startup, at initiate, and on secondaries when they receive their first config. We will give these a timestamp of 0 since we want them to be in the first checkpoint, even if the checkpoint is for a timestamp in the past. The minValid document could exist already and this could simply add fields to the minValid document, but we still want the initialization write to go into the next checkpoint since a newly initialized minValid document is always valid.
4. Removing the old oplog delete from point: This field is going to be removed in 3.8 in SERVER-30556, so we do not care about the write. We will give it a timestamp of 0 in the meantime.
5. Setting appliedThrough: This occurs in many places.
- The first is when we first establish a sync source. This sets it to the last applied optime, and should get that same timestamp.
- The next is rollback via refetch which clears appliedThrough so we check the top of the oplog for the appliedThrough. These updates should only occur on storage engines that do not support recover to stable timestamp, and thus the timestamp should not matter. We will give them a 0 timestamp and add an invariant that we are using a storage engine that does not support recover to stable timestamp.
- The next is SyncTail after we've applied a batch of oplog entries. The write should get the same timestamp as the appliedThrough value being set, since that is the point the data reflects.
- It is cleared at shutdown to indicate we're consistent at the top of the oplog. This should get the last applied optime for the timestamp in case we're in the process of taking a checkpoint at an earlier timestamp and do not want that checkpoint to reflect this write.
- It is also cleared when transitioning to primary to indicate we're consistent at the top of the oplog. This should get the last applied optime for the timestamp so no checkpoints at earlier timestamps get this write.
- It is set during recovery after each oplog entry is applied. This can get the optime from the oplog entry like in the 3rd bullet.
6. Setting the initial sync flag at the beginning of initial sync: This will get a 0 timestamp because it will be in no stable checkpoints.
7. Clearing the initial sync flag at the end of initial sync: This will get the last applied optime as the timestamp for clarity, though there cannot be any checkpoints taken before it, so it could be 0 as well.
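The rules above can be summarized as a small lookup. This is only an illustrative sketch in Python, not the server code: the `Write` enum, `timestamp_for`, and `visible_in_checkpoint` are hypothetical names, timestamps are modeled as plain integers, and it assumes that a write with timestamp 0 is visible in every checkpoint while a timestamped write is visible only in checkpoints at or after its timestamp.

```python
from enum import Enum, auto


class Write(Enum):
    """Hypothetical labels for the minValid document writes listed above."""
    MIN_VALID_BEFORE_BATCH_APPLY = auto()          # item 1
    MIN_VALID_ROLLBACK_VIA_REFETCH = auto()        # item 2
    MIN_VALID_INIT = auto()                        # item 3
    REMOVE_OPLOG_DELETE_FROM_POINT = auto()        # item 4
    APPLIED_THROUGH_NEW_SYNC_SOURCE = auto()       # item 5, bullet 1
    APPLIED_THROUGH_ROLLBACK_VIA_REFETCH = auto()  # item 5, bullet 2
    APPLIED_THROUGH_AFTER_BATCH = auto()           # item 5, bullet 3
    APPLIED_THROUGH_CLEAR_AT_SHUTDOWN = auto()     # item 5, bullet 4
    APPLIED_THROUGH_CLEAR_AT_STEP_UP = auto()      # item 5, bullet 5
    APPLIED_THROUGH_DURING_RECOVERY = auto()       # item 5, bullet 6
    SET_INITIAL_SYNC_FLAG = auto()                 # item 6
    CLEAR_INITIAL_SYNC_FLAG = auto()               # item 7


ZERO_TS = 0  # assumption: 0 models an untimestamped write

_USES_VALUE = {  # timestamp equals the optime being written
    Write.MIN_VALID_BEFORE_BATCH_APPLY,
    Write.APPLIED_THROUGH_NEW_SYNC_SOURCE,
    Write.APPLIED_THROUGH_AFTER_BATCH,
    Write.APPLIED_THROUGH_DURING_RECOVERY,
}
_USES_LAST_APPLIED = {  # timestamp equals the last applied optime
    Write.APPLIED_THROUGH_CLEAR_AT_SHUTDOWN,
    Write.APPLIED_THROUGH_CLEAR_AT_STEP_UP,
    Write.CLEAR_INITIAL_SYNC_FLAG,
}
_ROLLBACK_VIA_REFETCH = {
    Write.MIN_VALID_ROLLBACK_VIA_REFETCH,
    Write.APPLIED_THROUGH_ROLLBACK_VIA_REFETCH,
}


def timestamp_for(write, *, value=None, last_applied=None,
                  supports_recover_to_stable=False):
    """Return the timestamp the given minValid write should carry."""
    if write in _ROLLBACK_VIA_REFETCH:
        # Mirrors the proposed invariant: these writes only happen on
        # engines without recover to stable timestamp.
        if supports_recover_to_stable:
            raise AssertionError("rollback via refetch on RTT-capable engine")
        return ZERO_TS
    if write in _USES_VALUE:
        return value
    if write in _USES_LAST_APPLIED:
        return last_applied
    return ZERO_TS  # items 3, 4 and 6


def visible_in_checkpoint(write_ts, checkpoint_ts):
    """Modeling assumption: a write is in a checkpoint iff its ts <= checkpoint ts."""
    return write_ts <= checkpoint_ts
```

For example, with a checkpoint at 10 and a batch that moves minValid to 20, `timestamp_for(Write.MIN_VALID_BEFORE_BATCH_APPLY, value=20)` yields a write that is not visible in the checkpoint at 10, which is the behavior item 1 asks for. A minValid initialization write gets 0 and is visible in that same checkpoint, as item 3 intends.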
CC milkie and daniel.gottlieb
|