Replication recovery starts replaying the oplog from the timestamp stored in the `checkpointTimestamp` document. However, that value may be stale relative to the time of the actual checkpoint.
This leads to two problems:
- Replication recovery has to relax constraints because it may re-apply entries from a time earlier than the data currently represents. The system currently has no way to distinguish an entry "being applied for the first time" from one "maybe being applied a subsequent time". Relaxing these constraints risks breaking the typical execution path.
- Reading at a timestamp between the stale checkpoint timestamp value and the actual data time can return incorrect results. Furthermore, it's impossible to know the actual data time, so the timestamp of the last op applied during recovery is the earliest read time that can be correctly satisfied.
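To make the two problems concrete, here is a small standalone sketch, not server code. `Timestamp` is simplified to an integer, and `countPossiblyReapplied` and `earliestSafeRead` are hypothetical helper names. The `actualDataTime` parameter exists only for illustration; in the real system that value is exactly what cannot be known.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Simplified stand-in for a replication timestamp (assumption for this sketch).
using Timestamp = std::uint64_t;

// Counts oplog entries that recovery replays even though the on-disk data
// may already reflect them: those in (staleCheckpoint, actualDataTime].
std::size_t countPossiblyReapplied(const std::vector<Timestamp>& oplog,
                                   Timestamp staleCheckpoint,
                                   Timestamp actualDataTime) {
    return static_cast<std::size_t>(
        std::count_if(oplog.begin(), oplog.end(), [&](Timestamp ts) {
            return ts > staleCheckpoint && ts <= actualDataTime;
        }));
}

// Without knowing the actual data time, the only read timestamp that is
// certainly correct is that of the last entry applied during recovery.
// Returns nullopt when recovery had nothing to replay.
std::optional<Timestamp> earliestSafeRead(const std::vector<Timestamp>& oplog,
                                          Timestamp staleCheckpoint) {
    std::optional<Timestamp> last;
    for (Timestamp ts : oplog) {
        if (ts > staleCheckpoint && (!last || ts > *last)) {
            last = ts;
        }
    }
    return last;
}
```

For an oplog with entries at 11, 12, 15, 20, 22 and 25, a stale checkpoint value of 10 and data that actually reflects time 20, four entries may be applied a second time, and no read below 25 can be trusted.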
The work to be done for this ticket:
- Have `recoverToStableTimestamp` return a `StatusWith<Timestamp>`. This is to be used by rollback to determine where to start replication recovery from. Amend KVStorageEngine/KVEngine/WiredTigerKVEngine to match the new return type.
- Add a method to the storage engine interface that returns the "data timestamp" on disk. This is to be used at startup to determine where to start replication recovery from.
- Have replication recovery query and use these timestamp values to determine where to start applying oplog entries.
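A rough sketch of how the pieces above could fit together, under stated assumptions. `Timestamp` and `StatusWith` here are minimal stand-ins for the server types, not the real ones. `getDataTimestamp`, `FakeEngine`, `RecoveryReason` and `chooseRecoveryStart` are hypothetical names chosen for illustration; the ticket does not specify them.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>

// Minimal stand-ins for the server's Timestamp and StatusWith<T> (assumptions).
struct Timestamp {
    std::uint64_t value = 0;
};

template <typename T>
class StatusWith {
public:
    StatusWith(T value) : _value(std::move(value)) {}
    static StatusWith error(std::string reason) {
        StatusWith s;
        s._reason = std::move(reason);
        return s;
    }
    bool isOK() const { return _value.has_value(); }
    const T& getValue() const {
        assert(isOK());
        return *_value;
    }
    const std::string& reason() const { return _reason; }

private:
    StatusWith() = default;
    std::optional<T> _value;
    std::string _reason;
};

// Storage engine interface with the two timestamp sources from the ticket.
class StorageEngine {
public:
    virtual ~StorageEngine() = default;
    // Rollback path: reports the timestamp the data was recovered to.
    virtual StatusWith<Timestamp> recoverToStableTimestamp() = 0;
    // Startup path: the "data timestamp" on disk (hypothetical method name).
    virtual std::optional<Timestamp> getDataTimestamp() const = 0;
};

// Toy engine used only to exercise the sketch.
class FakeEngine final : public StorageEngine {
public:
    explicit FakeEngine(std::optional<Timestamp> checkpoint) : _checkpoint(checkpoint) {}
    StatusWith<Timestamp> recoverToStableTimestamp() override {
        if (!_checkpoint) {
            return StatusWith<Timestamp>::error("no stable checkpoint");
        }
        return StatusWith<Timestamp>(*_checkpoint);
    }
    std::optional<Timestamp> getDataTimestamp() const override { return _checkpoint; }

private:
    std::optional<Timestamp> _checkpoint;
};

enum class RecoveryReason { kStartup, kRollback };

// Replication recovery asks the engine where the data actually is instead of
// trusting a possibly stale `checkpointTimestamp` document.
StatusWith<Timestamp> chooseRecoveryStart(StorageEngine& engine, RecoveryReason reason) {
    if (reason == RecoveryReason::kRollback) {
        return engine.recoverToStableTimestamp();
    }
    if (auto ts = engine.getDataTimestamp()) {
        return StatusWith<Timestamp>(*ts);
    }
    return StatusWith<Timestamp>::error("no data timestamp on disk");
}
```

In this sketch both paths yield a timestamp that comes from the storage engine itself, so oplog application can begin strictly after the returned value without relaxing constraints for possibly re-applied entries.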