|
Adding a few general thoughts on this after working on PM-1713. I'd agree that, as the code is currently written, the stableTimestamp and the currentCommittedSnapshot are basically redundant, since they are both set to the same value. (I'm only addressing the EMRC=true case, since the committed snapshot isn't used for majority reads when EMRC=false.) However, I'd argue that the stable timestamp and the "committed snapshot" still deserve to be viewed as separate conceptual entities. The stable timestamp is a storage layer concept that determines where we will take our next stable checkpoint, whereas the committed snapshot determines the timestamp that majority readers read at. These two happen to be given the same value in the current system, but they wouldn't have to be. The main requirements are simply that both timestamps are "consistent" (i.e. behind the no-overlap point) and majority committed.
It would be nice to move away from the obsolete "snapshot" terminology going forward, though. One possible future direction is to add a method inside ReplicationCoordinator, something like _getConsistentMajorityTimestamp, that dynamically computes a timestamp that is both majority committed and consistent (essentially the timestamp returned by _recalculateStableOpTime in today's code). We could then use it to set the stable timestamp whenever we need to, and majority readers could call it to get a read timestamp on demand, instead of us keeping a persistent _currentCommittedSnapshot value around that has to be updated whenever the commit point changes.
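To make the idea a bit more concrete, here is a rough, standalone sketch of what such a helper might look like. This is not the actual ReplicationCoordinator code: the Timestamp alias, the ReplState struct, and all function names below are hypothetical, and it assumes the simplest rule that satisfies both requirements, namely taking the minimum of the commit point and the no-overlap point.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for a timestamp; the real code uses richer types.
using Timestamp = std::uint64_t;

// Hypothetical, simplified view of the state the computation depends on.
struct ReplState {
    Timestamp commitPoint;     // newest majority-committed timestamp
    Timestamp noOverlapPoint;  // newest timestamp considered "consistent"
};

// Illustrative only: returns a timestamp that is both majority committed
// (<= commitPoint) and consistent (<= noOverlapPoint). Assumption: the
// minimum of the two is an acceptable choice.
Timestamp getConsistentMajorityTimestamp(const ReplState& state) {
    return std::min(state.commitPoint, state.noOverlapPoint);
}

// Both consumers compute their value on demand instead of reading a
// separately maintained, persistent "committed snapshot" field.
Timestamp stableTimestampForCheckpoint(const ReplState& state) {
    return getConsistentMajorityTimestamp(state);
}

Timestamp majorityReadTimestamp(const ReplState& state) {
    return getConsistentMajorityTimestamp(state);
}
```

In this sketch the two consumers stay conceptually separate (one feeds the storage layer, one feeds majority readers) even though they currently share the same computation, so either could later diverge without touching the other.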
|