[SERVER-35344] Stable timestamp and _currentCommittedSnapshot are redundant Created: 01/Jun/18  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Replication, Storage
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Judah Schvimer Assignee: Backlog - Replication Team
Resolution: Unresolved Votes: 0
Labels: former-quick-wins
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-34489 Enable new format Unique Index via FCV Closed
Related
related to SERVER-35420 Remove stable optime candidates list ... Closed
is related to SERVER-49777 Change _currentCommittedSnapshot to b... Backlog
Assigned Teams:
Replication
Linked BF Score: 0

Description

Storage maintains the stable timestamp and replication maintains the _currentCommittedSnapshot (a holdover from the pre-timestamping days). We could reduce replication complexity and possibly gain some performance benefits by maintaining this value in only one place (probably storage). If at some point we are able to read consistently at any timestamp, including mid-oplog-application-batch, then both of these values could likely be removed in favor of the replication commit point.



Comments
Comment by William Schultz (Inactive) [ 21/Jul/20 ]

Adding a few general thoughts on this after working on PM-1713. I'd agree that as the code is currently written, the stableTimestamp and the currentCommittedSnapshot are basically redundant, since they are both set to the same value (I'm only addressing the EMRC=true case since the committed snapshot isn't used for majority reads when EMRC=false). However, I'd argue that the stable timestamp and the "committed snapshot" still deserve to be viewed as separate conceptual entities. The stable timestamp is a storage layer concept which determines where we will take our next stable checkpoint, and the committed snapshot determines the timestamp that majority readers read at. These two things happen to be given the same value in the current system, but this wouldn't have to be the case. The main requirements are simply that these timestamps are both "consistent" (i.e. behind the no-overlap point) and majority committed.
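The shared requirement described above can be sketched as a simple predicate. This is only an illustration, not server code: `Ts`, `ReplState`, `isConsistentMajorityTimestamp`, and `checkInvariants` are hypothetical names, timestamps are modeled as plain integers, and "majority committed" and "consistent" are assumed to reduce to "at or behind the commit point" and "at or behind the no-overlap point" respectively.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a storage timestamp; not the server's Timestamp class.
using Ts = std::uint64_t;

// Hypothetical snapshot of the two bounds discussed in the comment above.
struct ReplState {
    Ts commitPoint;     // assumed: newest majority-committed timestamp
    Ts noOverlapPoint;  // assumed: newest timestamp that is still "consistent"
};

// True if ts satisfies both requirements: majority committed and consistent.
bool isConsistentMajorityTimestamp(const ReplState& s, Ts ts) {
    return ts <= s.commitPoint && ts <= s.noOverlapPoint;
}

// The stable timestamp and the majority read timestamp may differ,
// as long as each one individually satisfies the predicate.
void checkInvariants(const ReplState& s, Ts stableTs, Ts majorityReadTs) {
    assert(isConsistentMajorityTimestamp(s, stableTs));
    assert(isConsistentMajorityTimestamp(s, majorityReadTs));
}
```

With a commit point of 100 and a no-overlap point of 120, for example, a stable timestamp of 90 and a majority read timestamp of 100 would both pass the check even though they are not equal.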

It would be nice to move away from the obsolete "snapshot" terminology going forward, though. One thought for a future direction here is to have some kind of method inside ReplicationCoordinator like _getConsistentMajorityTimestamp which dynamically computes a timestamp that is both majority committed and consistent (essentially the timestamp that is returned by _recalculateStableOpTime in today's code). This could then be used to set the stable timestamp when we need to, and majority readers could call this to get a read timestamp on demand, instead of keeping a persistent _currentCommittedSnapshot value around that we need to update whenever the commit point changes.
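A minimal sketch of the on-demand approach might look like the following. It is not the actual ReplicationCoordinator: the class `ReplCoordSketch`, its setters, and the integer `Ts` type are hypothetical, and it assumes the newest timestamp that is both majority committed and consistent is simply the smaller of the commit point and the no-overlap point.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for a storage timestamp; not the server's Timestamp class.
using Ts = std::uint64_t;

// Hypothetical, heavily simplified coordinator that stores only the two inputs
// and derives the majority/consistent timestamp when asked.
class ReplCoordSketch {
public:
    void setCommitPoint(Ts ts) {
        std::lock_guard<std::mutex> lk(_mutex);
        _commitPoint = ts;
    }

    void setNoOverlapPoint(Ts ts) {
        std::lock_guard<std::mutex> lk(_mutex);
        _noOverlapPoint = ts;
    }

    // Computed on demand, so no cached committed-snapshot value needs to be
    // refreshed whenever the commit point changes.
    Ts getConsistentMajorityTimestamp() const {
        std::lock_guard<std::mutex> lk(_mutex);
        const Ts ts = std::min(_commitPoint, _noOverlapPoint);
        // Postcondition: the result is behind (or equal to) both bounds.
        assert(ts <= _commitPoint && ts <= _noOverlapPoint);
        return ts;
    }

private:
    mutable std::mutex _mutex;
    Ts _commitPoint = 0;     // assumed: newest majority-committed timestamp
    Ts _noOverlapPoint = 0;  // assumed: newest consistent timestamp
};
```

In this sketch, both the code that sets the stable timestamp and a majority reader choosing a read timestamp would call `getConsistentMajorityTimestamp()`, trading a small computation under the mutex for the removal of a separately maintained value.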

Comment by Spencer Brody (Inactive) [ 05/Jun/18 ]

This and SERVER-35420 can likely be done together.

Comment by Eric Milkie [ 01/Jun/18 ]

We have no release cycle plans to do work in this area.

Comment by Spencer Brody (Inactive) [ 01/Jun/18 ]

milkie, how likely is this to fall out of already-scheduled work the storage team is doing this release cycle?
