[SERVER-29891] Roll Back to Checkpoint: Call setStableTimestamp() when commit point or last applied changes Created: 28/Jun/17  Updated: 30/Oct/23  Resolved: 09/Aug/17

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 3.5.12

Type: Task Priority: Major - P3
Reporter: Judah Schvimer Assignee: William Schultz (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-30309 Have WiredTigerKVEngine implement Sto... Closed
is depended on by SERVER-29215 Coordinate oplog truncate point with ... Closed
is depended on by SERVER-29494 WT validate should wait for explicit ... Closed
Related
related to SERVER-30843 Use std::set::upper_bound when calcul... Closed
related to SERVER-30845 Avoid updating the stable timestamp i... Closed
is related to SERVER-30589 Don't add stable timestamp candidates... Closed
is related to SERVER-30577 Clear list of stable timestamp candid... Closed
Backwards Compatibility: Fully Compatible
Sprint: Repl 2017-07-31, Repl 2017-08-21
Participants:
Linked BF Score: 0

 Description   

Every time a node advances its lastAppliedOpTime, the new optime will be added to a list of potential stable timestamps. The actual stable timestamp will be the greatest timestamp in this list that is less than or equal to the replication commit point. A timestamp can be removed from the list of ‘potential’ stable timestamps once it is less than the current stable timestamp, i.e. once it is less than the replication commit point and at least one other timestamp in the list is greater than it and also less than or equal to the replication commit point.
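
The candidate-list rule above can be sketched with an ordered set. This is a minimal illustration, not the server's implementation: timestamps are modeled as plain uint64_t values, and the class and method names (StableTimestampCandidates, addCandidate, computeStable) are hypothetical. The lookup uses std::set::upper_bound, and pruning assumes the commit point only moves forward.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <set>

// Hypothetical sketch: timestamps are plain uint64_t values, not MongoDB's
// Timestamp/OpTime types. Assumes the commit point never moves backward.
class StableTimestampCandidates {
public:
    // Called whenever lastAppliedOpTime advances.
    void addCandidate(std::uint64_t ts) { candidates_.insert(ts); }

    // Returns the greatest candidate <= commitPoint, or nullopt if none exists.
    // Also prunes every candidate strictly older than the returned value,
    // since (with a non-decreasing commit point) it can never be chosen again.
    std::optional<std::uint64_t> computeStable(std::uint64_t commitPoint) {
        // upper_bound finds the first candidate > commitPoint; the element
        // just before it (if any) is the greatest candidate <= commitPoint.
        auto it = candidates_.upper_bound(commitPoint);
        if (it == candidates_.begin()) {
            return std::nullopt;  // every candidate is ahead of the commit point
        }
        --it;
        const std::uint64_t stable = *it;
        candidates_.erase(candidates_.begin(), it);  // drop older candidates
        assert(*candidates_.begin() == stable);
        return stable;
    }

    std::size_t size() const { return candidates_.size(); }

private:
    std::set<std::uint64_t> candidates_;
};
```

For example, with candidates {1, 3, 5} and a commit point of 4, computeStable returns 3 and prunes 1, leaving {3, 5}.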

Setting the 'stable' timestamp tells WiredTiger which timestamp it may take a checkpoint at.



 Comments   
Comment by Eric Milkie [ 25/Aug/17 ]

After further discussion, Will is going to optimize the std::set being used to track stable timestamp candidates, as we believe that is the cause of the current regression. As it turns out, the code to actually tell WiredTiger about the stable timestamp is currently deactivated in the master branch.

Comment by Eric Milkie [ 25/Aug/17 ]

After discussion, we (Dan and I) are going to run a perf patch with the optimization in the glue code to see what effect it has on the regression.

Comment by Daniel Gottlieb (Inactive) [ 25/Aug/17 ]

Storage is tracking these variables. Eric, do you think it's a sufficient optimization for the WT glue to only pass the timestamps along to WT proper when they have advanced, or do you think it is preferable for replication to make this calculation?

Comment by Eric Milkie [ 25/Aug/17 ]

You're correct. I talked with Spencer after I posted my comment, and I now have a refinement: I think you can skip calling setStableTimestamp() when triggered by a lastAppliedOpTime change if the lastAppliedOpTime is ahead of (greater than) the current lastCommittedOpTime.
Even better would be if we remembered what the last call to setStableTimestamp did, and avoided calling it if the value didn't change from last time.
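
Both refinements can be sketched together. This is a hypothetical wrapper, not the code that was committed: the callback setStable stands in for whatever actually forwards the value to the storage engine, and the names StableTimestampNotifier, maybeSet and shouldRecomputeOnApply are illustrative only. Timestamps are again plain uint64_t values.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <utility>

// Hypothetical wrapper: `setStable` stands in for the real call that passes
// the stable timestamp to the storage engine.
class StableTimestampNotifier {
public:
    explicit StableTimestampNotifier(std::function<void(std::uint64_t)> setStable)
        : setStable_(std::move(setStable)) {}

    // Forwards the recomputed stable timestamp only if it differs from the
    // value forwarded last time ("remember what the last call did").
    void maybeSet(std::optional<std::uint64_t> stable) {
        if (!stable || stable == lastSet_) {
            return;
        }
        lastSet_ = stable;
        setStable_(*stable);
    }

    // A new lastApplied optime ahead of the commit point cannot be the stable
    // timestamp, so the setStableTimestamp() call can be skipped for that
    // trigger. The optime must still be recorded as a candidate.
    static bool shouldRecomputeOnApply(std::uint64_t lastApplied,
                                       std::uint64_t commitPoint) {
        return lastApplied <= commitPoint;
    }

private:
    std::function<void(std::uint64_t)> setStable_;
    std::optional<std::uint64_t> lastSet_;
};
```

With this wrapper, recomputing the same stable timestamp twice in a row results in only one call to the storage layer.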

Comment by William Schultz (Inactive) [ 25/Aug/17 ]

milkie, I'm not sure that is correct. As far as I can tell, a particular node's lastCommittedOpTime is computed based on the applied optimes of all nodes in the replica set (its current knowledge of that data, at least). For a 3-node replica set with nodes n0, n1 and n2, consider the following:

n0.appliedOpTime = 2
n1.appliedOpTime = 2
n2.appliedOpTime = 1

n2.lastCommittedOpTime would be calculated as 2, while n2.appliedOpTime=1.
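
The example can be reproduced with a deliberately simplified model in which the commit point is the greatest optime that a majority of members have applied. The function name majorityCommitPoint is hypothetical, and the sketch ignores terms, arbiters and non-voting members, which a real commit-point calculation has to handle.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Simplified, hypothetical model: the commit point is the greatest optime
// applied by at least a majority of members. Terms, arbiters and member
// votes are ignored here.
std::uint64_t majorityCommitPoint(std::vector<std::uint64_t> appliedOpTimes) {
    assert(!appliedOpTimes.empty());
    // Sort descending so that entry i has been reached by at least i + 1 members.
    std::sort(appliedOpTimes.begin(), appliedOpTimes.end(), std::greater<>());
    const std::size_t majority = appliedOpTimes.size() / 2 + 1;
    return appliedOpTimes[majority - 1];
}
```

For the applied optimes {2, 2, 1} this model yields a commit point of 2, so n2 (applied optime 1) knows of a commit point ahead of its own last applied optime.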

Comment by Eric Milkie [ 25/Aug/17 ]

I don't see where the design indicates that we should call setStableTimestamp() when the last applied optime changes. I believe that it is actually impossible for the stable timestamp to change when the last applied optime changes, since the commit point cannot possibly be greater (later) than the last applied optime.

Comment by Githook User [ 09/Aug/17 ]

Author:

William Schultz (username: will62794, email: william.schultz@mongodb.com)

Message: SERVER-29891 Call setStableTimestamp() when commit point or last applied optime changes
Branch: master
https://github.com/mongodb/mongo/commit/c7661b14867cd058e1a67986b8e05a7020fc0a5e

Comment by William Schultz (Inactive) [ 31/Jul/17 ]

Code review URL:

https://mongodbcr.appspot.com/151340001/

Comment by Judah Schvimer [ 18/Jul/17 ]

From the design doc:
Every time a node advances its lastAppliedOpTime, the new optime will be added to a list of potential stable timestamps. The actual stable timestamp will be the greatest timestamp in this list that is less than or equal to the replication commit point.
A timestamp can be removed from the list of ‘potential’ stable timestamps once it is less than the current stable timestamp, i.e. once it is less than the replication commit point and at least one other timestamp in the list is greater than it and also less than or equal to the replication commit point.

Generated at Thu Feb 08 04:22:05 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.