[SERVER-29891] Roll Back to Checkpoint: Call setStableTimestamp() when commit point or last applied changes Created: 28/Jun/17 Updated: 30/Oct/23 Resolved: 09/Aug/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 3.5.12 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Judah Schvimer | Assignee: | William Schultz (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible |
| Sprint: | Repl 2017-07-31, Repl 2017-08-21 |
| Participants: |
| Linked BF Score: | 0 |
| Description |
|
Every time a node advances its lastAppliedOpTime, the new optime is added to a list of potential stable timestamps. The actual stable timestamp is the greatest timestamp in this list that is less than or equal to the replication commit point. A timestamp can be removed from the list of potential stable timestamps once it is less than the current stable timestamp, i.e. when it is less than the replication commit point and the list contains at least one other timestamp greater than it that is also less than or equal to the replication commit point. Setting the stable timestamp tells WiredTiger which timestamp is valid to take a checkpoint at. |
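The bookkeeping described above can be sketched roughly as follows. This is a minimal illustration, not the server's implementation: `StableTimestampTracker`, its method names, and the integer `Timestamp` alias are hypothetical, and the real code would hand the result to the storage engine via setStableTimestamp() rather than return it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>

// Hypothetical stand-in for a replication timestamp (assumption, not the server's type).
using Timestamp = std::uint64_t;

// Hypothetical helper that tracks stable timestamp candidates in sorted order.
class StableTimestampTracker {
public:
    // Record a new candidate whenever lastAppliedOpTime advances.
    void onLastAppliedAdvanced(Timestamp applied) {
        _candidates.insert(applied);
    }

    // Find the greatest candidate <= commitPoint. On success, write it to
    // *stableOut, drop every candidate older than it, and return true.
    // Return false (leaving *stableOut untouched) if no candidate qualifies.
    bool recalculate(Timestamp commitPoint, Timestamp* stableOut) {
        // upper_bound yields the first candidate > commitPoint, so the element
        // just before it (if any) is the greatest candidate <= commitPoint.
        auto it = _candidates.upper_bound(commitPoint);
        if (it == _candidates.begin()) {
            return false;
        }
        --it;
        *stableOut = *it;
        // Candidates strictly less than the stable timestamp are no longer needed.
        _candidates.erase(_candidates.begin(), it);
        return true;
    }

    std::size_t candidateCount() const {
        return _candidates.size();
    }

private:
    std::set<Timestamp> _candidates;
};
```

With candidates 1, 2 and 5 and a commit point of 3, `recalculate` reports 2 as the stable timestamp and discards 1, leaving 2 and 5 in the set. The stable timestamp itself stays in the set so that it is found again if the commit point does not move past the next candidate.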
| Comments |
| Comment by Eric Milkie [ 25/Aug/17 ] |
|
After further discussion, Will is going to optimize the std::set being used to track stable timestamp candidates, as we believe that is the cause of the current regression. As it turns out, the code to actually tell WiredTiger about the stable timestamp is currently deactivated in the master branch. |
| Comment by Eric Milkie [ 25/Aug/17 ] |
|
After discussion, we (Dan and I) are going to run a perf patch with the optimization in the glue code to see what effect it has on the regression. |
| Comment by Daniel Gottlieb (Inactive) [ 25/Aug/17 ] |
|
Storage is tracking these variables. Eric, do you think it's a sufficient optimization for the WT glue code to pass the times along to WT proper only when they have advanced, or do you think it's preferable for replication to make this calculation? |
| Comment by Eric Milkie [ 25/Aug/17 ] |
|
You're correct. I talked with Spencer after I posted my comment, and I now have a refinement: I think you can skip calling setStableTimestamp() when triggered by a lastAppliedOpTime change if the lastAppliedOpTime is ahead of (greater than) the current lastCommittedOpTime. |
| Comment by William Schultz (Inactive) [ 25/Aug/17 ] |
|
milkie I'm not sure that is correct. As far as I can tell, a particular node's lastCommittedOpTime is computed based on the applied op times of all nodes in the replica set (its current knowledge of that data, at least). For a 3 node replica set, with nodes n0, n1, n2, consider the following: n0.appliedOpTime = 2; n2.lastCommittedOpTime would be calculated as 2, while n2.appliedOpTime = 1. |
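To make the scenario in the comment above concrete, here is a simplified model of how a commit point can be derived from the applied optimes a node knows about. It is a sketch under assumptions: `majorityCommitPoint` and the integer `OpTime` alias are hypothetical, every node is treated as a voter, and terms and durability are ignored. The value for n1 is not given in the comment; the usage note assumes n1 has also applied optime 2.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for an optime (assumption, not the server's OpTime type).
using OpTime = std::uint64_t;

// Return the greatest optime that a majority of nodes have applied.
// `applied` holds one entry per node, as known to the node doing the calculation.
OpTime majorityCommitPoint(std::vector<OpTime> applied) {
    assert(!applied.empty());
    // Sort newest first; the entry at index (majority - 1) has been reached
    // by at least `majority` nodes.
    std::sort(applied.begin(), applied.end(), std::greater<OpTime>());
    const std::size_t majority = applied.size() / 2 + 1;
    return applied[majority - 1];
}
```

Under that assumption, `majorityCommitPoint({2, 2, 1})` evaluates to 2, so n2 would learn of a commit point of 2 while its own applied optime is still 1.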
| Comment by Eric Milkie [ 25/Aug/17 ] |
|
I don't see how the design is indicating that we should call setStableTimestamp() when the last applied optime changes. I believe that it is actually impossible for the stable timestamp to change when the last applied optime changes, since the commit point cannot possibly be greater (later) than the last applied optime. |
| Comment by Githook User [ 09/Aug/17 ] |
|
Author: {'username': 'will62794', 'email': 'william.schultz@mongodb.com', 'name': 'William Schultz'} Message: |
| Comment by William Schultz (Inactive) [ 31/Jul/17 ] |
|
Codereview url: |
| Comment by Judah Schvimer [ 18/Jul/17 ] |
|
From the design doc: |