[SERVER-39679] Add callback to replication when storage takes a checkpoint to learn of the maximum oplog truncation timestamp Created: 19/Feb/19 Updated: 29/Oct/23 Resolved: 22/Mar/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Storage |
| Affects Version/s: | None |
| Fix Version/s: | 4.1.10 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Siyuan Zhou | Assignee: | A. Jesse Jiryu Davis |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | prepare_durability |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible |
| Sprint: | Repl 2019-03-11, Repl 2019-03-25, Repl 2019-04-08 |
| Participants: | |
| Description |
|
Due to Prepare Support for Transactions and Larger Transactions Than 16MB, we cannot truncate a transaction's oplog entries while its commit oplog entry is not yet stable. The current solution is to pass the oldest active transaction timestamp to setStableTimestamp(), which means we must read the oldest active transaction at the stable timestamp (the oldest required timestamp) every time we set the stable timestamp. To save this read, we could let the storage layer call back into the replication system when it is about to start a checkpoint. Replication will read the oldest required timestamp, or calculate it in some other way, and return the timestamp to storage. After the checkpoint, storage uses the oldest required timestamp to tell the oplog truncation thread how far it can truncate. Passing the oldest required timestamp to storage can be done asynchronously if that makes the storage work easier. |
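For illustration only, the intended flow might look roughly like the C++ sketch below. The type and function names here are hypothetical, not the actual server API; it only shows the shape of the interaction: storage asks replication for the oldest required timestamp just before a checkpoint, then hands that bound to oplog truncation afterwards.

```cpp
// Hypothetical sketch of the checkpoint/callback interaction described above.
// All names are illustrative, not taken from the real MongoDB code.

#include <functional>
#include <mutex>

struct Timestamp {
    unsigned long long secs = 0;
    unsigned inc = 0;
};

// Replication registers this; it returns the oldest timestamp whose oplog
// entries must survive (e.g. for the oldest active prepared transaction),
// or a zero timestamp if nothing is pinned.
using OldestRequiredTimestampCallback = std::function<Timestamp()>;

class CheckpointThreadSketch {
public:
    void registerCallback(OldestRequiredTimestampCallback cb) {
        std::lock_guard<std::mutex> lk(_mutex);
        _callback = std::move(cb);
    }

    void takeCheckpoint() {
        Timestamp oplogNeeded{};
        {
            std::lock_guard<std::mutex> lk(_mutex);
            // Ask replication for the oldest required timestamp as the
            // checkpoint begins. If replication has not registered yet
            // (startup), fall back to truncating nothing.
            if (_callback)
                oplogNeeded = _callback();
        }

        // ... perform the storage-engine checkpoint here ...

        // After the checkpoint, tell the oplog truncation thread it may
        // only remove entries strictly older than 'oplogNeeded'.
        _oplogTruncateAfter = oplogNeeded;
    }

private:
    std::mutex _mutex;
    OldestRequiredTimestampCallback _callback;
    Timestamp _oplogTruncateAfter{};
};
```

The conservative fallback when no callback is registered matters because, as noted in the comments below, storage starts its checkpoint and oplog truncation threads before replication has a chance to register a callback. |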
| Comments |
| Comment by Githook User [ 22/Mar/19 ] |
|
Author: {'name': 'A. Jesse Jiryu Davis', 'username': 'ajdavis', 'email': 'jesse@mongodb.com'} Message: |
| Comment by Judah Schvimer [ 22/Feb/19 ] |
daniel.gottlieb, we need this a little more urgently now that we've pivoted our design for |
| Comment by Siyuan Zhou [ 21/Feb/19 ] |
|
Sounds great. Thanks Dan! Storage can still cache _oplogNeededForRollback in memory so that it can be reported to serverStatus and FTDC. |
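A minimal sketch of what caching that value for diagnostics could look like; the class and accessor names below are assumptions, not the real server code:

```cpp
// Illustrative only: storage caches the timestamp returned by the
// replication callback so diagnostics can read it cheaply.

#include <atomic>
#include <cstdint>

class OplogTruncationStats {
public:
    // Called by the checkpoint thread after it obtains the oldest
    // required timestamp from replication.
    void setOplogNeededForRollback(std::uint64_t ts) {
        _oplogNeededForRollback.store(ts, std::memory_order_relaxed);
    }

    // Read by serverStatus / FTDC reporting without blocking checkpoints.
    std::uint64_t getOplogNeededForRollback() const {
        return _oplogNeededForRollback.load(std::memory_order_relaxed);
    }

private:
    std::atomic<std::uint64_t> _oplogNeededForRollback{0};
};
``` |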
| Comment by Daniel Gottlieb (Inactive) [ 20/Feb/19 ] |
|
The description makes sense to me. I'll highlight what I think the code changes are (this is a dark corner of the storage code). I expect the only "mechanical" change is to have this method return the value of the callback instead of reading a variable from memory. The more "design"-y change that's part of this ticket is figuring out how replication passes knowledge of the callback through the StorageEngine interface -> KVStorageEngine -> KVEngine -> WiredTigerKVEngine. The "edge case"-y parts have to do with startup, since storage naturally starts its checkpoint and oplog truncation threads before replication has a chance to register a callback. |
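To illustrate the plumbing described here, a rough C++ sketch follows. Only the class names (StorageEngine, KVStorageEngine, KVEngine, WiredTigerKVEngine) come from the comment; the method and type names are assumptions made for the example.

```cpp
// Rough sketch of passing a replication-owned callback down through the
// storage interfaces named above. Not the actual MongoDB signatures.

#include <functional>
#include <utility>

struct Timestamp { unsigned long long value = 0; };
using OldestActiveTxnTimestampCallback = std::function<Timestamp()>;

// StorageEngine: the abstract interface replication talks to.
class StorageEngine {
public:
    virtual ~StorageEngine() = default;
    virtual void setOldestActiveTransactionTimestampCallback(
        OldestActiveTxnTimestampCallback cb) = 0;
};

// KVEngine: the engine-level abstraction that WiredTiger implements.
class KVEngine {
public:
    virtual ~KVEngine() = default;
    virtual void setOldestActiveTransactionTimestampCallback(
        OldestActiveTxnTimestampCallback cb) = 0;
};

class WiredTigerKVEngine : public KVEngine {
public:
    void setOldestActiveTransactionTimestampCallback(
        OldestActiveTxnTimestampCallback cb) override {
        // The checkpoint thread would invoke this callback instead of
        // reading a cached variable; if it is empty (startup), it should
        // behave conservatively and truncate nothing.
        _oldestActiveTxnTimestampCallback = std::move(cb);
    }

private:
    OldestActiveTxnTimestampCallback _oldestActiveTxnTimestampCallback;
};

// KVStorageEngine: forwards the registration to its KVEngine.
class KVStorageEngine : public StorageEngine {
public:
    explicit KVStorageEngine(KVEngine* engine) : _engine(engine) {}

    void setOldestActiveTransactionTimestampCallback(
        OldestActiveTxnTimestampCallback cb) override {
        _engine->setOldestActiveTransactionTimestampCallback(std::move(cb));
    }

private:
    KVEngine* _engine;
};
```

In this sketch, replication would register its callback through the StorageEngine interface during startup; the "edge case" the comment mentions is exactly the window before that registration happens. |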
| Comment by Siyuan Zhou [ 19/Feb/19 ] |
|
daniel.gottlieb, does the description look accurate to you? |
| Comment by Judah Schvimer [ 19/Feb/19 ] |
|
We expect this is required for good performance in 4.2, but can use the existing plan from |