[SERVER-39679] Add callback to replication when storage takes a checkpoint to learn of the maximum oplog truncation timestamp
Created: 19/Feb/19  Updated: 29/Oct/23  Resolved: 22/Mar/19

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: 4.1.10

Type: Task
Priority: Major - P3
Reporter: Siyuan Zhou
Assignee: A. Jesse Jiryu Davis
Resolution: Fixed
Votes: 0
Labels: prepare_durability
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-36494 Prevent oplog truncation of oplog ent... Closed
Related
related to SERVER-58184 Checkpoint thread causes assertions w... Closed
related to SERVER-36772 Ensure oplog cannot be truncated due ... Closed
related to SERVER-39680 Maintain the oldest active transactio... Closed
is related to SERVER-36811 Provide a mechanism for replication t... Closed
Backwards Compatibility: Fully Compatible
Sprint: Repl 2019-03-11, Repl 2019-03-25, Repl 2019-04-08
Participants:

 Description   

Due to Prepare Support for Transactions and Larger Transactions Than 16MB, we cannot truncate transaction oplog entries if their commit oplog entry isn't stable yet.

The current solution is to pass the current oldest active transaction to setStableTimestamp(). This means we have to read the oldest active transaction at the stable timestamp (the oldest required timestamp) every time we set the stable timestamp. To save this read, the storage layer could call back into the replication system when it is about to start a checkpoint. Replication would read the oldest required timestamp, or calculate it in some other way, and return it to storage. After the checkpoint, storage would use the oldest required timestamp to tell the oplog truncation thread how far it can truncate.

Passing the oldest required timestamp to storage can be done asynchronously if that makes storage work easier.
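
A minimal sketch of the proposed flow, written in Python purely for illustration (the server itself is C++). Every name here (`StorageEngine`, `Replication`, `set_oldest_required_timestamp_callback`, `take_checkpoint`, `oplog_truncation_limit`) is hypothetical and not taken from the MongoDB codebase. Timestamps are modeled as plain integers, and computing the oldest required timestamp as the minimum of the stable timestamp and the oldest active transaction timestamp is an assumption; the description only says replication reads or calculates it.

```python
from typing import Callable, Optional


class StorageEngine:
    """Toy storage layer that asks replication for a timestamp at checkpoint time."""

    def __init__(self) -> None:
        self._callback: Optional[Callable[[], int]] = None
        # Highest timestamp the oplog truncation thread may truncate up to.
        self.oplog_truncation_limit = 0

    def set_oldest_required_timestamp_callback(self, cb: Callable[[], int]) -> None:
        self._callback = cb

    def take_checkpoint(self) -> None:
        # Ask replication just before the checkpoint starts.
        oldest_required = self._callback() if self._callback else None
        self._write_checkpoint()
        # Publish the new limit only after the checkpoint has completed.
        if oldest_required is not None:
            self.oplog_truncation_limit = oldest_required

    def _write_checkpoint(self) -> None:
        pass  # placeholder for the real, durable checkpoint


class Replication:
    """Toy replication layer that knows about active transactions."""

    def __init__(self) -> None:
        self.stable_timestamp = 0
        self.oldest_active_txn_timestamp: Optional[int] = None

    def oldest_required_timestamp(self) -> int:
        # Assumption: never allow truncation past the oldest active transaction.
        if self.oldest_active_txn_timestamp is None:
            return self.stable_timestamp
        return min(self.stable_timestamp, self.oldest_active_txn_timestamp)
```

With this shape, setting the stable timestamp no longer needs to read the oldest active transaction; the read happens once per checkpoint instead.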



 Comments   
Comment by Githook User [ 22/Mar/19 ]

Author: A. Jesse Jiryu Davis (ajdavis) <jesse@mongodb.com>

Message: SERVER-39679 Get oldest transaction time when snapshotting
Branch: master
https://github.com/mongodb/mongo/commit/78eaa3cf538764d5aa5a09c5997532a4c3b2bca8

Comment by Judah Schvimer [ 22/Feb/19 ]

The more "design"y change that's part of this ticket is to figure out how replication passes knowledge of the callback through the StorageEngine interface -> KVStorageEngine -> KVEngine -> WiredTigerKVEngine.

The "edge case"y parts have to do with startup as storage naturally starts its checkpoint and oplog truncation threads before replication has a chance to register a callback.

daniel.gottlieb, we need this a little more urgently now that we've pivoted our design for SERVER-36494 to require this ticket. Can you please provide guidance on the above two pieces so we can take the implementation of this off of your hands without having too much come up in review?

Comment by Siyuan Zhou [ 21/Feb/19 ]

Sounds great. Thanks Dan! Storage can still cache _oplogNeededForRollback in memory, so that it can report to serverStatus and FTDC.
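
One way to picture the caching idea, as a rough Python sketch (the server is C++). The class and method names are hypothetical; only the `oplogNeededForRollback` label comes from the comment above. The cache holds the last value the callback returned, so diagnostics can report it without invoking the callback again.

```python
import threading
from typing import Callable, Dict, Optional


class CachedOplogNeededForRollback:
    """Remembers the last callback result so diagnostics can read it cheaply."""

    def __init__(self, callback: Callable[[], int]) -> None:
        self._callback = callback
        self._lock = threading.Lock()
        self._cached: Optional[int] = None  # None until the first refresh

    def refresh(self) -> int:
        # Called at checkpoint time: invoke the callback and remember the result.
        value = self._callback()
        with self._lock:
            self._cached = value
        return value

    def report(self) -> Dict[str, Optional[int]]:
        # Called by diagnostics: reads the cached value, never the callback.
        with self._lock:
            return {"oplogNeededForRollback": self._cached}
```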

Comment by Daniel Gottlieb (Inactive) [ 20/Feb/19 ]

The description makes sense to me. I'll highlight what I think the code changes are (this is a dark corner of the storage code).

I expect the only "mechanical" change is to have this method return the value of the callback, instead of reading a variable from memory.

The more "design"y change that's part of this ticket is to figure out how replication passes knowledge of the callback through the StorageEngine interface -> KVStorageEngine -> KVEngine -> WiredTigerKVEngine.

The "edge case"y parts have to do with startup as storage naturally starts its checkpoint and oplog truncation threads before replication has a chance to register a callback.
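
A hedged sketch of the last two points, in Python for illustration only (the server is C++). The `*Model` classes mirror the layer names mentioned above, but their methods (`set_callback`, `on_checkpoint`) and the `truncation_limit` field are hypothetical. Leaving the truncation limit untouched until a callback is registered is one conservative way to handle startup; it is an assumption, not necessarily what the ticket settled on.

```python
from typing import Callable, Optional

TimestampCallback = Callable[[], int]


class WiredTigerKVEngineModel:
    """Innermost layer: owns the checkpoint thread and the truncation limit."""

    def __init__(self) -> None:
        self._callback: Optional[TimestampCallback] = None
        self.truncation_limit = 0  # 0 means nothing may be truncated yet

    def set_callback(self, cb: TimestampCallback) -> None:
        self._callback = cb

    def on_checkpoint(self) -> None:
        if self._callback is None:
            # Replication has not registered yet (startup): do not advance.
            return
        self.truncation_limit = self._callback()


class KVStorageEngineModel:
    """Middle layer: only forwards the registration to the engine below it."""

    def __init__(self, engine: WiredTigerKVEngineModel) -> None:
        self._engine = engine

    def set_callback(self, cb: TimestampCallback) -> None:
        self._engine.set_callback(cb)
```

In this model, a checkpoint taken before replication calls `set_callback` on the outer layer is harmless, because the limit stays at its initial value.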

Comment by Siyuan Zhou [ 19/Feb/19 ]

daniel.gottlieb, does the description look accurate to you?

Comment by Judah Schvimer [ 19/Feb/19 ]

We expect this is required for good performance in 4.2, but can use the existing plan from SERVER-36811 in the meantime to get to correct behavior.
