[SERVER-36811] Provide a mechanism for replication to specify the 'maximum_truncation_timestamp' for a given 'stable_timestamp' Created: 22/Aug/18  Updated: 29/Oct/23  Resolved: 17/Sep/18

Status: Closed
Project: Core Server
Component/s: Replication, Storage
Affects Version/s: None
Fix Version/s: 4.1.4

Type: Task
Priority: Major - P3
Reporter: Judah Schvimer
Assignee: Daniel Gottlieb (Inactive)
Resolution: Fixed
Votes: 0
Labels: prepare_durability
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-36494 Prevent oplog truncation of oplog ent... Closed
Related
related to SERVER-39679 Add callback to replication when stor... Closed
Backwards Compatibility: Fully Compatible
Sprint: Storage NYC 2018-09-10, Storage NYC 2018-09-24
Participants:

 Description   

We must ensure that the ‘prepare’ oplog entries for any transactions that are prepared at the ‘stable timestamp’ (for replication rollback) or the ‘last stable checkpoint timestamp’ (for startup replication recovery) are not truncated off the oplog. The same applies to any oplog entries for active transactions in the “Transactions larger than 16MB” project. If those entries are truncated, nodes will not be able to re-apply these uncommitted transactions. Because ‘last stable checkpoint timestamp’ <= ‘stable timestamp’, we will focus on the ‘last stable checkpoint timestamp’ case: preserving the oplog entries it needs implicitly ensures we have all of the oplog entries we need for the ‘stable timestamp’ or the current point in time.

To do this, whenever replication tells storage about a new ‘stable timestamp’, it will also provide storage with a ‘maximum_truncation_timestamp’. This ‘maximum_truncation_timestamp’ will be the latest timestamp that the storage engine is allowed to truncate off the back of the oplog when its accompanying ‘stable timestamp’ is the current ‘last stable checkpoint timestamp’. Replication will provide the timestamp that was the ‘oldest active transaction timestamp’ as of the ‘stable timestamp’. This is not the current ‘oldest active transaction timestamp’, but rather an older value of it.
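
The pairing described above can be illustrated with a minimal sketch. It is not MongoDB's actual storage interface: the class `StorageSketch`, its method names, and the use of plain integers as timestamps are all hypothetical. It also makes two assumptions the description does not settle: the oplog entry at the bound itself is conservatively kept, and a `None` bound (modelling "no active transactions at that stable timestamp") falls back to the stable timestamp.

```python
from typing import Optional, Tuple


class StorageSketch:
    """Hypothetical stand-in for a storage engine; not MongoDB's real API."""

    def __init__(self) -> None:
        # Latest (stable_ts, max_truncation_ts) pair reported by replication.
        self._stable: Optional[Tuple[int, Optional[int]]] = None
        # Pair that was current when the last stable checkpoint was taken.
        self._checkpoint: Optional[Tuple[int, Optional[int]]] = None

    def set_stable_timestamp(
        self, stable_ts: int, max_truncation_ts: Optional[int]
    ) -> None:
        # Replication always supplies both values together.
        self._stable = (stable_ts, max_truncation_ts)

    def take_stable_checkpoint(self) -> None:
        # The checkpoint pins the pair that was current at that moment.
        if self._stable is not None:
            self._checkpoint = self._stable

    def truncation_bound(self) -> Optional[int]:
        # Before any stable checkpoint exists, the sketch truncates nothing.
        if self._checkpoint is None:
            return None
        stable_ts, max_truncation_ts = self._checkpoint
        if max_truncation_ts is None:
            # Assumption: no active transactions, so the stable timestamp bounds truncation.
            return stable_ts
        return min(stable_ts, max_truncation_ts)

    def may_truncate(self, entry_ts: int) -> bool:
        # Assumption: only entries strictly older than the bound are removable.
        bound = self.truncation_bound()
        return bound is not None and entry_ts < bound
```

In this sketch, the truncation bound only moves when a checkpoint is taken, even if replication has since reported a newer stable timestamp. That mirrors the point that the value used is the older ‘oldest active transaction timestamp’ tied to the checkpointed ‘stable timestamp’, not the current one.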



 Comments   
Comment by Githook User [ 17/Sep/18 ]

Author: Daniel Gottlieb (dgottlieb) <daniel.gottlieb@mongodb.com>

Message: SERVER-36811: Save oplog dating back to oldest actively prepared transaction.
Branch: master
https://github.com/mongodb/mongo/commit/beba8d70803cc14768c577bc7ec1aff5c0c352ed

Comment by Judah Schvimer [ 04/Sep/18 ]

Yes, that's correct. I think either a boost::none or a null timestamp would suffice for "no active transactions at that stable timestamp". I'll leave that to your judgement. I don't expect it to matter on our end.

Comment by Daniel Gottlieb (Inactive) [ 30/Aug/18 ]

judah.schvimer, additionally, for clarification: I believe this logic needs to be applied for the inMemory storage engine as well?

Actually, thinking about inMemory... inMemory also has an oplog that gets truncated. During a live rollback, inMemory will play forward from the stable timestamp. I don't think it's sufficient to preserve oplog only with respect to when data gets checkpointed. inMemory will need to update the acceptable truncation point as the stable timestamp goes forward, and not only when the stable checkpoint timestamp advances.

Does that sound correct?

Comment by Daniel Gottlieb (Inactive) [ 30/Aug/18 ]

judah.schvimer, did you have a preference on what value gets passed in when there were no active transactions at that stable timestamp?

Comment by Judah Schvimer [ 22/Aug/18 ]

I discussed this with daniel.gottlieb during the design of "prepare support for transactions".
