[SERVER-36772] Ensure oplog cannot be truncated due to capped deletions in standalone mode Created: 20/Aug/18 Updated: 29/Oct/23 Resolved: 19/Mar/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 4.1.10 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Judah Schvimer | Assignee: | Dianna Hohensee (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | prepare_durability, txn_storage |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Backwards Compatibility: | Fully Compatible |
| Sprint: | Repl 2018-09-10, Repl 2018-09-24, Repl 2019-02-25, Storage NYC 2019-03-25 |
| Description |
|
If the oldest oplog entries are truncated, replication recovery may not be able to succeed. We are recommending that backup services insert into the oplog while running as a standalone so that replication can perform its own recovery to a point in time, but this requires that the oplog be able to grow indefinitely. |
| Comments |
| Comment by Githook User [ 19/Mar/19 ] |
|
Author: Dianna Hohensee <dianna.hohensee@10gen.com> (username: DiannaHohensee)
Message: Adds an 'allowOplogTruncation' storageGlobalParam, which is set to false for standalones. |
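The commit describes the fix as a global storage parameter. The C++ sketch below shows one way such a flag could gate oplog truncation. Only the field name `allowOplogTruncation` comes from the commit message; `StorageGlobalParamsSketch`, `configureOplogTruncation`, and `shouldTruncateOplog` are hypothetical names used for illustration, not the server's actual types or functions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the server's storageGlobalParams. Only the
// field named in the commit message is modeled.
struct StorageGlobalParamsSketch {
    bool allowOplogTruncation = true;
};

// Hypothetical startup hook: a standalone (no replica set configured)
// disables oplog truncation so the oplog can grow without bound.
void configureOplogTruncation(StorageGlobalParamsSketch& params, bool isReplSetMember) {
    params.allowOplogTruncation = isReplSetMember;
}

// Hypothetical check run before an oplog truncation pass: truncation happens
// only if it is allowed and the oplog exceeds its configured maximum size.
bool shouldTruncateOplog(const StorageGlobalParamsSketch& params,
                         std::int64_t oplogBytes,
                         std::int64_t maxOplogBytes) {
    assert(maxOplogBytes > 0);
    if (!params.allowOplogTruncation) {
        return false;  // standalone: never delete oplog entries
    }
    return oplogBytes > maxOplogBytes;
}
```

In this sketch, a node started without a replica set configuration ends up with `allowOplogTruncation == false`, so `shouldTruncateOplog` returns false no matter how large the oplog grows.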
| Comment by Judah Schvimer [ 15/Mar/19 ] |
|
dianna.hohensee, some of the code you reference may change with |
| Comment by Dianna Hohensee (Inactive) [ 14/Mar/19 ] |
|
I think we can use the bypass that already exists to ensure successful crash recovery, recover to stable timestamp, and backup. If we were to return Timestamp::min() as the timestamp back to which the oplog must be retained when in standalone mode, then no stone's last entry would be older than it, and we would never delete any stones from the oplog collection. Theoretically, that works. |
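Why returning Timestamp::min() disables deletion: a stone is deleted only if its last entry is strictly older than the pinned timestamp, and no timestamp is older than the minimum. The sketch below illustrates this with a simplified `TimestampSketch` (seconds plus increment, compared lexicographically) and hypothetical functions `pinnedOplogSketch` and `stoneIsDeletable`. It is not the server's Timestamp class or its getPinnedOplog() implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <tuple>

// Simplified (seconds, increment) timestamp; only ordering is modeled.
struct TimestampSketch {
    std::uint32_t secs = 0;
    std::uint32_t inc = 0;

    static TimestampSketch min() { return {0, 0}; }

    bool operator<(const TimestampSketch& other) const {
        return std::tie(secs, inc) < std::tie(other.secs, other.inc);
    }
};

// Hypothetical: the timestamp back to which the oplog must be retained.
// Returning min() in standalone mode pins the entire oplog.
TimestampSketch pinnedOplogSketch(bool isStandalone, TimestampSketch recoveryTs) {
    return isStandalone ? TimestampSketch::min() : recoveryTs;
}

// A stone may be deleted only if its last entry is strictly older than the
// pinned timestamp. Nothing compares less than min(), so with min() pinned
// no stone is ever deletable.
bool stoneIsDeletable(TimestampSketch stoneLastEntry, TimestampSketch pinned) {
    return stoneLastEntry < pinned;
}
```

The design relies only on the strict "older than" comparison already used by the stones process, which is why no separate code path is needed for standalones in this approach.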
| Comment by Dianna Hohensee (Inactive) [ 14/Mar/19 ] |
|
Via the oplog stones method, we periodically first fetch a recovery timestamp via getPinnedOplog(), which will be one of the following: the backup pinned oplog timestamp; when using rollback via refetch, the crash recovery timestamp (or Timestamp::max() with the ephemeral engine); or, when using recover to stable timestamp, the crash recovery timestamp (or the rollback timestamp with the ephemeral engine). Then, while there are excess stones and the oldest stone's last entry is older than the recovery timestamp, we truncate stones from the oplog collection. Whether there is an excess of stones is determined by peekOldestStoneIfNeeded(), which returns the oldest stone only if the collection size exceeds the capped collection's maximum size. |
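The reclamation loop described in this comment can be modeled in a few lines. This is a simplified, single-threaded sketch under stated assumptions: `OplogStonesSketch`, `OplogStoneSketch`, and `TimestampSketch` are hypothetical types, `peekOldestStoneIfNeeded()` mirrors the behavior described above rather than the server's actual signature, and the timestamp that getPinnedOplog() would return is passed in as the `pinned` parameter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <tuple>

// Simplified (seconds, increment) timestamp; only ordering is modeled.
struct TimestampSketch {
    std::uint32_t secs = 0;
    std::uint32_t inc = 0;

    bool operator<(const TimestampSketch& other) const {
        return std::tie(secs, inc) < std::tie(other.secs, other.inc);
    }
};

// A stone summarizes a contiguous range of oplog records by the timestamp
// of its last entry and its total size in bytes.
struct OplogStoneSketch {
    TimestampSketch lastEntry;
    std::int64_t bytes = 0;
};

// Hypothetical model of the oplog stones bookkeeping for one oplog.
class OplogStonesSketch {
public:
    explicit OplogStonesSketch(std::int64_t cappedMaxBytes) : _cappedMaxBytes(cappedMaxBytes) {}

    // Stones are appended in oplog order, so the front is always the oldest.
    void addStone(const OplogStoneSketch& stone) {
        _currentBytes += stone.bytes;
        _stones.push_back(stone);
    }

    // Returns the oldest stone only while the collection exceeds its capped
    // maximum size; otherwise returns nullptr.
    const OplogStoneSketch* peekOldestStoneIfNeeded() const {
        if (_stones.empty() || _currentBytes <= _cappedMaxBytes) {
            return nullptr;
        }
        return &_stones.front();
    }

    // One reclamation pass: while there is excess and the oldest stone's last
    // entry is older than the pinned timestamp, truncate that stone.
    // Returns the number of stones removed.
    int reclaim(const TimestampSketch& pinned) {
        int removed = 0;
        while (const OplogStoneSketch* oldest = peekOldestStoneIfNeeded()) {
            if (!(oldest->lastEntry < pinned)) {
                break;  // this stone holds history that recovery may still need
            }
            _currentBytes -= oldest->bytes;
            _stones.pop_front();  // invalidates `oldest`; it is not used afterwards
            ++removed;
        }
        return removed;
    }

    std::int64_t currentBytes() const { return _currentBytes; }
    std::size_t numStones() const { return _stones.size(); }

private:
    std::int64_t _cappedMaxBytes;
    std::int64_t _currentBytes = 0;
    std::deque<OplogStoneSketch> _stones;
};
```

For example, with a 100-byte cap and three 40-byte stones whose last entries are at seconds 10, 20, and 30, a pass pinned at seconds 25 removes only the first stone: the collection then holds 80 bytes, so peekOldestStoneIfNeeded() stops returning stones even though the second stone is also older than the pin.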
| Comment by Dianna Hohensee (Inactive) [ 14/Mar/19 ] |
|
It appears that we use oplog stones in standalone mode as well as in replica set mode: there is no distinction.
So it sounds like what we want is to avoid deleting oplog entries due to new writes in standalone mode, whether from the oplog stones process OR from capped collection settings. WiredTiger (including its in-memory variant) is the only storage engine that supports transactions, and the only engine that uses an alternative to the typical capped-collection handling of local.oplog.rs. Therefore, what we really want is to halt the oplog stones process of deleting oplog entries older than the determined recovery point. We don't care about the regular capped collection deletion process because WT does not use it for the oplog collection. |
| Comment by Judah Schvimer [ 08/Mar/19 ] |
|
We will just turn off the capped deleter for the oplog in general. |
| Comment by Judah Schvimer [ 10/Jan/19 ] |
|
This should be done after |