[SERVER-36772] Ensure oplog cannot be truncated due to capped deletions in standalone mode Created: 20/Aug/18  Updated: 29/Oct/23  Resolved: 19/Mar/19

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.1.10

Type: Task Priority: Major - P3
Reporter: Judah Schvimer Assignee: Dianna Hohensee (Inactive)
Resolution: Fixed Votes: 0
Labels: prepare_durability, txn_storage
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-36494 Prevent oplog truncation of oplog ent... Closed
is related to SERVER-39679 Add callback to replication when stor... Closed
Backwards Compatibility: Fully Compatible
Sprint: Repl 2018-09-10, Repl 2018-09-24, Repl 2019-02-25, Storage NYC 2019-03-25
Participants:

 Description   

If the oldest oplog entries are truncated, replication recovery may not be able to succeed. We are recommending that backup services insert into the oplog while running as a standalone so that replication can perform its own recovery to a point in time, but this requires that the oplog be able to grow indefinitely.



 Comments   
Comment by Githook User [ 19/Mar/19 ]

Author:

{'email': 'dianna.hohensee@10gen.com', 'name': 'Dianna Hohensee', 'username': 'DiannaHohensee'}

Message: SERVER-36772 Ensure oplog history cannot be truncated in standalone mode with the WT storage engine.

Adds an 'allowOplogTruncation' storageGlobalParam, which is set to false for standalones.
Branch: master
https://github.com/mongodb/mongo/commit/def336dd7510b42c7fbdea22030d0ef5c39bd541
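
For context, a minimal sketch of how such a guard could work. The flag name 'allowOplogTruncation' is taken from the commit message above; the surrounding types and function names here are illustrative stand-ins, not the actual mongod source.

{code:cpp}
// Illustrative sketch only, not the actual mongod source. The flag name
// 'allowOplogTruncation' comes from the commit message above; the surrounding
// types and function names are stand-ins.
#include <iostream>

struct StorageGlobalParams {
    bool allowOplogTruncation = true;  // replica set members may truncate excess oplog
};

StorageGlobalParams storageGlobalParams;

// Hypothetical startup hook: a standalone (started without a replica set
// config) must keep its whole oplog so a later point-in-time recovery can
// replay it.
void initializeStorageParams(bool startedWithReplSet) {
    if (!startedWithReplSet) {
        storageGlobalParams.allowOplogTruncation = false;
    }
}

// Hypothetical guard checked at the top of the oplog reclaim path.
bool mayTruncateOplog() {
    return storageGlobalParams.allowOplogTruncation;
}

int main() {
    initializeStorageParams(/*startedWithReplSet=*/false);
    std::cout << std::boolalpha << mayTruncateOplog() << "\n";  // prints: false
}
{code}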

Comment by Judah Schvimer [ 15/Mar/19 ]

dianna.hohensee, some of the code you reference may change with SERVER-39679. I still think your investigation holds; it's just that the details may change. CC daniel.gottlieb and jesse

Comment by Dianna Hohensee (Inactive) [ 14/Mar/19 ]

I think we can use the bypass that already exists to ensure successful crash recovery / recover to stable timestamp / backup. If we were to return Timestamp::min() as the timestamp to preserve history back to when in standalone mode, then no stones would be older than it and we would never delete any stones from the oplog collection. Theoretically, that works.
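
To illustrate the bypass idea (the names here are illustrative stand-ins, not the actual getPinnedOplog() implementation): if the pinned timestamp reported in standalone mode were the minimum possible timestamp, no stone's last entry could ever be older than it, so stone-based truncation would become a no-op.

{code:cpp}
// Illustrative only: a pinned-oplog getter that reports the minimum timestamp
// in standalone mode, so stone-based truncation becomes a no-op.
#include <algorithm>
#include <cstdint>
#include <limits>

using Timestamp = std::uint64_t;  // stand-in for the server's Timestamp type

constexpr Timestamp kTimestampMin = std::numeric_limits<Timestamp>::min();  // i.e. Timestamp::min()

struct FakeServerState {
    bool isStandalone = false;
    Timestamp recoveryTimestamp = 0;      // crash recovery / stable timestamp
    Timestamp backupPinnedTimestamp = 0;  // pin requested by a backup
};

// Stand-in for getPinnedOplog(); the real decision tree is the one described
// in the comment below (backup pin vs. crash recovery vs. rollback timestamp).
Timestamp getPinnedOplog(const FakeServerState& state) {
    if (state.isStandalone) {
        // Proposed bypass: nothing can be older than Timestamp::min(), so no
        // oplog stone is ever eligible for truncation.
        return kTimestampMin;
    }
    return std::min(state.recoveryTimestamp, state.backupPinnedTimestamp);
}

int main() {
    FakeServerState state;
    state.isStandalone = true;
    return getPinnedOplog(state) == kTimestampMin ? 0 : 1;
}
{code}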

Comment by Dianna Hohensee (Inactive) [ 14/Mar/19 ]

In the oplog stones path, we periodically fetch a recovery timestamp via getPinnedOplog(), which will be one of: the backup-pinned oplog timestamp; the crash recovery timestamp (or Timestamp::max() for an ephemeral engine) if using rollback via refetch; or the crash recovery timestamp (or the rollback timestamp for an ephemeral engine) if using recover to stable timestamp.

Then, while we have excess stones and the oldest stone's last entry is older than the recovery timestamp, we'll truncate stones from the oplog collection. Whether there is an excess of stones is determined by peekOldestStoneIfNeeded(), which will return the oldest stone only if the collection size exceeds the capped max collection size.
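
A small self-contained sketch of those two conditions (illustrative names only, not the actual WiredTiger record store code): a stone is only a candidate for truncation once the collection exceeds its capped maximum size, and even then only if the stone's newest entry is older than the pinned/recovery timestamp.

{code:cpp}
// Illustrative sketch of the two-part truncation condition described above.
// Stone, peekOldestStoneIfNeeded, and reclaim are stand-ins, not server code.
#include <cstdint>
#include <deque>
#include <optional>

struct Stone {
    std::uint64_t lastTimestamp;  // newest oplog entry covered by this stone
    std::int64_t bytes;           // bytes of oplog covered by this stone
};

struct OplogStones {
    std::deque<Stone> stones;
    std::int64_t currentBytes = 0;
    std::int64_t cappedMaxBytes = 0;

    // Returns the oldest stone only when the collection exceeds its capped
    // maximum size; otherwise there is no excess to reclaim.
    std::optional<Stone> peekOldestStoneIfNeeded() const {
        if (stones.empty() || currentBytes <= cappedMaxBytes) {
            return std::nullopt;
        }
        return stones.front();
    }

    // Drop stones while there is an excess AND the oldest stone is entirely
    // older than the recovery/pinned timestamp.
    void reclaim(std::uint64_t pinnedTimestamp) {
        while (auto oldest = peekOldestStoneIfNeeded()) {
            if (oldest->lastTimestamp >= pinnedTimestamp) {
                break;  // would remove history still needed for recovery
            }
            currentBytes -= oldest->bytes;
            stones.pop_front();
        }
    }
};

int main() {
    OplogStones oplog;
    oplog.cappedMaxBytes = 2048;
    oplog.stones = {{10, 1024}, {20, 1024}, {30, 1024}};
    oplog.currentBytes = 3072;
    oplog.reclaim(/*pinnedTimestamp=*/25);
    // Only the stone ending at ts 10 is removed: afterwards the size is back
    // under the cap, so peekOldestStoneIfNeeded() reports no excess.
    return oplog.stones.size() == 2 ? 0 : 1;
}
{code}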

Comment by Dianna Hohensee (Inactive) [ 14/Mar/19 ]

It appears that we use oplog stones for standalone mode as well as replset mode: there's no distinction.

[js_test:characterize_index_builds_on_restart] 2019-03-14T13:20:52.168-0400 2019-03-14T13:20:52.168-0400 I -        [js] shell: started program (sh28655):  /home/dianna/mongo-copy/mongod --dbpath /data/db/job0/mongorunner/indexBuilds-1 --port 28023 --bind_ip 0.0.0.0 --setParameter enableTestCommands=1 --setParameter disableLogicalSessionCacheRefresh=true --storageEngine wiredTiger --setParameter transactionLifetimeLimitSeconds=10800 --setParameter orphanCleanupDelaySecs=1 --enableMajorityReadConcern true --setParameter logComponentVerbosity={"replication":{"rollback":2},"transaction":4}
.......
[js_test:characterize_index_builds_on_restart] 2019-03-14T13:20:57.561-0400 d28023| 2019-03-14T13:20:57.561-0400 I RECOVERY [initandlisten] WiredTiger recoveryTimestamp. Ts: Timestamp(1552584050, 5)
[js_test:characterize_index_builds_on_restart] 2019-03-14T13:20:57.582-0400 d28023| 2019-03-14T13:20:57.582-0400 I STORAGE  [initandlisten] ~~~about to check to set up thread for oplog stones, ns: _mdb_catalog
[js_test:characterize_index_builds_on_restart] 2019-03-14T13:20:57.596-0400 d28023| 2019-03-14T13:20:57.596-0400 I STORAGE  [initandlisten] ~~~about to check to set up thread for oplog stones, ns: admin.system.keys
[js_test:characterize_index_builds_on_restart] 2019-03-14T13:20:57.616-0400 d28023| 2019-03-14T13:20:57.616-0400 I STORAGE  [initandlisten] ~~~about to check to set up thread for oplog stones, ns: admin.system.version
[js_test:characterize_index_builds_on_restart] 2019-03-14T13:20:57.627-0400 d28023| 2019-03-14T13:20:57.627-0400 I STORAGE  [initandlisten] ~~~about to check to set up thread for oplog stones, ns: config.transactions
[js_test:characterize_index_builds_on_restart] 2019-03-14T13:20:57.632-0400 d28023| 2019-03-14T13:20:57.632-0400 I STORAGE  [initandlisten] ~~~about to check to set up thread for oplog stones, ns: local.oplog.rs
[js_test:characterize_index_builds_on_restart] 2019-03-14T13:20:57.632-0400 d28023| 2019-03-14T13:20:57.632-0400 I STORAGE  [initandlisten] Starting OplogTruncaterThread local.oplog.rs               <<<<<
[js_test:characterize_index_builds_on_restart] 2019-03-14T13:20:57.632-0400 d28023| 2019-03-14T13:20:57.632-0400 I STORAGE  [initandlisten] ~~~setting up thread for oplog stones, ns: local.oplog.rs
[js_test:characterize_index_builds_on_restart] 2019-03-14T13:20:57.633-0400 d28023| 2019-03-14T13:20:57.632-0400 I STORAGE  [initandlisten] The size storer reports that the oplog contains 108 records totaling to 19201 bytes

So it sounds like what we want is to avoid deleting oplog entries because of new writes in standalone mode, whether via the oplog stones process or via the capped collection settings.

WiredTiger (including its in-memory variant) is the only storage engine that supports transactions, and the only engine that uses an alternative to the typical capped-collection deletion for local.oplog.rs. Therefore, what we really want is to halt the oplog stones process that deletes oplog entries older than the determined recovery point. We don't care about the regular capped-collection deletion process because WT does not use it for the oplog collection.
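
One way to picture that split (hypothetical names, heavily simplified): only the WiredTiger engines route oplog reclamation through the oplog stones / OplogTruncaterThread path, so that is the only path a standalone guard needs to cover; the ordinary capped-collection delete path never handles local.oplog.rs under WT.

{code:cpp}
// Illustrative dispatch only; the enum and function are stand-ins, not mongod
// code. The point: the standalone guard only matters for the stones path.
#include <string>

enum class OplogReclaimPath { kOplogStones, kCappedDeletes, kNone };

OplogReclaimPath chooseReclaimPath(const std::string& storageEngine, bool isStandalone) {
    if (storageEngine == "wiredTiger" || storageEngine == "inMemory") {
        // WT uses oplog stones instead of capped deletes for local.oplog.rs;
        // in standalone mode, halt stone-based truncation entirely.
        return isStandalone ? OplogReclaimPath::kNone : OplogReclaimPath::kOplogStones;
    }
    // Other engines fall back to ordinary capped-collection deletion.
    return OplogReclaimPath::kCappedDeletes;
}

int main() {
    return chooseReclaimPath("wiredTiger", /*isStandalone=*/true) == OplogReclaimPath::kNone ? 0 : 1;
}
{code}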

Comment by Judah Schvimer [ 08/Mar/19 ]

We will just turn off the capped deleter for the oplog in general.

Comment by Judah Schvimer [ 10/Jan/19 ]

This should be done after SERVER-36494, to make sure that it doesn't fall out of that work.
