[SERVER-38356] Forbid dropping oplog in standalone mode on storage engines that support replSetResizeOplog Created: 03/Dec/18 Updated: 29/Oct/23 Resolved: 08/Jul/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | 4.0.4 |
| Fix Version/s: | 4.2.1, 4.3.1 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Kevin Pulo | Assignee: | Vishnu Kaushik |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Backport Requested: | v4.2, v4.0 |
| Sprint: | Repl 2019-06-03, Repl 2019-06-17, Repl 2019-07-01, Repl 2019-07-15 |
| Participants: | |
| Linked BF Score: | 47 |
| Description |
|
This ticket banned dropping the oplog in standalone mode entirely on storage engines that support the replSetResizeOplog command.

Original Description

Currently the oplog cannot be dropped while running in replset mode, but it can be dropped as a standalone. Until recently the procedure to resize the oplog included dropping the oplog while in standalone mode. However, performing this procedure on an uncleanly shut down 4.0 mongod causes committed writes to be lost (because they only existed in the oplog, and the resize preserves only the final oplog entry, see

Completely forbidding oplog drop (even when standalone) would interfere with the use case of restoring a filesystem snapshot as a test standalone. A better alternative would be to forbid dropping the oplog only if local.system.replset contains documents. This way, users who are sure they want to drop the oplog can do so by first removing the documents from local.system.replset (which can't be dropped, but can have its contents removed) and then restarting the standalone, whereas users who are just trying to perform a manual oplog resize will be stopped before any data loss.

If we choose not to do this, then at the very least we should improve the "standalone-but-replset-config-exists" startup warning to specifically warn against manually resizing the oplog. |
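The alternative proposed in the original description (refuse the drop only while local.system.replset still contains documents) can be sketched as follows. This is an illustration only, not server code: the function name and the list-of-documents input are hypothetical stand-ins for whatever the server would actually consult.

```python
from typing import Any, Mapping, Sequence


def may_drop_oplog_as_standalone(replset_config_docs: Sequence[Mapping[str, Any]]) -> bool:
    """Hypothetical check: allow the drop only when local.system.replset is empty.

    `replset_config_docs` stands in for the documents currently stored in
    local.system.replset on the standalone node.
    """
    # A leftover replica set config suggests the node is a replica set member
    # temporarily started as a standalone, so the drop would be refused.
    return len(replset_config_docs) == 0
```

Under this sketch, a user who really wants to drop the oplog would first remove the config documents and restart, which makes the input empty and the check pass.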
| Comments |
| Comment by Githook User [ 15/Apr/20 ] |
|
Author: {'name': 'Tess Avitabile', 'email': 'tess.avitabile@mongodb.com', 'username': 'tessavitabile'}
Message: Revert " This reverts commit 58e4edb8237288f45f55cd8a59ea96a955489353. |
| Comment by Githook User [ 03/Sep/19 ] |
|
Author: {'name': 'Suganthi Mani', 'username': 'smani87', 'email': 'suganthi.mani@mongodb.com'}
Message: |
| Comment by Githook User [ 30/Aug/19 ] |
|
Author: {'name': 'Suganthi Mani', 'username': 'smani87', 'email': 'suganthi.mani@mongodb.com'}
Message: (cherry picked from commit a3244d8ac0ae530e2394248e72aadb27241adba3) |
| Comment by Githook User [ 28/Aug/19 ] |
|
Author: {'name': 'Suganthi Mani', 'username': 'smani87', 'email': 'suganthi.mani@mongodb.com'}
Message: (cherry picked from commit a3244d8ac0ae530e2394248e72aadb27241adba3) |
| Comment by Githook User [ 08/Jul/19 ] |
|
Author: {'name': 'Vishnu Kaushik', 'username': 'kauboy26', 'email': 'vishnu.kaushik@mongodb.com'}
Message: |
| Comment by Judah Schvimer [ 19/Jun/19 ] |
|
suganthi.mani, thanks for the detailed write-up. I agree with it all.
I think we should file a docs ticket and let the docs team decide. |
| Comment by Suganthi Mani [ 18/Jun/19 ] |
|
The chart below shows oplog drop supportability for standalone nodes if we implement this as mentioned here.
*EMRC - enableMajorityReadConcern
As mentioned in this
3) Secondary node gets restarted as standalone.
5) Restart the secondary node again with --replSet. This means for
This means we would miss applying the oplog entries in slots old.2 and old.3 (mentioned in step 3) during startup recovery. This would lead to data inconsistencies between this node and the other nodes in the replica set. I tried to reproduce this problem. I expected startup recovery (replaying entries from the oplog) to succeed and to see data inconsistency (as per
Thoughts:
Let me know if anyone has concerns about banning oplog drop entirely for the WiredTiger storage engine (which supports the replSetResizeOplog command). |
| Comment by Tess Avitabile (Inactive) [ 05/Jun/19 ] |
Good point. This behavior may not be correct if the user has just toggled enableMajorityReadConcern. On 4.0, when enableMajorityReadConcern=false, the server takes unstable checkpoints, so it should not perform startup recovery by applying oplog entries. In this case, it is correct that standalone nodes with enableMajorityReadConcern=false do not perform startup recovery by applying oplog entries. However, if the user was running with enableMajorityReadConcern=true, then restarted in standalone mode with enableMajorityReadConcern=false and recoverFromOplogAsStandalone, the node will start up from a stable checkpoint, in which case it should perform recovery by applying oplog entries. We should probably decide whether to apply oplog entries when enableMajorityReadConcern=false and recoverFromOplogAsStandalone=true based on the type of checkpoint we start up from, so it sounds like this may be a bug.
We have these two predicates to distinguish between the ability to perform rollback using RTT (which we never do when enableMajorityReadConcern=false) and the ability to start up from a stable checkpoint (which we essentially always do on 4.2 when enableMajorityReadConcern=false, and we do on 4.0 when enableMajorityReadConcern=false only if the server had been shut down with enableMajorityReadConcern=true). |
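One way to read the suggestion in this comment is that the replay decision should follow the checkpoint type rather than the current enableMajorityReadConcern setting. A minimal sketch under that reading; `CheckpointType` and `should_replay_oplog_as_standalone` are hypothetical names, and the actual server logic is not shown in this ticket.

```python
from enum import Enum


class CheckpointType(Enum):
    """Hypothetical label for the checkpoint a node starts up from."""
    STABLE = "stable"
    UNSTABLE = "unstable"


def should_replay_oplog_as_standalone(
    recover_from_oplog_as_standalone: bool,
    startup_checkpoint: CheckpointType,
) -> bool:
    """Replay oplog entries only when asked to and only from a stable checkpoint.

    The current enableMajorityReadConcern value is deliberately not an input:
    the node may have been shut down under a different setting.
    """
    if not recover_from_oplog_as_standalone:
        return False
    return startup_checkpoint is CheckpointType.STABLE
```

In the toggled case described above (shut down with enableMajorityReadConcern=true, restarted as a standalone with it set to false), the startup checkpoint is stable, so this sketch would replay the oplog.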
| Comment by Judah Schvimer [ 05/Jun/19 ] |
|
The concern here is that if on clean restart the node has not applied all of its oplog entries, then we do not want to allow dropping the oplog. All storage engines that allow a clean restart to not have applied all oplog entries also support the replSetResizeOplog command, so they do not need to allow dropping the oplog. As far as I can tell, supportsRecoverToStableTimestamp() and supportsRecoveryTimestamp() are essentially the same on 4.2 and 4.0. william.schultz or daniel.gottlieb, do you know what Tess had in mind? |
| Comment by Suganthi Mani [ 05/Jun/19 ] |
|
tess.avitabile/judah.schvimer Just wanted to clarify the solution for 4.0: why can't we have the same check (supportsRecoveryTimestamp() is true) on 4.0 as on 4.2? The other thing I noticed is that if a node is standalone and the server parameter recoverFromOplogAsStandalone is set to true, we perform startup recovery by applying oplog entries from the recovery timestamp, provided
Is it intentional that on 4.0, standalone nodes with enableMajorityReadConcern=false (supportsRecoverToStableTimestamp() is false) do not perform startup recovery by applying oplog entries from the recovery timestamp? |
| Comment by Tess Avitabile (Inactive) [ 08/Jan/19 ] |
|
Sounds good. We can forbid dropping local.oplog.rs on 4.0 if supportsRecoverToStableTimestamp() is true and on 4.2 if supportsRecoveryTimestamp() is true (on 4.2 with enableMajorityReadConcern=false, supportsRecoverToStableTimestamp() is false, but we still perform startup recovery by applying oplog entries from the recovery timestamp). I'll put this into the quick wins for next quarter. |
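The per-version rule stated in this comment can be written down as a small predicate. This is a sketch only: the two boolean inputs mirror the storage-engine methods named in the comment, while the function name and the version tuple are hypothetical.

```python
from typing import Tuple


def forbid_oplog_drop(
    server_version: Tuple[int, int],
    supports_recover_to_stable_timestamp: bool,
    supports_recovery_timestamp: bool,
) -> bool:
    """Return True when dropping local.oplog.rs would be refused.

    4.2 and later: keyed on supportsRecoveryTimestamp(), because startup
    recovery still replays the oplog when enableMajorityReadConcern=false.
    4.0: keyed on supportsRecoverToStableTimestamp().
    """
    if server_version >= (4, 2):
        return supports_recovery_timestamp
    return supports_recover_to_stable_timestamp
```

The 4.2 case with enableMajorityReadConcern=false is the one that motivates the split: the first predicate is false there, but the drop would still be refused.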
| Comment by Kevin Pulo [ 08/Jan/19 ] |
|
The main problem with completely forbidding dropping the oplog is that it wouldn't be backportable to 4.0, because it's still the only way to resize the oplog in MMAPv1. But this whole issue only exists for storage engines that support recovery to timestamp. So how about we prevent dropping local.oplog.rs if supportsRecoverToStableTimestamp() is true? |
| Comment by Asya Kamsky [ 21/Dec/18 ] |
|
Why not forbid dropping the oplog entirely? I don't see a need for force:true because if you know what you are doing you can drop it anyway. If you are converting the replica backup to a standalone you should just drop the local database which avoids any sort of inconsistency issue. |
| Comment by Kevin Pulo [ 20/Dec/18 ] |
|
I'm surprised by the aversion to adding force: true. Although drop is a DDL command, the situation we're talking about, dropping the oplog (already a special internal system collection) while in a special state (standalone after unclean shutdown), is maintenance, not a regular operation. This is compounded by the strong potential for unexpected data loss in this situation. There are several other maintenance commands (including within repl) which use force: true (and have for a long time) when we want safe behavior by default, but still need to permit risky operations in rare maintenance situations:
For a startup warning to have a chance of being noticed, it would need to be a separate new warning from the existing ones, and would need to specifically call out that dropping the oplog while in this state (standalone after unclean shutdown) is likely to result in data loss, and that the supported method of resizing the oplog has changed, with a link to the relevant docs. As previously mentioned, in addition to not being noticed, there are other failure modes for this approach, e.g. a pre-existing mongo shell will not re-check startup warnings when reconnecting (I've just filed |
| Comment by Gregory McKeon (Inactive) [ 17/Dec/18 ] |
|
We're worried about adding a "force" parameter for only a single command - this would be inconsistent with our other DDL ops. arnie.listhaus also suggested doing replication recovery at startup by default when in standalone mode. We don't want to do this because it interferes with maintenance that is performed in standalone mode, such as truncating the oplog for point-in-time backups and diagnosing the cache pressure of replication recovery. Adding a startup warning letting users know that they no longer need to drop the oplog to resize it is our preferred option - do you think this would be noticed enough by users to be effective, kevin.pulo arnie.listhaus? |
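For context, the resize path that such a warning would point users to is the replSetResizeOplog admin command. The sketch below only builds the command document and does not talk to a server. The helper name is hypothetical, and the 990 MB lower bound is an assumption taken from the command's documentation.

```python
from typing import Any, Dict

# Assumption: documented minimum oplog size for replSetResizeOplog, in megabytes.
MIN_OPLOG_SIZE_MB = 990


def build_resize_oplog_command(size_mb: float) -> Dict[str, Any]:
    """Build the replSetResizeOplog command document (hypothetical helper).

    The command name must be the first key; `size` is given in megabytes.
    """
    if size_mb < MIN_OPLOG_SIZE_MB:
        raise ValueError(
            f"oplog size must be at least {MIN_OPLOG_SIZE_MB} MB, got {size_mb}"
        )
    return {"replSetResizeOplog": 1, "size": float(size_mb)}
```

With a driver, the resulting document would be sent to the admin database of a running replica set member rather than to a standalone, which is the point of the warning discussed above.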
| Comment by Kevin Pulo [ 12/Dec/18 ] |
|
Ok, that's fair enough. How about instead requiring a force: true parameter to the drop command when in this state? The error message could educate the admin about this issue, refer them to the docs and the replSetResizeOplog command, and explain that if they really want to drop the oplog, they can re-run the drop command with force: true. This should prevent any accidents before they actually happen, while still allowing arbitrary maintenance in the rare cases it might be necessary, and without being a huge development burden. |
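The force: true proposal in this comment can be sketched as a guard that refuses by default and explains the alternative. Everything here is hypothetical (the exception type, the function, and the message wording); the ticket was ultimately resolved by banning the drop instead.

```python
class OplogDropRefused(Exception):
    """Hypothetical error raised when the oplog drop is refused."""


def check_oplog_drop(is_standalone_with_replset_config: bool, force: bool = False) -> None:
    """Raise unless the drop is safe by default or explicitly forced."""
    if is_standalone_with_replset_config and not force:
        raise OplogDropRefused(
            "Dropping local.oplog.rs on a standalone that still has a replica "
            "set config can lose committed writes. To resize the oplog, use "
            "replSetResizeOplog instead. To drop it anyway, re-run the drop "
            "with force: true."
        )
    # Otherwise the drop is allowed to proceed; nothing to return.
```

The design choice is that the educational message arrives at the moment of the drop, which is the gap identified with a startup warning elsewhere in this thread.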
| Comment by Gregory McKeon (Inactive) [ 10/Dec/18 ] |
|
We want to enable users to do arbitrary maintenance in standalone mode, so we don't want to ban dropping the oplog. We don't think adding a startup warning would be helpful, because it doesn't occur at the same time the user performs the drop. If you feel strongly about the warning, let us know. |