[SERVER-31507] Add option to applyOps to fail on upgrade/downgrade
Created: 10/Oct/17  Updated: 30/Oct/23  Resolved: 31/Oct/17

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 3.6.0-rc2

Type: New Feature
Priority: Major - P3
Reporter: Judah Schvimer
Assignee: Judah Schvimer
Resolution: Fixed
Votes: 0
Labels: on-prem-3.5.6
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-31387 oplog application conflates upserting... Closed
Related
related to SERVER-31630 getParameter for featureCompatibility... Closed
is related to SERVER-31384 applyOps should propagate oplog appli... Closed
Backwards Compatibility: Fully Compatible
Sprint: Repl 2017-10-23, Repl 2017-11-13
Participants:

 Description   

The server does this internally, but the option isn't surfaced for external users of applyOps such as mongomirror, backup, and mongorestore.



 Comments   
Comment by Githook User [ 31/Oct/17 ]

Author: Judah Schvimer (judahschvimer) <judah@mongodb.com>

Message: SERVER-31507 add option to specify oplog application mode in applyOps
Branch: master
https://github.com/mongodb/mongo/commit/ced3d3341a2aac6a11297f7dcc2c3c6d2c0e3bec

Comment by David Golden [ 23/Oct/17 ]

SERVER-31630 is an example of the kind of implementation change we'd like this option to insulate tools from.

Comment by Judah Schvimer [ 18/Oct/17 ]

After confirming with Spencer, we'll go with just adding a "mode" field that takes a string. We'll document the exact field name and acceptable options once the ticket is complete.
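
As a rough illustration of what a string-valued mode field could look like from a tool's side, here is a minimal Python sketch that builds an applyOps command document. The field name "mode" and the value "initial sync" are placeholders borrowed from this discussion, not the final names, and build_apply_ops_command is a hypothetical helper rather than part of any driver.

```python
from typing import Any, Dict, List, Optional


def build_apply_ops_command(
    ops: List[Dict[str, Any]],
    mode: Optional[str] = None,
    mode_field: str = "mode",  # placeholder: the final field name is not settled in this thread
) -> Dict[str, Any]:
    """Build an applyOps command document, optionally tagged with a mode string."""
    command: Dict[str, Any] = {"applyOps": list(ops)}
    if mode is not None:
        if not isinstance(mode, str):
            raise TypeError("mode must be a string")
        command[mode_field] = mode
    return command


# One update entry in oplog form; the namespace and values are made up.
update_op = {"op": "u", "ns": "test.coll", "o2": {"_id": 1}, "o": {"$set": {"x": 2}}}
command = build_apply_ops_command([update_op], mode="initial sync")
```

Leaving the mode out produces a plain applyOps document, so a tool written this way would not have to send the field to servers that predate it.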

Comment by David Golden [ 18/Oct/17 ]

It's a prioritized list:

  • At a minimum, I'd like for tools/mongomirror to not need to know how FCV is implemented/stored and thus not need to specify in the tools how to detect if it's modified
  • Ideally, I'd like for tools/mongomirror to be able to indicate something about their mode of operation and have the server reject "unsafe" operations for that mode

I don't know the details enough to say what the modes are or what is safe/unsafe for different modes. That is – in part – why I prefer a less granular model where that detailed knowledge isn't required by the team maintaining tools/mongomirror – and distinctly for all versions of the server in the support window.

The notion of "initial sync" versus "steady state replication" seems logical to me. I don't really follow the logic for splitting out upsert behavior (for instance), though I do think having the name "alwaysUpsert" mean more than that is confusing and should be fixed.

Here's how I think of it in more detail:

  • mongorestore is (more or less) equivalent to an initial sync
  • mongomirror is (more or less) equivalent to an initial sync, plus – possibly, not sure if this is true – a switchover to steady state replication after the oplog has been applied through the point where sync and index creation is finished

It would be great to be able to say what mode we're in and let the server determine what's good/bad for each mode based on the operation. Eg. in "initial sync" mode (assuming I understand the issues correctly):

  • When FCV is 3.6, applyOps with renameCollection with a UUID -> that should be OK (it assumes the sync collection scan was done by UUID, but that's a fair assumption)
  • When FCV is 3.4, applyOps with renameCollection arrives -> that should error
  • For any FCV, applyOps changes FCV -> that should error
  • applyOps with update -> converted to upsert (or not converted, but error suppressed if doc doesn't exist, or whatever behavior is decided to be considered correct for initial sync)
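
The four cases above could be sketched as a small decision function. This is only a Python model of the list, not server code: the Op type, the operation labels, and the result strings are invented for illustration. Two branches are assumptions that the list does not cover: a renameCollection without a UUID at FCV 3.6 is rejected, and any other operation is applied unchanged.

```python
from dataclasses import dataclass

APPLY = "apply"
APPLY_AS_UPSERT = "apply as upsert"
REJECT = "reject"


@dataclass(frozen=True)
class Op:
    kind: str               # "update", "renameCollection", or "setFCV" (illustrative labels)
    has_uuid: bool = False  # whether the operation carries a collection UUID


def decide_in_initial_sync(op: Op, fcv: str) -> str:
    """Return what an "initial sync" mode would do with op at the given FCV."""
    if op.kind == "setFCV":
        # Changing FCV is rejected regardless of the current FCV.
        return REJECT
    if op.kind == "renameCollection":
        if fcv == "3.6" and op.has_uuid:
            return APPLY
        # FCV 3.4 is rejected per the list above; rejecting a UUID-less
        # rename at FCV 3.6 is an assumption, the list does not cover it.
        return REJECT
    if op.kind == "update":
        return APPLY_AS_UPSERT
    return APPLY  # assumption: other operations are applied unchanged
```

With this shape, a tool only names its mode, and the knowledge of which operations are unsafe at which FCV stays in one place.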

After writing all this up, maybe what I'm suggesting is that we should specify a "mode" – which could be "initial sync" or "steady state" (or "rollback" or whatever) – rather than "options".

Does any of that make sense?

Comment by Judah Schvimer [ 17/Oct/17 ]

david.golden, I'm confused about what you'd prefer. If we have a "prohibitFCVChange" flag, then we'll also need a "prohibitUnsafeRenames" flag, and you say you'd prefer fewer options. If we do a "prohibitUnsafeOperations" flag, then we may run into a problem in the future where different operations are unsafe at different times and we'd want to separate them out.

Comment by David Golden [ 17/Oct/17 ]

Having multiple options has the problem that older tools won't know about the newer options and newer tools need to be updated for every server release and must be sure to know which options to apply when. That adds complexity and raises the odds of user and/or implementation error. (Also, having yet more things undocumented makes things harder as what to apply when and in what combinations winds up as tribal knowledge passed around by word of mouth.)

Instead, having a simpler, stable API that delegates decisions about whether it's safe to apply certain operations is what tools/mongomirror are looking for. However, at the very least we want to encapsulate concepts like "prohibitFCVchange" so that tools don't need to know the implementation details of feature compatibility versioning (collection names, field names, semantics, etc.).

Comment by Judah Schvimer [ 16/Oct/17 ]

applyOps currently has one flag called "inSteadyStateReplication", which is used to determine whether we should convert updates to upserts, whether we should fail renameCollection operations without UUIDs, and whether we should fail applying upgrade/downgrade operations. SERVER-31387 will separate out these three concepts so that the upserting behavior is tied to the "alwaysUpsert" option, while the upgrade/downgrade/renameCollection failing is only set internally during initial sync and will always be off for external users of applyOps. This means that external implementations of initial sync (mongorestore, mongomirror, Cloud backup) all have to reimplement the upgrade/downgrade/renameCollection failing and take it out when we decide that it's safe again. It makes more sense to push this logic down to the server by surfacing an applyOps parameter to fail during upgrade/downgrade/renameCollection operations.
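
To make the conflation concrete, here is a minimal Python model of the paragraph above. It is not the server's implementation: the Behavior type and the function names are invented. It also assumes the direction of the flag, which the comment does not spell out, namely that the three stricter behaviors are active when "inSteadyStateReplication" is false.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Behavior:
    convert_updates_to_upserts: bool
    fail_rename_without_uuid: bool
    fail_upgrade_downgrade: bool


def from_single_flag(in_steady_state_replication: bool) -> Behavior:
    """Current state: one flag drives all three behaviors together."""
    # Assumption: the stricter behaviors apply when NOT in steady state.
    strict = not in_steady_state_replication
    return Behavior(strict, strict, strict)


def after_split(always_upsert: bool, internal_initial_sync: bool) -> Behavior:
    """After SERVER-31387: upserting is decoupled from the two failure checks."""
    return Behavior(
        convert_updates_to_upserts=always_upsert,
        fail_rename_without_uuid=internal_initial_sync,
        fail_upgrade_downgrade=internal_initial_sync,
    )


def for_external_caller(always_upsert: bool) -> Behavior:
    """External applyOps users can never turn the failure checks on."""
    return after_split(always_upsert, internal_initial_sync=False)
```

In this model, for_external_caller never returns a Behavior with the failure checks enabled, which is the gap this ticket proposes to close.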

The name and scope of this parameter are also open questions. Should we have one parameter to fail renameCollection and one to fail upgrade/downgrade, or just one general parameter to fail potentially unsafe operations? In future releases it's possible a different set of operations will be unsafe, but it's generally advisable to avoid adding a bunch of new options. I think adding an option for each different failure mode would be the clearest and best. It'll be fairly easy to alert downstream users, and since the options won't be documented they can be easily added or removed.

Comment by Spencer Brody (Inactive) [ 12/Oct/17 ]

judah.schvimer Can you summarize our discussion from today about this and lay out the possibilities with some pros/cons?
