[SERVER-54389] Audit internal uses of force reconfig and consider using safe reconfig Created: 08/Feb/21  Updated: 29/Oct/23  Resolved: 29/Mar/21

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 5.0.0-rc0

Type: Improvement Priority: Major - P3
Reporter: Pavithra Vetriselvan Assignee: Jason Chan
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-53953 Fix invariant failure when downgradin... Closed
is related to SERVER-46516 Majority write concern is blocked by ... Closed
is related to SERVER-50423 Change memberConfig's slaveDelay fiel... Closed
Backwards Compatibility: Fully Compatible
Sprint: Repl 2021-03-22, Repl 2021-04-05
Participants:

 Description   

When using force reconfig, we will bypass the noop write while also clearing the committed snapshot.

Previously, internal users of doReplSetReconfig only changed the config term and version (stepup, upgrade/downgrade). We probably favored force reconfigs in these scenarios because they are faster and still safe. Since the content of the config did not change, and therefore did not affect the quorum condition, we did not drop snapshots.

However, in SERVER-50423, we explicitly rename the slaveDelay field to secondaryDelaySecs, so this check will treat that rename as a content change.



 Comments   
Comment by Githook User [ 29/Mar/21 ]

Author:

{'name': 'Jason Chan', 'email': 'jason.chan@mongodb.com', 'username': 'jasonjhchan'}

Message: SERVER-54389 Add option for safe and optimized reconfig that is separate from forced reconfig
Branch: master
https://github.com/mongodb/mongo/commit/201317dfba066ec91f090e43a297719a25823c6b

Comment by Jason Chan [ 23/Mar/21 ]

Spoke with pavithra.vetriselvan, we decided that as part of this ticket, we will add an "optimized" safe reconfig code path that is entirely separate from the idea of forced reconfig. The intended use case is for internal reconfigs that do not change the consensus group (eg. bumping the version or config term). The "optimized" safe reconfig code path will have the same behavior as a regular safe reconfig, but will omit the following checks described in my comment above:

  • 3) The check that the latest committed optime from the previous config is committed in the current config
  • 7) The quorum check
Comment by Jason Chan [ 22/Mar/21 ]

Separating it out from doReplSetReconfig is fine too, but I guess that would require a bit more refactoring to pull out the common code being used by doReplSetReconfig and the new "safeButOpTimizedReconfig". For what it's worth, adding a new flag to doReplSetReconfig might make it easier to choose the behavior of the "safe but optimized reconfig" codepath in case requirements change in the future.

Comment by Lingzhi Deng [ 22/Mar/21 ]

My understanding is that we don't want to overload doReplSetReconfig for "internal, safe but optimized" reconfigs. So I guess we might want to separate that out from doReplSetReconfig as a whole?

Comment by Jason Chan [ 22/Mar/21 ]

Currently, calling doReplSetReconfig with force=true will have the following differing behavior:
1. We skip the safety check that the reconfig is permitted while not being a writable primary.
2. We skip the safety check that the current rsConfig is allowed to be from an older term.
3. We skip the check that the latest committed optime from the previous config is committed in the current config.
4. We skip the safety check that the reconfig is adding/removing at most 1 voter node.
5. We always hold on to the FCV lock.
6. When finding self in config, we skip the check to see if we are also electable.
7. We skip the quorum check.
8. We skip the no-op write.
9. Some special logic with force reconfig and stepdown. Specific details in SERVER-37574.
10. When the config has changed, always clear the committed snapshot.

After SERVER-53953, the only internal usage of a forced reconfig is in signalDrainComplete. From this comment, I believe we use the force reconfig as a means to optimize the stepUp process by skipping any safety/quorum checks. It should be fine to change this to use a safe reconfig, but doing so may cause a performance regression. From judah.schvimer's comment, we should consider adding an avenue for callers to optimize their safe reconfigs without relying on the "force" flag. This may be in the form of adding a new argument to doReplSetReconfig to skip the most expensive parts of the safe reconfig protocol in cases where the caller acknowledges the reconfig scenario is guaranteed to be "safe".

My guess is that checks (3) and (7) above are the most expensive parts of the safe reconfig protocol. I propose that we add a skipCommitOpTimeAndQuorumCheck flag to doReplSetReconfig that will skip the checks (3) and (7) when set to true. 

cc: lingzhi.deng siyuan.zhou

Comment by Siyuan Zhou [ 08/Feb/21 ]

I believe we abused "force" in the codebase. It could mean
1. skipping safety checks, like single node change
2. skipping quorum check
3. skipping other checks, like in the config check logic.

We use "force" to skip them for different reasons, so I agree we should audit and rename the arguments when necessary.

Comment by Judah Schvimer [ 08/Feb/21 ]

In general, I think we should separate out the concept of "force reconfig" from the concept of "safe, optimized reconfig". Optimizing reconfigs in cases where we know it is safe makes sense, but it is dangerous for us to mix that in with force reconfig. When making changes to force reconfig, it is easy to forget these use cases where we expect the force reconfig behavior to be safe in practice. It is also hard to think through all of the ways force reconfig could be unsafe when using it in safe scenarios.

CC siyuan.zhou, in case you have any thoughts.

Generated at Thu Feb 08 05:33:24 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.