[SERVER-47331] Rethink the transition from force reconfig to safe reconfig Created: 03/Apr/20 Updated: 29/Oct/23 Resolved: 13/Apr/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 4.7.0 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Siyuan Zhou | Assignee: | Siyuan Zhou |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||
| Sprint: | Repl 2020-04-20 | ||||||||||||||||
| Participants: | |||||||||||||||||
| Description |
|
When the current config C0 is installed by a "force" reconfig, the next non-force reconfig with config C1 doesn't prevent config divergence if
The diverged configs may lead to two primaries being elected in the same term until C2 (with a higher config term) propagates to a majority of C1. A similar issue is shown in
In the Initial Sync Semantics project, we will give new nodes votes: 0 and then run an automatic reconfig to grant them votes. The config that adds the node will face the unsafe but rare case mentioned above. Once the first reconfig passes the aforementioned unsafe period and becomes committed, the following automatic reconfigs will be safe.
To avoid the unsafe case, one idea is to run an automatic reconfig after a force reconfig that increases the config version and gives the config a term. After this automatic reconfig, subsequent reconfigs will be safe. However, when users run a "force" reconfig, the replset is likely unstable, which is why they are willing to risk losing committed data. It may not be the right time to run such an automatic reconfig. Even worse, the automatic reconfig may interrupt the propagation of the "force" reconfig.
For example, assume the current config C0 has 5 nodes and a force reconfig C1 runs on a secondary to convert that secondary into a single-node replica set. The force reconfig C1 will increase the version but remove the config term, then propagate to other nodes on their next heartbeats. Nodes in C0 will become REMOVED after learning C1. However, if an automatic reconfig C2 happens on the single-node replset, then since C2 has a term, C2's term has to be higher than C0's to propagate, which may not be the case if another election occurs in C0. As a result, C2 may not be able to propagate to nodes still in C0. If their terms are the same, nodes in C0 will have a diverged config. They'll be alive and keep running heartbeats to the single-node replset.
When either C0 or C2 has a higher term, it will be propagated to the other, potentially overriding the force reconfig. |
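The C0/C1/C2 example in the description can be traced with a small model. This is only a sketch that follows the wording of the description, not the server's actual comparison logic: the `Config` class, the `winner` function, the version and term numbers, and the rule that a term-less config is ordered by version alone are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Config:
    name: str
    version: int
    term: Optional[int] = None  # None models a config installed by a force reconfig


def winner(a: Config, b: Config) -> Optional[Config]:
    """Return the config that would propagate to the other side, or None if neither does."""
    if a.term is None or b.term is None:
        # Assumption: a term-less config is ordered by version alone.
        if a.version == b.version:
            return None
        return a if a.version > b.version else b
    if a.term == b.term:
        # Per the description: equal terms leave the two sides diverged.
        return None
    # Per the description: the config with the higher term propagates.
    return a if a.term > b.term else b


# Hypothetical numbers, chosen only to reproduce the scenario.
c0 = Config("C0", version=3, term=5)            # original 5-node config
c1 = Config("C1", version=100003)               # force reconfig: higher version, no term
c2 = Config("C2", version=100004, term=5)       # automatic reconfig on the single-node set
c0_reelected = Config("C0", version=3, term=6)  # C0 side after another election

print(winner(c0, c1).name)            # C1
print(winner(c2, c0))                 # None
print(winner(c2, c0_reelected).name)  # C0
```

The last call is the problematic case: the C0 side ends up with the higher term, so it wins over C2 and can undo the effect of the force reconfig.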
| Comments |
| Comment by Siyuan Zhou [ 13/Apr/20 ] |
|
I wanted to mark this "Done", but I have to go with "Fixed" to enable downstream attention. |
| Comment by Siyuan Zhou [ 13/Apr/20 ] |
|
Thanks tess.avitabile and judah.schvimer, closing this. |
| Comment by Judah Schvimer [ 13/Apr/20 ] |
|
I filed |
| Comment by Tess Avitabile (Inactive) [ 13/Apr/20 ] |
|
Yes, that sounds good to me. |
| Comment by Judah Schvimer [ 09/Apr/20 ] |
|
Thanks for the summary. I will file the ISS tickets once we agree on the above. |
| Comment by Siyuan Zhou [ 09/Apr/20 ] |
|
Discussed with judah.schvimer and evin.roesle in person. Since the automatic reconfig in ISS runs on top of the first user-initiated reconfig command, its safety is guaranteed if the user-initiated reconfig is a safe reconfig. If the user-initiated reconfig is a force reconfig, then we won't add "newlyAdded" fields or run an automatic reconfig at all. The only edge case is when the user-initiated reconfig is a force reconfig with "newlyAdded" fields. It will trigger an automatic reconfig, which will run on an unsafe config. There are a few options to solve this issue.
We agreed to go with option 1 since "newlyAdded" is an internal field anyway. Beyond the behavioral change, we need to document that the transition from force reconfig to safe reconfig isn't safe. I'm adding the downstream change to this ticket. tess.avitabile, does the plan sound good to you? judah.schvimer, do you mind filing the corresponding ticket in ISS? |
| Comment by Siyuan Zhou [ 08/Apr/20 ] |
C0 shouldn't override the force reconfig. The force reconfig should take effect immediately by having a much higher config number.
I don't think the no-op automatic reconfig in ISS is dangerous, since the reconfig is initiated by a user. After a force reconfig, the first user-initiated safe reconfig to add a node is subject to all the potential issues of force reconfig. Its safety depends on the user, as in other cases around force reconfig. In most cases, users would only run a reconfig when the system is stable. The following ISS automatic reconfigs will then become safe. As you mentioned, automatic reconfig won't be safe after a force reconfig with "newlyAdded".
I'd suggest banning "newlyAdded" on force reconfig, since "newlyAdded" is supposed to be an internal field and force reconfig is supposed to be used only in emergencies. |
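The suggestion to ban "newlyAdded" on force reconfig could look roughly like the check below. It is a minimal sketch, not the server's validation code: the function name `validate_reconfig`, the error text, and the host names are hypothetical, and the config is modeled as a plain dictionary with a `members` list.

```python
def validate_reconfig(new_config: dict, force: bool) -> None:
    """Raise ValueError if a force reconfig carries the internal "newlyAdded" field."""
    if not force:
        # Safe reconfigs are not restricted by this check.
        return
    offenders = [
        member.get("_id")
        for member in new_config.get("members", [])
        if "newlyAdded" in member
    ]
    if offenders:
        raise ValueError(
            f'"newlyAdded" is not allowed in a force reconfig (member _ids: {offenders})'
        )


# Hypothetical config: member 1 carries the internal field.
config = {
    "_id": "rs0",
    "version": 7,
    "members": [
        {"_id": 0, "host": "node0.example.net:27017"},
        {"_id": 1, "host": "node1.example.net:27017", "newlyAdded": True},
    ],
}

validate_reconfig(config, force=False)  # accepted
try:
    validate_reconfig(config, force=True)
except ValueError as err:
    print(err)
```

Rejecting the field up front means a force reconfig can never be the trigger for an automatic reconfig, which is the edge case discussed in this thread.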
| Comment by Judah Schvimer [ 08/Apr/20 ] |
|
I don't follow the final paragraph above.
I think that doing an automatic reconfig at the next chance we get would be good to narrow the window where the next reconfig will be unsafe, and could allow us to do other automatic reconfigs safely. |
| Comment by Judah Schvimer [ 03/Apr/20 ] |
|
This behavior is implemented and tested in |
| Comment by Judah Schvimer [ 03/Apr/20 ] |
|
In ISS, a force reconfig will replace the current config verbatim and we will not rewrite it at all. Thus, if the force reconfig does not specify "newlyAdded", it would remove the "newlyAdded" field from an existing node (if that node currently had "newlyAdded" specified). If the force reconfig specifies "newlyAdded", then once the primary sees that the node is a secondary, the primary will initiate an automatic reconfig to remove "newlyAdded". |
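The automatic reconfig described in this comment can be sketched as follows. This is an illustration under stated assumptions rather than the ISS implementation: the function name `remove_newly_added`, the state map keyed by member `_id`, and the version bump by one are all hypothetical.

```python
import copy
from typing import Dict, Optional


def remove_newly_added(config: dict, states: Dict[int, str]) -> Optional[dict]:
    """Return a new config without "newlyAdded" on members that are SECONDARY.

    Returns None when no member qualifies, so no automatic reconfig is needed.
    """
    new_config = copy.deepcopy(config)  # leave the installed config untouched
    changed = False
    for member in new_config["members"]:
        if member.get("newlyAdded") and states.get(member["_id"]) == "SECONDARY":
            del member["newlyAdded"]
            changed = True
    if not changed:
        return None
    new_config["version"] += 1  # assumption: the automatic reconfig bumps the version
    return new_config


# Hypothetical config in which member 1 is still marked "newlyAdded".
installed = {
    "_id": "rs0",
    "version": 7,
    "members": [
        {"_id": 0, "host": "node0.example.net:27017"},
        {"_id": 1, "host": "node1.example.net:27017", "newlyAdded": True},
    ],
}

print(remove_newly_added(installed, {0: "PRIMARY", 1: "STARTUP2"}))  # None
updated = remove_newly_added(installed, {0: "PRIMARY", 1: "SECONDARY"})
print(updated["version"], "newlyAdded" in updated["members"][1])  # 8 False
```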
| Comment by Siyuan Zhou [ 03/Apr/20 ] |
|
judah.schvimer, what's the current design of Initial Sync Semantics if the current config is from a force reconfig? I don't see any problem in terms of the automatic reconfig. |