[SERVER-47009] Bypass replica set config commitment safety checks if voting member set doesn't change Created: 20/Mar/20 Updated: 06/Dec/22 Resolved: 13/Apr/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | William Schultz (Inactive) | Assignee: | Backlog - Replication Team |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | safe-reconfig-related | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: | Replication |
| Description |
|
If we execute a reconfig from C1 => C2 and then try to execute a reconfig from C2 => C3, we must ensure that all ops committed in configs prior to C2 are committed in C2, and that C2 itself is committed, before moving to C3. If C3 doesn't change the voting member set, however, we can bypass these safety checks when moving from C2 => C3, since the reconfig is essentially a no-op with respect to member set changes. The next time we do a reconfig that does change the voting member set, we will make sure the safety checks are satisfied, which ensures that all previously committed ops and configs are committed before moving to a config with different members. This is an optimization that could make certain reconfigs less costly, since they won't need to wait for oplog commitment or config replication. Examples include changing the election timeout or changing the priority of a node. |
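The proposed check can be sketched roughly as follows. This is a minimal illustration of the idea in the description, not MongoDB's actual implementation: the names `Member`, `ReplSetConfig`, `voting_member_set`, `can_bypass_safety_checks`, and `reconfig_allowed` are all hypothetical, and the two boolean flags stand in for the Config Replication and Oplog Commitment checks. It models only the config-level argument and says nothing about whether the bypass is safe for data consensus.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class Member:
    # Hypothetical, simplified stand-in for a replica set member entry.
    host: str
    votes: int = 1
    priority: float = 1.0


@dataclass(frozen=True)
class ReplSetConfig:
    # Hypothetical, simplified stand-in for a replica set config.
    version: int
    members: Tuple[Member, ...]
    election_timeout_millis: int = 10000


def voting_member_set(config: ReplSetConfig) -> FrozenSet[str]:
    """Return the hosts that hold a vote in this config."""
    return frozenset(m.host for m in config.members if m.votes > 0)


def can_bypass_safety_checks(current: ReplSetConfig, proposed: ReplSetConfig) -> bool:
    """True if the reconfig leaves the voting member set unchanged."""
    return voting_member_set(current) == voting_member_set(proposed)


def reconfig_allowed(
    current: ReplSetConfig,
    proposed: ReplSetConfig,
    config_committed: bool,
    oplog_committed: bool,
) -> bool:
    """Decide whether current => proposed may proceed under the proposal.

    config_committed: assumed flag, the current config is committed.
    oplog_committed: assumed flag, ops committed in earlier configs are
    committed in the current config.
    """
    if can_bypass_safety_checks(current, proposed):
        # No change to the voting member set: skip both checks.
        return True
    # Voting member set changes: both safety checks must hold.
    return config_committed and oplog_committed
```

Under these assumptions, a reconfig that only changes a priority or the election timeout would proceed without waiting, while one that adds or removes a voter would still require both checks to pass.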
| Comments |
| Comment by Siyuan Zhou [ 30/Mar/20 ] |
|
william.schultz, this sounds correct to me. I agree that, because the new config is essentially a no-op in config consensus, it's safe in terms of config consensus, but it's not obvious to me why it doesn't affect the safety of data consensus. At the least, I think we need to update our definition of the two safety rules, Oplog Commitment and maybe Config Replication, and run model checking for the change. |