[SERVER-46894] Wait for the current config to be committed before running reconfig Created: 16/Mar/20 Updated: 29/Oct/23 Resolved: 04/Apr/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 4.4.0-rc0, 4.7.0 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Siyuan Zhou | Assignee: | Siyuan Zhou |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Backport Requested: | v4.4 |
| Sprint: | Repl 2020-03-23, Repl 2020-04-06 |
| Participants: | |
| Linked BF Score: | 0 |
| Description |
|
Currently, the reconfig command waits for both Config Replication and Oplog Commitment before it returns. The former guarantees that the new config cannot be "rolled back"; the latter guarantees that the following config can be accepted. However, it seems better to wait for the latter only when it is needed, that is, when the following config is received. For example, adding one node with votes: 1 to a single-node replica set currently has to wait for the new node's initial sync to finish before the reconfig returns. Another case arises after an election: Oplog Commitment requires the first optime in the new primary's term to be committed, so a reconfig that immediately follows the election may fail. Waiting for the first optime in the term to be committed lets the command finish successfully. |
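The proposed ordering can be sketched with a toy in-memory model. This is only an illustration of the description above, not the server's implementation: `ToyPrimary`, its fields, and the `config_replicates_in_time` flag are hypothetical names, and the boolean flags stand in for real waits bounded by maxTimeMS.

```python
from dataclasses import dataclass


@dataclass
class ToyPrimary:
    """Hypothetical model of a primary; flags replace real replication waits."""
    version: int = 1
    config_replicated: bool = True   # Config Replication holds for current config
    oplog_committed: bool = True     # Oplog Commitment holds for current config

    def reconfig(self, new_version: int, config_replicates_in_time: bool = True) -> dict:
        # Step 1: the current config must be committed before a new one is accepted.
        if not (self.config_replicated and self.oplog_committed):
            return {"ok": 0, "codeName": "CurrentConfigNotCommittedYet"}

        # Step 2: install the new config; neither condition holds for it yet.
        self.version = new_version
        self.config_replicated = False
        self.oplog_committed = False

        # Step 3: at the end, wait only for Config Replication of the new config.
        if not config_replicates_in_time:
            return {"ok": 0, "codeName": "ExceededTimeLimit"}
        self.config_replicated = True
        return {"ok": 1}


# Adding a voting node to a single-node set: the reconfig returns without
# waiting for the new node's initial sync (Oplog Commitment).
primary = ToyPrimary()
first = primary.reconfig(2)

# A second reconfig is held back until Oplog Commitment is reached.
second = primary.reconfig(3)
primary.oplog_committed = True   # e.g. the new node finished initial sync
third = primary.reconfig(3)
```

In this model the cost of Oplog Commitment moves from the end of one reconfig to the start of the next, which is the trade-off the description argues for.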
| Comments |
| Comment by Githook User [ 05/Apr/20 ] |
|
Author: {'name': 'Siyuan Zhou', 'email': 'siyuan.zhou@mongodb.com', 'username': 'visualzhou'} Message: This also changes the behavior of waiting for both Config Replication and (cherry picked from commit 89ec7322a58686b89aa71f26b1f050ded94cf949) |
| Comment by Githook User [ 04/Apr/20 ] |
|
Author: {'name': 'Siyuan Zhou', 'email': 'siyuan.zhou@mongodb.com', 'username': 'visualzhou'} Message: This also changes the behavior of waiting for both Config Replication and |
| Comment by Siyuan Zhou [ 19/Mar/20 ] |
|
If Config Replication or Oplog Commitment for the current config does not complete within maxTimeMS, the primary leaves the current config unchanged and returns a new error code, “CurrentConfigNotCommittedYet”, with the following example error messages:
If the wait for Config Replication of the new config at the end of reconfig times out, the primary returns the “ExceededTimeLimit” error code with the following message:
|
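A caller could tell the two failure modes in the comment above apart by the `codeName` field of the command response. The sketch below is only an assumption about how a client might react: `classify_reconfig_result` and its return labels are hypothetical, and the mapping for “ExceededTimeLimit” assumes the error came from the final wait described above rather than from another cause.

```python
def classify_reconfig_result(response: dict) -> str:
    """Map a reconfig command response to a hypothetical follow-up action."""
    if response.get("ok") == 1:
        return "installed"

    code_name = response.get("codeName")
    if code_name == "CurrentConfigNotCommittedYet":
        # The current config was left unchanged, so retrying later is safe.
        return "not-installed-retry-later"
    if code_name == "ExceededTimeLimit":
        # The new config was installed on the primary, but its replication
        # was not confirmed; inspect the config before issuing another reconfig.
        return "installed-but-unconfirmed"
    return "unknown-failure"


# Example responses shaped like command replies (contents are illustrative).
not_committed = classify_reconfig_result(
    {"ok": 0, "codeName": "CurrentConfigNotCommittedYet"}
)
timed_out = classify_reconfig_result({"ok": 0, "codeName": "ExceededTimeLimit"})
```

Matching on the code name rather than the message text keeps the check independent of the exact wording of the error messages.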
| Comment by Siyuan Zhou [ 19/Mar/20 ] |
|
Quote from evin.roesle: "We would like to show the log message of waiting by default so that users have the ability to understand that it is hung without actually installing the reconfig." This ticket needs to make sure the log message is shown by default. |