[SERVER-30642] Raise election timeouts as a way to provide more stable replica set test topologies
Created: 14/Aug/17 | Updated: 30/Oct/23 | Resolved: 26/Feb/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Replication |
| Affects Version/s: | None |
| Fix Version/s: | 3.6.5, 3.7.3 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | William Schultz (Inactive) | Assignee: | Jonathan Abrahams |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
|
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Backport Requested: | v3.6 |
| Sprint: | Repl 2018-01-29, Repl 2018-02-12, TIG 2018-03-12 |
| Participants: | |
| Linked BF Score: | 15 |
| Description |
|
For JavaScript tests that aren't trying to directly test any aspect of the consensus machinery, we should consider making unwanted elections impossible, to cut down on spurious topology changes interfering with the actions a test is executing. Raising election timeouts to some very high value could be one solution. It would make tests more resilient to machine/network slowness and improve their stability. Setting the priority of secondary nodes to 0 (in addition to high election timeouts) could also help reduce the triggering of unexpected elections. We may want to review the tests to determine which ones we consider "consensus agnostic" and which we do not. |
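As a rough illustration of the proposal above, the sketch below builds a replica set config document with a very high `settings.electionTimeoutMillis` and with every member except the first set to `priority: 0`. The helper name `buildStableReplSetConfig`, the host names, and the 24-hour timeout are assumptions made for this example, not values taken from this ticket or from the eventual fix.

```javascript
// Sketch only: build a replica set config for a "consensus agnostic" test.
// buildStableReplSetConfig is a hypothetical helper, not part of any test library.
function buildStableReplSetConfig(setName, hosts, electionTimeoutMillis) {
    const members = hosts.map((host, i) => ({
        _id: i,
        host: host,
        // Only the first member keeps a non-zero priority; members with
        // priority 0 cannot become primary and do not call elections.
        priority: i === 0 ? 1 : 0,
    }));
    return {
        _id: setName,
        members: members,
        // A very high election timeout means slow heartbeats on a loaded
        // machine are unlikely to trigger an election during the test.
        settings: {electionTimeoutMillis: electionTimeoutMillis},
    };
}

// Assumed value for illustration: 24 hours, far longer than any single test run.
const ONE_DAY_MILLIS = 24 * 60 * 60 * 1000;
const config = buildStableReplSetConfig(
    "rs0", ["localhost:20000", "localhost:20001", "localhost:20002"], ONE_DAY_MILLIS);
```

In a mongo shell, a document of this shape could be passed to `rs.initiate(config)`. Tests that exercise elections themselves would presumably keep the default timeout and priorities.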
| Comments |
| Comment by Githook User [ 19/Apr/18 ] |
|
Author: {'email': 'jonathan@mongodb.com', 'username': 'hptabster', 'name': 'Jonathan Abrahams'} Message: (cherry picked from commit 3aa315557bef775c5291068e365a59a3a810fc41) |
| Comment by Githook User [ 19/Apr/18 ] |
|
Author: {'email': 'judah@mongodb.com', 'username': 'judahschvimer', 'name': 'Judah Schvimer'} Message: (cherry picked from commit 6a1e6fe87e7d510d2e795263520e918c9033e044) |
| Comment by Githook User [ 26/Feb/18 ] |
|
Author: {'email': 'jonathan@mongodb.com', 'name': 'Jonathan Abrahams', 'username': 'hptabster'} Message: |
| Comment by Judah Schvimer [ 09/Feb/18 ] |
|
I anticipated that the scope of this ticket would be limited to the FSM suites that setting 0 votes doesn't fix, along the same lines as the Python fixtures. I do not think this should affect any tests in the replsets or sharding directories. |
| Comment by Max Hirschhorn [ 09/Feb/18 ] |
judah.schvimer, are you anticipating that the TIG team would do this audit, or what is your expectation of the code changes being made here? |
| Comment by Judah Schvimer [ 09/Feb/18 ] |
|
Sending to TIG to finish after |
| Comment by Githook User [ 23/Jan/18 ] |
|
Author: {'name': 'Judah Schvimer', 'email': 'judah@mongodb.com', 'username': 'judahschvimer'} Message: |
| Comment by William Schultz (Inactive) [ 14/Aug/17 ] |
|
There's no reason to believe this test is having "more spurious failovers" than others, but it is one example of a test with this issue, so I figured it could be a starting point for this kind of fix, since it has definitively caused a number of build failures. But yes, arguably, for tests that only require a stable replica set topology (i.e., tests that aren't trying to exercise elections), I think that something like maximizing the election timeout could be a good way to make them more stable and resilient to slow hardware, network issues, etc. Reviewing all tests to see if they fall into this category, however, would likely be a larger task. |
| Comment by Spencer Brody (Inactive) [ 14/Aug/17 ] |
|
Why is this test having more spurious failovers than other tests? Is there a reason we should do this for this test but not all other tests that don't explicitly test election timeouts? |