[SERVER-35996] Create performance tests for measuring failover speed for planned stepdowns Created: 06/Jul/18 Updated: 08/Jan/24 Resolved: 27/Mar/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Performance, Replication |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | William Schultz (Inactive) | Assignee: | Backlog - Replication Team |
| Resolution: | Won't Do | Votes: | 0 |
| Labels: | PM-1211, re-triaged-ticket | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||||||||||
| Issue Links: |
|
||||||||||||
| Assigned Teams: | Replication |
||||||||||||
| Participants: | |||||||||||||
| Description |
|
We would like to add better performance testing for measuring the speed of planned replica set failovers using replSetStepDown. This includes testing the following:
We should test these scenarios with chaining enabled and disabled. |
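As a rough illustration of the kind of measurement this ticket asks for, the sketch below times how long a single writer goes without a successful write after a planned stepdown is triggered. It is not the attached script. The helper name `measureFailover` and its options (`stepDown`, `tryWrite`, `now`, `retryDelayMs`, `timeoutMs`) are hypothetical, and the caller is assumed to supply the functions that actually run replSetStepDown and issue the write.

```javascript
// Hypothetical helper (not part of the server or any driver).
// Assumptions: the caller supplies two async functions.
//   stepDown()  triggers the planned stepdown (e.g. runs replSetStepDown)
//   tryWrite()  attempts one write; resolves on success, rejects on failure
// `now` is injectable so the logic can be exercised with a fake clock.
async function measureFailover({
  stepDown,
  tryWrite,
  now = Date.now,
  retryDelayMs = 10,
  timeoutMs = 60000,
}) {
  const start = now();
  // The measured window includes the time the stepdown command itself takes.
  await stepDown();

  let failedAttempts = 0;
  while (now() - start < timeoutMs) {
    try {
      await tryWrite();
      // First successful write after the stepdown: report the elapsed time.
      return { unavailableMs: now() - start, failedAttempts };
    } catch (err) {
      // Any failure counts as "still unavailable"; wait briefly and retry.
      failedAttempts += 1;
      await new Promise((resolve) => setTimeout(resolve, retryDelayMs));
    }
  }
  throw new Error(`no successful write within ${timeoutMs} ms`);
}
```

Under these assumptions, passing a `tryWrite` that uses the default write concern would approximate time to write-availability, while one that waits for w:majority would approximate time to commit-availability. Running the same measurement with chaining enabled and disabled would cover both configurations named above.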
| Comments |
| Comment by Lauren Lewis (Inactive) [ 09/Nov/21 ] |
|
We haven’t heard back from you for some time, so I’m going to close this ticket. If this is still an issue for you, please provide additional information and we will reopen the ticket. |
| Comment by Tess Avitabile (Inactive) [ 15/Aug/18 ] |
|
Thanks, mira.carey@mongodb.com. I modified this ticket so that it is about time to write-availability and time to commit-availability for a single writer during planned maintenance. |
| Comment by Mira Carey [ 15/Aug/18 ] |
|
I plan to do some testing around elections, but focused on what happens after the election clears (and targeted towards mongos behavior, things like how manipulating connection pooling strategies changes things). I also suspect that the easiest test for you would be a single writer (and time for a w:majority write to clear), whereas I'm going to want some large number of writers across many connections. It's related testing, but not substantially overlapping. |
| Comment by Tess Avitabile (Inactive) [ 25/Jul/18 ] |
|
Thanks, I will bring this up in the developer productivity quarterly planning. Failover time is very important to Atlas, so I agree that either replication, service architecture, or perf should prioritize this work. |
| Comment by William Schultz (Inactive) [ 25/Jul/18 ] |
|
The attached JS script is a good starting point for testing item 2 from the ticket description. |
| Comment by William Schultz (Inactive) [ 06/Jul/18 ] |
|
This testing would help us detect issues like the ones mentioned in |