[SERVER-20982] Add concurrency workload framework to trigger various node failovers Created: 16/Oct/15  Updated: 06/Dec/22  Resolved: 09/Nov/17

Status: Closed
Project: Core Server
Component/s: Sharding, Testing Infrastructure
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Kamran K. Assignee: [DO NOT USE] Backlog - Sharding Team
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-20981 Add background thread functionality t... Closed
is depended on by SERVER-21051 Add a suite that runs existing shardi... Closed
is depended on by SERVER-21053 Add a failover workload to cause intr... Closed
is depended on by SERVER-21054 Add failover workload to cause CSRS c... Closed
Duplicate
duplicates SERVER-31198 Run the concurrency suite with shard ... Closed
duplicates SERVER-7573 Add tests for network connectivity lo... Closed
Assigned Teams:
Sharding
Sprint: Sharding B (10/30/15), Sharding D (12/11/15), Sharding E (01/08/16)
Participants:

 Description   

We'd like to introduce a concurrency/FSM workload that triggers config server failovers to verify that they occur transparently to clients.

Basic implementation notes:

  • Add a new workload to jstests/concurrency/fsm_workloads/ that randomly steps down / freezes / etc. the config server primary
    • Note: You may want to prevent the workload from running with itself
  • Modify jstests/concurrency/fsm_libs/runner.js to support "persistent" workloads that run in each schedule

Here's what a schedule looks like now:

[
  [ "jstests/concurrency/fsm_workloads/update_array_noindex.js" ],
  [ "jstests/concurrency/fsm_workloads/update_multifield_isolated_multiupdate_noindex.js" ],
  [ "jstests/concurrency/fsm_workloads/update_rename_noindex.js" ]...

Here's what the schedule should look like after:

[
  [ "jstests/concurrency/fsm_workloads/update_array_noindex.js",
    "jstests/concurrency/fsm_workloads/config_server_failover.js" ],
  [ "jstests/concurrency/fsm_workloads/update_multifield_isolated_multiupdate_noindex.js",
    "jstests/concurrency/fsm_workloads/config_server_failover.js" ],
  [ "jstests/concurrency/fsm_workloads/update_rename_noindex.js",
    "jstests/concurrency/fsm_workloads/config_server_failover.js" ]...
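
One way the schedule transformation above could be expressed is sketched below. This is not the actual runner.js code: the function name addPersistentWorkloads and its parameters are hypothetical, and the sketch assumes only that a schedule is an array of arrays of workload paths, as in the two examples. A group that already contains the persistent workload is left unchanged, which is one simple way to keep the failover workload from running with itself.

```javascript
// Hypothetical helper, not part of jstests/concurrency/fsm_libs/runner.js.
// Returns a new schedule in which every group also runs each persistent workload.
function addPersistentWorkloads(schedule, persistentWorkloads) {
    return schedule.map(function(group) {
        // Skip persistent workloads already present in the group, so that a
        // persistent workload is never scheduled alongside a copy of itself.
        var missing = persistentWorkloads.filter(function(workload) {
            return group.indexOf(workload) === -1;
        });
        // concat() returns a new array, so the input schedule is not mutated.
        return group.concat(missing);
    });
}

// Usage with paths taken from the ticket description.
var before = [
    ["jstests/concurrency/fsm_workloads/update_array_noindex.js"],
    ["jstests/concurrency/fsm_workloads/update_rename_noindex.js"]
];
var after = addPersistentWorkloads(
    before, ["jstests/concurrency/fsm_workloads/config_server_failover.js"]);
console.log(JSON.stringify(after, null, 2));
```

Building a new schedule rather than editing groups in place keeps the original schedule available, for example for logging which groups were generated before the persistent workloads were attached.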



 Comments   
Comment by Kamran K. [ 18/Feb/16 ]

I think this task is still worthwhile for a few reasons:
1 - The continuous stepdown suite has a large blacklist that probably introduces coverage gaps.
2 - Even without the blacklist, the sharding suite probably doesn't cover the range of operations that the FSM suite does.
3 - The FSM suite has stricter assertions than all the other suites, which can help catch subtle issues.
4 - I would expect that only a few sharding tests are performing concurrent client operations.

Judah and Esha wrote code to expose cluster connections and add background workloads to the FSM suite, so I think it's less than a day of work to write a basic workload that steps down the CSRS primary periodically. That would at least give us an idea about whether we can expect to find new bugs with this approach (before deciding if it's worth adding as a permanent suite, making it robust for Evergreen, etc.).

Comment by Spencer Brody (Inactive) [ 09/Feb/16 ]

It seems like most of the test coverage we were hoping to get from this ticket is being obtained through SERVER-21050 and SERVER-21051.

I propose closing this ticket as a duplicate of those - kamran.khan, what do you think?
