[SERVER-59891] Replace the coverage from sharding_continuous_config_stepdown.yml and then delete the test suite Created: 11/Sep/21  Updated: 26/Oct/23

Status: Backlog
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Max Hirschhorn Assignee: Backlog - Catalog and Routing
Resolution: Unresolved Votes: 0
Labels: oldshardingemea
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-53343 Tests which write to ConfigServer col... Backlog
related to SERVER-58619 Continuous Stepdown's replSetStepDown... Backlog
related to SERVER-53094 Tests which use {waitForDelete:true} ... Closed
related to SERVER-60375 Blacklist move_chunk_remove_shard.js ... Closed
related to SERVER-60751 move_chunk_critical_section_non_inter... Closed
related to SERVER-62181 JStests including multiple parallel m... Closed
related to SERVER-62419 recover_multiple_migrations_on_stepup... Closed
related to SERVER-64234 Remove move_chunk_respects_maxtimems.... Closed
related to SERVER-67733 ShardingTest::awaitBalancerRound() do... Closed
related to SERVER-72820 Retry disable and enable of balancer ... Closed
related to SERVER-57626 Investigate disabled move_chunk tests... Backlog
related to SERVER-59890 Exclude migration_coordinator_shutdow... Closed
Assigned Teams:
Catalog and Routing
Participants:

 Description   

The sharding_continuous_config_stepdown.yml test suite has anecdotally been a pain point for the sharding team because it generates many uninteresting, testing-only failures. A review of the ~100 tickets spawned by sharding_csrs_continuous_config_stepdown Evergreen task failures over the last 2 years found:

  • 33 instances of a testing-only change being made, almost always to exclude the test from the sharding_continuous_config_stepdown.yml test suite.
  • 15 instances of a bug where the server behavior was changed and the sharding_csrs_continuous_config_stepdown Evergreen task failure was the only thing which caught it.
  • 12 additional instances of a bug where the server behavior was changed but other Evergreen tasks (e.g. concurrency stepdown suites) also caught it.

The 33 SERVER tickets labeled sharding-csrs-stepdown-upkeep represent a drag on the ability of the Sharding NYC and EMEA teams to write new jstests/sharding/ tests. That upkeep is too high to justify keeping the sharding_continuous_config_stepdown.yml test suite (without significantly rearchitecting it). On the other hand, the 15 SERVER tickets labeled sharding-csrs-stepdown-only are a clear measure of the value the test suite provides. It would be prudent to ensure that equivalent coverage, whether new or already added since, exists elsewhere to prevent a regression.

The task here is to evaluate whether later sharding projects have since added equivalent coverage, and if not, to create additional SERVER tickets to add such coverage before deleting the sharding_continuous_config_stepdown.yml test suite.

Note: The sharding_continuous_config_stepdown.yml test suite also causes the PeriodicShardedIndexConsistencyChecker thread to run more frequently (triggered as part of new config server primary step-up), which has led to other testing-only failures, mainly from $currentOp filters in tests not being specific enough. These cases are not included in the tickets labeled sharding-csrs-stepdown-upkeep.



 Comments   
Comment by Connie Chen [ 12/Nov/21 ]

This will take at least 1 sprint to figure out how to proceed. 

Generated at Thu Feb 08 05:48:26 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.