  1. Core Server
  2. SERVER-59891

Replace the coverage from sharding_continuous_config_stepdown.yml and then delete the test suite

    • Type: Task
    • Resolution: Unresolved
    • Priority: Major - P3
    • None
    • Affects Version/s: None
    • Component/s: Sharding
    • Labels:
    • Catalog and Routing

      The sharding_continuous_config_stepdown.yml test suite has anecdotally been a pain point for the sharding team because it generates many uninteresting, testing-only failures. A review of the ~100 tickets spawned over the last 2 years by sharding_csrs_continuous_config_stepdown Evergreen task failures found:

      • 33 instances of a testing-only change being made, almost always to exclude the test from the sharding_continuous_config_stepdown.yml test suite.
      • 15 instances of a bug where the server behavior was changed and the sharding_csrs_continuous_config_stepdown Evergreen task failure was the only thing which caught it.
      • 12 additional instances of a bug where the server behavior was changed but other Evergreen tasks (e.g. concurrency stepdown suites) also caught it.
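
      As a rough check on the breakdown above, the three categories can be tallied in a few lines. This is only a sketch: the counts come from the list above, while the category keys and the summarize helper are hypothetical names chosen for illustration.

```javascript
// Counts taken from the list above; the keys are hypothetical names for this sketch.
const categorized = {
    testingOnlyChange: 33,
    bugCaughtOnlyByThisTask: 15,
    bugAlsoCaughtElsewhere: 12,
};

// Sum the categorized tickets and compute each category's rounded percentage share.
function summarize(counts) {
    const total = Object.values(counts).reduce((sum, n) => sum + n, 0);
    const shares = {};
    for (const [name, n] of Object.entries(counts)) {
        shares[name] = Math.round((n / total) * 100);
    }
    return {total, shares};
}

const summary = summarize(categorized);
console.log(summary);
```

      The categorized tickets sum to 60 of the roughly 100 reviewed, and testing-only changes make up a little over half of those 60.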

      The 33 SERVER tickets labeled sharding-csrs-stepdown-upkeep represent a drag on the Sharding NYC and EMEA teams whenever they write new jstests/sharding/ tests. That upkeep is too high to justify keeping the sharding_continuous_config_stepdown.yml test suite without significantly rearchitecting it. On the other hand, the 15 SERVER tickets labeled sharding-csrs-stepdown-only are a clear measure of the value the suite provides. It would be prudent to ensure that equivalent coverage, whether new or already added by later work, exists elsewhere to prevent a regression.

      The task here is to evaluate whether some additional coverage happens to now exist from later sharding projects, and if not, to create additional SERVER tickets to add such coverage before deleting the sharding_continuous_config_stepdown.yml test suite.

      Note: The sharding_continuous_config_stepdown.yml test suite also causes the PeriodicShardedIndexConsistencyChecker thread to run more frequently (triggered as part of new config server primary step-up) which has led to other testing-only failures, mainly from $currentOp filters not being specific enough in tests. These cases are not included in the sharding-csrs-stepdown-upkeep labeled tickets.
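
      One way a test might make its $currentOp filter more specific is to match on both the namespace and a test-chosen comment, so that operations from background threads do not match by accident. The sketch below is an assumption for illustration, not the approach taken in any particular ticket: buildCurrentOpPipeline is a hypothetical helper, and the namespace and comment values are placeholders.

```javascript
// Hypothetical helper, not part of the server's test libraries.
// Builds an aggregation pipeline that lists in-progress operations and keeps
// only those on the given namespace that carry the given comment.
function buildCurrentOpPipeline(ns, comment) {
    return [
        {$currentOp: {allUsers: true, idleConnections: false}},
        {$match: {ns: ns, "command.comment": comment}},
    ];
}

// Placeholder values for illustration only.
const pipeline = buildCurrentOpPipeline("test.coll", "my_test_comment");
console.log(JSON.stringify(pipeline));

// In a jstest, the pipeline would be run against the admin database, e.g.:
//   db.getSiblingDB("admin").aggregate(pipeline).toArray();
```

      Matching on a comment assumes the test attaches that comment to the command it issues; a filter on the namespace alone could still match unrelated operations on the same collection.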

            backlog-server-catalog-and-routing Backlog - Catalog and Routing
            max.hirschhorn@mongodb.com Max Hirschhorn