Core Server / SERVER-53094

Tests which use {waitForDelete:true} on moveChunk are not safe to run in the sharding_csrs_continuous_config_stepdown suite

    • Catalog and Routing
    • ALL
    • Sharding 2021-07-12, Sharding 2021-10-04, Sharding 2021-10-18, Sharding 2021-11-01, CAR Team 2023-12-25, CAR Team 2024-01-08

      As an example, all of the update_shard_key_*.js tests perform some kind of moveChunk with waitForDelete:true and rely on there being no orphans for their expectations to hold. However, waitForDelete:true doesn't work well with stepdowns of the Config Server, because there are numerous places where it cannot be honored (this one for example).

      Because of this, it is not safe to run tests with waitForDelete:true in the sharding_csrs_continuous_config_stepdown suite and they should be blacklisted.

      In more detail, this is the minimum that can go wrong:

      • Some test relies on range deletion succeeding so that there are no orphans on the cluster
      • A moveChunk command commits successfully on the Config Server, BUT returns an error to the Balancer, because the Config Server was down and the command couldn't re-check its work (this is okay). However, this means it didn't wait for the range deletion.
      • This code notices that the chunk actually committed, so it doesn't pass the error to the Router/Client
      • The test happily continues, even though there are orphans on the donor shard
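
The sequence above can be illustrated with a small toy model. This is only a sketch in plain JavaScript and does not use any real MongoDB or jstest API: the names Shard, donorMoveChunk, moveChunk, run and the configStepsDown flag are all hypothetical, invented here to mirror the four steps. It assumes the commit always succeeds and that a stepdown only prevents the donor from re-checking its work.

```javascript
// Toy model only: no real MongoDB APIs are used. Every name here
// (Shard, donorMoveChunk, moveChunk, configStepsDown) is hypothetical.

class Shard {
    constructor(name) {
        this.name = name;
        this.ownedDocs = [];   // documents in chunks this shard owns
        this.orphanDocs = [];  // documents left behind after a migration
    }
}

// Donor-side migration. If the config server steps down after the commit,
// the donor cannot re-check its work, returns an error, and never reaches
// the waitForDelete step.
function donorMoveChunk(donor, recipient, {waitForDelete, configStepsDown}) {
    recipient.ownedDocs.push(...donor.ownedDocs);
    donor.orphanDocs = donor.ownedDocs;  // not yet range-deleted
    donor.ownedDocs = [];
    const committed = true;  // assumption: the commit itself went through

    if (configStepsDown) {
        return {ok: false, committed, error: "config server stepped down"};
    }
    if (waitForDelete) {
        donor.orphanDocs = [];  // range deletion finished before returning
    }
    return {ok: true, committed};
}

// Caller-side wrapper: if the chunk did commit, the error is swallowed,
// so the client sees success even though the deletion was never awaited.
function moveChunk(donor, recipient, options) {
    const res = donorMoveChunk(donor, recipient, options);
    if (!res.ok && res.committed) {
        return {ok: true};
    }
    return res;
}

// Runs one migration with waitForDelete:true and reports what the test sees.
function run(configStepsDown) {
    const donor = new Shard("shard0");
    const recipient = new Shard("shard1");
    donor.ownedDocs = [{_id: 1}, {_id: 2}];
    const res = moveChunk(donor, recipient, {waitForDelete: true, configStepsDown});
    return {ok: res.ok, orphans: donor.orphanDocs.length};
}

console.log(run(false));  // no stepdown
console.log(run(true));   // stepdown after commit
```

In this model both runs report ok, but only the run without a stepdown leaves the donor free of orphans, which is the situation a test relying on waitForDelete:true cannot detect.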

            Assignee:
            allison.easton@mongodb.com Allison Easton
            Reporter:
            kaloian.manassiev@mongodb.com Kaloian Manassiev