Core Server / SERVER-53094

Tests which use {waitForDelete:true} on moveChunk are not safe to run in the sharding_csrs_continuous_config_stepdown suite



    • Sharding EMEA
    • ALL
    • Sharding 2021-07-12, Sharding 2021-10-04, Sharding 2021-10-18, Sharding 2021-11-01


      As an example, all of the update_shard_key_*.js tests perform some kind of moveChunk with waitForDelete:true and rely on there being no orphans for their expectations to hold. However, waitForDelete:true doesn't work well with stepdowns of the Config Server, because there are numerous places where the request to wait cannot be obeyed (this one, for example).

      Because of this, it is not safe to run tests with waitForDelete:true in the sharding_csrs_continuous_config_stepdown suite and they should be blacklisted.
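To find candidate tests for the exclusion list, a simple text scan may be enough as a first pass. The following is a minimal sketch: `usesWaitForDelete` and `filesToExclude` are hypothetical helpers, not part of resmoke or any MongoDB tooling, and the sketch assumes the option is written literally in the test source as `waitForDelete: true` or `_waitForDelete: true`. Tests that pass the option through a variable or a shared helper would not be caught.

```javascript
// Hypothetical helper, not part of resmoke or the MongoDB test tooling.
// Returns true if the given test source literally sets waitForDelete:true
// (or _waitForDelete:true), with or without quotes around the key.
function usesWaitForDelete(source) {
    const pattern = /["']?_?waitForDelete["']?\s*:\s*true\b/;
    return pattern.test(source);
}

// Given an object mapping file name -> source text, return the sorted list
// of file names that a reviewer should consider excluding from the suite.
function filesToExclude(sourcesByName) {
    return Object.keys(sourcesByName)
        .filter((name) => usesWaitForDelete(sourcesByName[name]))
        .sort();
}
```

The result is only a list of candidates; each hit still needs a manual check that the test really depends on the absence of orphans.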

      In more detail, at a minimum the following can happen:

      • Some test relies on range deletion succeeding so that there are no orphans on the cluster
      • A moveChunk command commits successfully on the Config Server, BUT returns an error to the Balancer, because the Config Server was down and the command couldn't re-check its work (this is okay). However, this means it didn't wait for the range deletion.
      • This code notices that the chunk actually committed, so it doesn't pass the error on to the Router/Client
      • The test happily continues, even though there are orphans on the donor shard
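The sequence above can be illustrated with a toy model. This is only a sketch of the control flow as described in this ticket: `simulateMoveChunk`, its parameters, and the document count are all hypothetical, and nothing here talks to a real cluster or mirrors the server's actual code.

```javascript
// Toy model only: every name here is hypothetical.
// Models a moveChunk that always commits, with an optional Config Server
// stepdown right after the commit.
function simulateMoveChunk({waitForDelete, configStepsDownAfterCommit}) {
    const movedDocs = 3;

    // The migration commits. Until range deletion runs, the donor still
    // holds the moved documents as orphans.
    const committed = true;
    let orphansOnDonor = movedDocs;

    let error = null;
    if (configStepsDownAfterCommit) {
        // The post-commit re-check fails, so the command returns an error
        // before it gets a chance to wait for the range deletion.
        error = "config server unavailable";
    } else if (waitForDelete) {
        // Normal path: the command waits until the orphans are deleted.
        orphansOnDonor = 0;
    }

    // The caller sees that the chunk actually committed and does not pass
    // the error on, so the client observes success either way.
    const ok = committed || error === null;
    return {ok, orphansOnDonor};
}

const normal = simulateMoveChunk({waitForDelete: true, configStepsDownAfterCommit: false});
const stepdown = simulateMoveChunk({waitForDelete: true, configStepsDownAfterCommit: true});
console.log(normal, stepdown);
```

In both runs the client sees `ok: true`, but only the run without a stepdown leaves the donor free of orphans, which is the situation the test cannot detect.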





              backlog-server-sharding-emea Backlog - Sharding EMEA
              kaloian.manassiev@mongodb.com Kaloian Manassiev