- Type: Bug
- Resolution: Gone away
- Priority: Major - P3
- Affects Version/s: None
- Component/s: Sharding
- Catalog and Routing
- ALL
- Sharding 2021-07-12, Sharding 2021-10-04, Sharding 2021-10-18, Sharding 2021-11-01, CAR Team 2023-12-25, CAR Team 2024-01-08
As an example, all of the update_shard_key_*.js suites perform some kind of moveChunk with waitForDelete:true and rely on there being no orphans for their testing expectations. However, waitForDelete:true doesn't work well with stepdowns of the Config Server, because there are numerous places where the option can't be obeyed (this one, for example).
Because of this, it is not safe to run tests that use waitForDelete:true in the sharding_csrs_continuous_config_stepdown suite, and they should be blacklisted.
In more detail, this is the least that can happen:
- Some test relies on range deletion being successful so that no orphans are left on the cluster
- A moveChunk command successfully commits on the Config Server, BUT returns an error to the Balancer, because the Config Server was down and the command couldn't re-check its work (this is okay). However, this means it didn't wait for the range deletion.
- This code notices that the chunk actually committed, so it doesn't pass the error to the Router/Client
- The test happily continues, even though there are orphans on the donor shard
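
The sequence above can be illustrated with a small toy model. This is only a sketch of the reasoning, not server code: every name in it (`makeCluster`, `moveChunk`, `balancerMoveChunk`, `runScenario`) is hypothetical, and the real commit and range-deletion paths are far more involved.

```javascript
// Toy model of the failure sequence; none of these names are real MongoDB APIs.
function makeCluster() {
    return {
        chunkOwner: "donor",  // shard that owns the chunk according to the config server
        orphansOnDonor: 0,    // documents left behind on the donor shard
    };
}

// Models a moveChunk issued with waitForDelete:true.
// If the config server steps down right after the commit, the command cannot
// re-check its work and returns an error without waiting for range deletion.
function moveChunk(cluster, docsInChunk, configStepsDownAfterCommit) {
    cluster.chunkOwner = "recipient";      // the commit succeeds on the config server
    cluster.orphansOnDonor = docsInChunk;  // donor copies now await range deletion
    if (configStepsDownAfterCommit) {
        return {ok: 0, errmsg: "config server unavailable after commit"};
    }
    cluster.orphansOnDonor = 0;            // waitForDelete was honored
    return {ok: 1};
}

// Models the caller-side check: if the chunk did move, the error is swallowed
// and success is reported to the router/client.
function balancerMoveChunk(cluster, docsInChunk, configStepsDownAfterCommit) {
    const res = moveChunk(cluster, docsInChunk, configStepsDownAfterCommit);
    if (res.ok === 0 && cluster.chunkOwner === "recipient") {
        return {ok: 1};
    }
    return res;
}

// Runs one migration of a 10-document chunk and reports what the test would see.
function runScenario(configStepsDownAfterCommit) {
    const cluster = makeCluster();
    const res = balancerMoveChunk(cluster, 10, configStepsDownAfterCommit);
    return {ok: res.ok, orphansOnDonor: cluster.orphansOnDonor};
}
```

In this model, `runScenario(false)` reports success with no orphans, while `runScenario(true)` also reports success but leaves all 10 documents on the donor, which is the state a test relying on waitForDelete:true does not expect.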
is related to
- SERVER-46669 moveChunk may succeed but not respect waitForDelete=true if replica set shard primary steps down (Closed)
- SERVER-59891 Replace the coverage from sharding_continuous_config_stepdown.yml and then delete the test suite (Backlog)