[SERVER-32553] The `removeShard` command is not idempotent for the purposes of the sharding continuous config stepdown suite Created: 05/Jan/18  Updated: 26/Oct/23

Status: Backlog
Project: Core Server
Component/s: Sharding, Testing Infrastructure
Affects Version/s: 3.6.1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Kaloian Manassiev Assignee: Backlog - Catalog and Routing
Resolution: Unresolved Votes: 0
Labels: oldshardingemea
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Duplicate
is duplicated by SERVER-30814 removeShard returns ShardNotFound if ... Closed
Related
is related to SERVER-25053 removeShard checks are inherently racy Closed
Assigned Teams:
Catalog and Routing
Operating System: ALL
Steps To Reproduce:

This is causing a new test to fail, so it should be reevaluated.

Participants:
Linked BF Score: 36

 Description   

The first invocation of `removeShard`, if retried after a stepdown, may actually remove the shard instead of just putting it into 'draining' mode. This makes tests running under the continuous config stepdown suite fail intermittently at the lines that call `removeShard`.

This ticket is to figure out what to do about these tests. We should either blacklist them or change them to expect that removeShard might not find the shard (with an appropriate check beforehand that the shard exists).
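
A minimal sketch of the second option (tests that tolerate a missing shard) could look like the helper below. The helper name `removeShardAllowingRetry` and the `removedByEarlierAttempt` field are hypothetical and not part of the existing test libraries. It assumes that ShardNotFound is reported as error code 70 and that `adminDb` exposes a shell-style `runCommand`.

```javascript
// Hypothetical test helper; not part of the MongoDB jstests libraries.
// Assumption: the ShardNotFound error is reported with code 70.
const SHARD_NOT_FOUND = 70;

function removeShardAllowingRetry(adminDb, shardName) {
    // Check beforehand that the shard exists, so that a later ShardNotFound
    // can be attributed to an earlier (retried) attempt having removed it.
    const listed = adminDb.runCommand({listShards: 1});
    if (!listed.ok) {
        throw new Error("listShards failed: " + JSON.stringify(listed));
    }
    const existed = listed.shards.some((s) => s._id === shardName);
    if (!existed) {
        throw new Error("shard " + shardName + " is missing before removeShard");
    }

    const res = adminDb.runCommand({removeShard: shardName});
    if (res.ok) {
        // Normal path: state is "started", "ongoing" or "completed".
        return res;
    }
    if (res.code === SHARD_NOT_FOUND) {
        // The shard existed a moment ago, so treat this as a completed removal.
        return {ok: 1, state: "completed", removedByEarlierAttempt: true};
    }
    throw new Error("removeShard failed: " + JSON.stringify(res));
}
```

In a test this might be called as `removeShardAllowingRetry(st.s.getDB("admin"), st.shard1.shardName)`, assuming a `ShardingTest` fixture named `st`. The check-then-act sequence cannot tell a retried removal apart from a concurrent removal by another client, so it only fits tests that own the cluster.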

Alternatively, we can change the removeShard implementation on mongos to not fail if it found the shard initially, but then got ShardNotFound from the config server.
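
The mongos-side alternative can be modelled roughly as follows. This is only an illustration of the control flow in JavaScript, not the actual mongos code: `routeRemoveShard`, `config.shardExists` and `config.removeShard` are hypothetical stand-ins, and error code 70 for ShardNotFound is an assumption.

```javascript
// Illustrative model only; the real mongos implementation is not shown here.
// Assumption: ShardNotFound corresponds to error code 70.
const SHARD_NOT_FOUND_CODE = 70;

function routeRemoveShard(config, shardName) {
    // Initial lookup: a shard that was never there is still an error.
    if (!config.shardExists(shardName)) {
        return {ok: 0, code: SHARD_NOT_FOUND_CODE, codeName: "ShardNotFound"};
    }

    // Forward the command to the (hypothetical) config server interface.
    const res = config.removeShard(shardName);

    // The shard was found initially but is gone now: assume a retried
    // attempt already finished the removal and report success.
    if (!res.ok && res.code === SHARD_NOT_FOUND_CODE) {
        return {ok: 1, state: "completed", shard: shardName};
    }
    return res;
}
```

With this behaviour the caller sees ShardNotFound only when the shard was absent at the time of the initial lookup, which is what the tests already expect.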



 Comments   
Comment by Sheeri Cabral (Inactive) [ 23/Jan/20 ]

Could make the tests expect either OK or "shard not found" - it's a lot of tests (every test that calls removeShard).
