[SERVER-29547] make shell runCommand manually clean up partially written chunks after failed shardCollection in continuous config stepdown override
Created: 09/Jun/17  Updated: 30/Oct/23  Resolved: 21/Jun/17

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.5.9
Fix Version/s: 3.5.9

Type: Improvement
Priority: Major - P3
Reporter: Esha Maharishi (Inactive)
Assignee: Esha Maharishi (Inactive)
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Gantt End to End
has to be finished together with SERVER-29107 move shardCollection logic into new _... Closed
Backwards Compatibility: Fully Compatible
Sprint: Sharding 2017-06-19, Sharding 2017-07-10
Participants:

 Description   

We are moving shardCollection to the config server, but shardCollection is not idempotent:

  • it writes chunks to config.chunks before it inserts an entry for the collection into config.collections
  • at the start of shardCollection, it fails if there are chunks for the collection in config.chunks but no entry for the collection in config.collections

The continuous config stepdown suite can cause shardCollection to fail mid-way and be retried from the start by mongos, at which point the retry hits the second condition. We must therefore add a workaround in the continuous stepdown suite override.
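
To make the failure mode concrete, here is a toy in-memory model of the write order described above. It is only a sketch: makeConfigDb, toyShardCollection, the failAfterChunks flag, and the error messages are all hypothetical and do not reflect the server's actual code or error text.

```javascript
// Toy stand-in for the config database (hypothetical, not real server code).
function makeConfigDb() {
    return {chunks: [], collections: []};
}

// Hypothetical model of shardCollection's write order as described in this ticket.
// `failAfterChunks` simulates a config server stepdown between the two writes.
function toyShardCollection(configDb, ns, failAfterChunks) {
    const hasEntry = configDb.collections.some(c => c._id === ns);
    const hasChunks = configDb.chunks.some(c => c.ns === ns);
    if (hasEntry) {
        return {ok: 0, errmsg: "already sharded: " + ns};
    }
    // Start-of-command check: chunks without a collection entry are rejected.
    if (hasChunks) {
        return {ok: 0, errmsg: "partially written chunks exist for " + ns};
    }
    // Step 1: chunks are written first.
    configDb.chunks.push({ns: ns, min: "MinKey", max: "MaxKey"});
    if (failAfterChunks) {
        return {ok: 0, errmsg: "interrupted by stepdown"};
    }
    // Step 2: the collection entry is written last.
    configDb.collections.push({_id: ns});
    return {ok: 1};
}

const configDb = makeConfigDb();
// First attempt is interrupted after the chunks are written.
const first = toyShardCollection(configDb, "test.foo", true);
// A retry from the start sees the leftover chunks and fails as well.
const retry = toyShardCollection(configDb, "test.foo", false);
```

In this model the retry can never succeed on its own, because nothing removes the chunk left behind by the first attempt.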

The workaround should clean up partially written chunks and retry a failed shardCollection if it failed because it saw partially written chunks.
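
A minimal sketch of what such a clean-up-and-retry wrapper could look like follows. It is not the committed override: runShardCollectionWithCleanup, PARTIAL_CHUNKS_MARKER, the maxRetries parameter, and the injected runCommand and removeChunks callbacks are all hypothetical, and how the real override recognizes the "partially written chunks" failure is an assumption here.

```javascript
// Hypothetical marker; the real error code or message to match on is an assumption.
const PARTIAL_CHUNKS_MARKER = "partially written chunks";

// Runs shardCollection via `runCommand`; if it fails because partially written
// chunks were seen, deletes them via `removeChunks` and retries, up to `maxRetries` times.
function runShardCollectionWithCleanup(runCommand, removeChunks, cmdObj, maxRetries) {
    let res = runCommand(cmdObj);
    for (let attempt = 0; attempt < maxRetries; attempt++) {
        const sawPartialChunks = res.ok !== 1 &&
            typeof res.errmsg === "string" &&
            res.errmsg.includes(PARTIAL_CHUNKS_MARKER);
        if (!sawPartialChunks) {
            break;  // success, or an unrelated failure that should surface to the test
        }
        removeChunks(cmdObj.shardCollection);
        res = runCommand(cmdObj);
    }
    return res;
}

// Demo with in-memory fakes standing in for the shell connection.
const staleChunks = [{ns: "test.foo"}];
let calls = 0;
const fakeRunCommand = function(cmdObj) {
    calls++;
    const leftovers = staleChunks.some(c => c.ns === cmdObj.shardCollection);
    return leftovers
        ? {ok: 0, errmsg: "partially written chunks exist for " + cmdObj.shardCollection}
        : {ok: 1};
};
const fakeRemoveChunks = function(ns) {
    for (let i = staleChunks.length - 1; i >= 0; i--) {
        if (staleChunks[i].ns === ns) {
            staleChunks.splice(i, 1);
        }
    }
};
const result = runShardCollectionWithCleanup(
    fakeRunCommand, fakeRemoveChunks, {shardCollection: "test.foo", key: {_id: 1}}, 3);
```

Only failures that match the marker trigger cleanup; any other error is returned unchanged, which limits (but does not remove) the risk noted under Cons below.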

Cons: We have to assume the partially written chunks were due to a config server stepdown, and if there was actually a bug, we will gloss over it by deleting the chunks and retrying.

Pros: It allows us to have a config stepdown suite that uses all the existing jstests in jstests/sharding without modifying any of those tests. Since basically every test calls shardCollection, we can't get away with just blacklisting affected tests.

Note: This was not an issue while shardCollection was on mongos, because we do not have a mongos stepdown suite; if there were config stepdowns during shardCollection, only a particular read or write from mongos to the config servers was retried (and DuplicateKeyError was handled gracefully by mongos).

max.hirschhorn



 Comments   
Comment by Esha Maharishi (Inactive) [ 21/Jun/17 ]

Committed along with SERVER-29107.
