Priority: Minor - P4
Affects Version/s: 5.0.3
Fix Version/s: 5.0 Required
Sprint: Sharding EMEA 2021-11-15, Sharding EMEA 2021-11-29, Sharding EMEA 2021-12-13, Sharding EMEA 2021-12-27, Sharding EMEA 2022-01-10, Sharding EMEA 2022-01-24
Linked BF Score: 20
In jstests/concurrency/fsm_workloads/random_DDL_setFCV_operations.js we can encounter a ManualInterventionRequired error when trying to shard a collection.
This error means that a previous shard collection attempt in FCV 4.4 managed to create some chunks for the collection but crashed or stepped down before writing the corresponding entry in config.collections, leaving orphaned chunks in config.chunks.
When this occurs, every thread that received the ManualInterventionRequired error attempts to remove the orphaned chunk documents directly from config.chunks and then retries sharding the collection.
Since there is no synchronization between these threads, the following interleaving is possible:
- T1 receives ManualInterventionRequired for coll1
- T2 receives ManualInterventionRequired for coll1
- T1 removes orphaned chunks for coll1
- T1 re-issues the shard collection and correctly creates coll1 with its own chunks
- T2 removes the chunks for coll1, including the valid ones T1 just created
- T2 re-issues the shard collection, finds that the collection is already sharded, and does nothing
So T2 leaves coll1 with a collection entry in config.collections but no chunks in config.chunks.
In this situation, every node that tries to refresh its catalog cache for coll1 will encounter a ConflictingOperationInProgress error.
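
The interleaving above can be replayed with a small in-memory model. This is only a sketch in plain JavaScript, not the workload or server code: the `config` object, `shardCollection`, `removeChunks`, and `tryShard` are hypothetical stand-ins, and the chunk document shape is simplified. It assumes that shard collection is a no-op when the collection entry already exists and fails with ManualInterventionRequired when chunks exist without an entry, as described above.

```javascript
// Hypothetical in-memory stand-ins for config.collections and config.chunks.
const config = {collections: [], chunks: []};

// Simplified model of shard collection (an assumption, not the server logic):
// no-op if the entry exists, error if chunks exist without an entry.
function shardCollection(ns) {
    if (config.collections.some(c => c._id === ns)) {
        return "already sharded";
    }
    if (config.chunks.some(c => c.ns === ns)) {
        throw new Error("ManualInterventionRequired");
    }
    config.chunks.push({ns: ns, min: "MinKey", max: "MaxKey"});
    config.collections.push({_id: ns});
    return "sharded";
}

// Models the unsynchronized cleanup each thread performs: delete every chunk for ns.
function removeChunks(ns) {
    config.chunks = config.chunks.filter(c => c.ns !== ns);
}

// Returns the outcome of a shard attempt, or the error message if it throws.
function tryShard(ns) {
    try {
        return shardCollection(ns);
    } catch (e) {
        return e.message;
    }
}

const ns = "db.coll1";

// Initial state: an orphaned chunk left by an interrupted attempt in FCV 4.4.
config.chunks.push({ns: ns, min: "MinKey", max: "MaxKey"});

const t1First = tryShard(ns);  // T1 receives ManualInterventionRequired
const t2First = tryShard(ns);  // T2 receives ManualInterventionRequired
removeChunks(ns);              // T1 removes the orphaned chunks
const t1Retry = tryShard(ns);  // T1 creates coll1 with its own chunks
removeChunks(ns);              // T2 removes chunks, including T1's valid ones
const t2Retry = tryShard(ns);  // T2 finds coll1 already sharded and does nothing

const hasEntry = config.collections.some(c => c._id === ns);
const chunkCount = config.chunks.filter(c => c.ns === ns).length;
console.log({t1First, t2First, t1Retry, t2Retry, hasEntry, chunkCount});
```

In this model the final state has a collection entry for coll1 and zero chunks, which is the inconsistent state the ticket describes. Serializing the "remove chunks, then retry" sequence per collection would avoid it in the model; whether that is the appropriate fix for the workload is not something this sketch establishes.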