If we are creating a collection the following scenario might happen:
- We start sharding an empty collection with a hashed key index
- There is a stepdown
- A write sneaks in before resuming the coordinator
- The new primary starts the phase two, but we cannot shard the collection with a hashed index that already has data so the shard collection fails
In order to ensure there is no leftover data at command termination, a drop index was added to the create collection path that will be executed only if the index was not already on the collection and we are recovering from a step down, however, this might generate the following scenario:
Imagine we have a 10TB collection and we try to shard the collection:
- The request would block for 15 min in the critical section to create an index
- The server is overwhelmed with parked network requests which block on the CS and crashes
- The server comes back up and resumes shard collection, same thing happens over and over, because we drop the index and then recreate it again.
We must remove this drop index operation, in order to do so, we could, for example, use the same approach of resharding, by preventing writes on the collections on step up.