When we shard a collection, _shardServerShardCollection runs a count command on the config.chunks collection to make sure there are no chunks already in the collection. But, right now, the count query uses readConcern local, which can cause a problem in the following scenario
- Create a sharded collection
- Drop the sharded collection, which succeeds on the config server primary
- Try to create a sharded collection with the same name
- _shardServerShardCollection runs a count command to see if chunks exist from a previous sharded collection with the same name
- The count command targets the config server secondary, where the replication of the drop from the previous collection has not completed, and so there are still chunks in the collection
- The _shardServerShardCollection command fails with the error "ManualInterventionRequired: A previous attempt to shard collection <collection name> failed after writing some initial chunks to config.chunks. Please manually delete the partially written chunks for collection test.user from config.chunks", even though the previous drop completed successfully
We should use readAfterOpTime to do this count command so that even if we read from a config server secondary, we'll wait until previous operations have replicated.