The format of the sharding catalog changed starting in version 5.0. The change affects every entry in the config.databases, config.collections, config.chunks and config.shards system collections, so the upgrade/downgrade steps have to rewrite a large amount of data.
The data updates are synchronised per collection under the chunks lock, which means that for collections with many chunks, chunk migrations can be blocked for an extended period. Since the chunk commit happens during the migration critical section, any shard that attempts to commit a chunk migration while the update is running will block access to the portion of the collection that it holds until the commit can proceed.
To mitigate this impact, it is proposed that we do two things:
- Make setFCV stop the balancer at the beginning and re-enable it at the end (only if it was enabled before setFCV started).
- Make chunk migration fail with a ConflictingOperationInProgress error if it attempts to commit while the catalog update is in progress.
The first change can be done later, since setFCV needs to remember whether the balancer was enabled beforehand, but the second is fairly easy and can be done earlier.
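
As a rough illustration of how the two mitigations could fit together, here is a minimal Python sketch. It is a toy in-memory model, not server code: the `Cluster` class, `set_fcv`, `commit_chunk_migration` and the `catalog_upgrade_in_progress` flag are all hypothetical names chosen for this example, and the exception class is only a stand-in for the ConflictingOperationInProgress error named above.

```python
from contextlib import contextmanager


class ConflictingOperationInProgress(Exception):
    """Stand-in for the server error of the same name (hypothetical class)."""


class Cluster:
    """Toy model of the proposal; not the real balancer or catalog API."""

    def __init__(self, balancer_enabled=True):
        self.balancer_enabled = balancer_enabled
        self.catalog_upgrade_in_progress = False

    @contextmanager
    def set_fcv(self):
        # Mitigation 1: remember the balancer state, stop it for the
        # duration of the catalog update, and restore the previous state.
        was_enabled = self.balancer_enabled
        self.balancer_enabled = False
        self.catalog_upgrade_in_progress = True
        try:
            yield
        finally:
            self.catalog_upgrade_in_progress = False
            self.balancer_enabled = was_enabled

    def commit_chunk_migration(self, chunk_id):
        # Mitigation 2: fail fast instead of waiting on the chunks lock
        # while holding the migration critical section.
        if self.catalog_upgrade_in_progress:
            raise ConflictingOperationInProgress(
                f"cannot commit {chunk_id} during catalog upgrade/downgrade"
            )
        return f"committed {chunk_id}"
```

In this sketch a migration that reaches its commit inside the `with cluster.set_fcv():` block is rejected immediately and can be retried later, and a balancer that was already disabled before setFCV stays disabled afterwards.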