Starting with MongoDB 3.6.0, the creation of sharded databases was made explicit from the point of view of MongoS and the creation logic was moved to the config server. Since the default distributed lock acquisition timeout is still 20 seconds, this causes timeouts when large number of threads suddenly try to write against a database, which does not exist.
What happens is a convoying effect on the -movePrimary distributed lock, which times out and fails writes even though the database is already created. I am able to reproduce this problem 100% using the load phase of the YCSB benchmark with 40 threads.
In order to avoid this effect, before taking the distributed lock, we should take some form of lock manager X lock, like with the other metadata commands after which we should check the database for existence before taking the distributed lock, in order to mitigate the convoying effect.