  1. Core Server
  2. SERVER-33639

Concurrent writes against non-existent database can fail due to distlock acquisition timeout at `createDatabase` time

Details

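    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 3.6.3, 3.7.2
    • Fix Version/s: 3.6.6, 4.0.0-rc1, 4.1.1
    • Component/s: Sharding
    • Labels: None
    • Backwards Compatibility: Fully Compatible
    • Backport Requested: v4.0
    • Sprint: Sharding 2018-05-21, Sharding 2018-06-04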
    • 0

    Description

      Starting with MongoDB 3.6.0, the creation of sharded databases was made explicit from the point of view of MongoS, and the creation logic was moved to the config server. Since the default distributed lock acquisition timeout is still 20 seconds, this causes timeouts when a large number of threads suddenly try to write against a database that does not exist.

      What happens is a convoying effect on the `-movePrimary` distributed lock: waiters queue up behind one another, the acquisition times out, and writes fail even though the database has already been created. I am able to reproduce this problem 100% of the time using the load phase of the YCSB benchmark with 40 threads.
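      The arithmetic behind the convoy can be sketched roughly as follows. This is a simplified model, not MongoDB code: it assumes all writers start waiting at the same moment and are served strictly in order, and the per-holder hold time is a hypothetical parameter chosen for illustration. Only the 20-second timeout and the 40 threads come from the description above.

```python
def timed_out_waiters(num_threads, hold_seconds, timeout_seconds=20.0):
    """Count waiters whose queue time exceeds the acquisition timeout.

    Simplified model (an assumption, not MongoDB's actual behaviour):
    every thread starts waiting at time 0 and the lock is handed over
    in order, so the thread at queue position k waits k * hold_seconds.
    """
    timed_out = 0
    for position in range(num_threads):
        wait = position * hold_seconds
        if wait > timeout_seconds:
            timed_out += 1
    return timed_out


# hold_seconds values below are hypothetical, for illustration only.
slow_case = timed_out_waiters(num_threads=40, hold_seconds=1.0)
fast_case = timed_out_waiters(num_threads=40, hold_seconds=0.1)
```

      In this model the number of failures depends only on how long each holder keeps the lock, which is why skipping the lock entirely for an already-created database removes the problem.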

      To avoid this effect, we should take some form of lock manager X lock before taking the distributed lock, as the other metadata commands do, and then check whether the database already exists. Only if it does not exist should we go on to take the distributed lock, which mitigates the convoying effect.
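      The proposed ordering can be illustrated with a small, self-contained sketch. It is not the fix as implemented in the server: `ConfigServerSketch`, `create_database`, `run_concurrent_writers` and the `ycsb` database name are hypothetical, and plain in-process `threading.Lock` objects stand in for both the lock manager X lock and the distributed lock.

```python
import threading


class ConfigServerSketch:
    """Toy model of the database-creation path (hypothetical, not MongoDB code)."""

    def __init__(self):
        self._local_x_lock = threading.Lock()  # stand-in for a lock manager X lock
        self._dist_lock = threading.Lock()     # stand-in for the distributed lock
        self._databases = set()
        self.dist_lock_acquisitions = 0

    def create_database(self, name, dist_lock_timeout=20.0):
        """Return True if this call created the database, False if it existed."""
        # Step 1: serialize callers on the cheap local lock.
        with self._local_x_lock:
            # Step 2: check for existence before touching the distributed lock.
            if name in self._databases:
                return False
            # Step 3: only the first caller reaches the distributed lock.
            if not self._dist_lock.acquire(timeout=dist_lock_timeout):
                raise TimeoutError(f"timed out acquiring distributed lock for {name}")
            try:
                self.dist_lock_acquisitions += 1
                self._databases.add(name)
                return True
            finally:
                self._dist_lock.release()


def run_concurrent_writers(num_threads=40, db_name="ycsb"):
    """Start num_threads writers against a database that does not exist yet."""
    server = ConfigServerSketch()
    results = []
    results_lock = threading.Lock()

    def writer():
        created = server.create_database(db_name)
        with results_lock:
            results.append(created)

    threads = [threading.Thread(target=writer) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # (number of calls that created the database, distributed lock acquisitions)
    return sum(results), server.dist_lock_acquisitions
```

      With 40 writers, only one call should create the database and take the distributed lock; the others return after the existence check, so none of them can time out waiting on it.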

            People

              Assignee: Janna Golden (janna.golden@mongodb.com)
              Reporter: Kaloian Manassiev (kaloian.manassiev@mongodb.com)