  1. Core Server
  2. SERVER-31428

Poor performance when many concurrent ops refresh sharding metadata

Details

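    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 3.4.9, 3.6.0-rc0
    • Fix Version/s: 3.4.10, 3.6.0-rc1
    • Component/s: Sharding
    • Backwards Compatibility: Fully Compatible
    • Operating System: ALL
    • v3.4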

    Description

      Consider a shard node, which just started and/or became primary and does not have any sharding metadata cached.

      If many threads running sharded operations (i.e., operations carrying a non-UNSHARDED version) arrive at the same time, every one of them will get a StaleConfigException and enter the refresh code. Of these threads, only one will actually perform the refresh from the config server, but all of them will eventually reach the same line, which does nothing if the metadata is already fresh. Even so, every one of these threads acquires the collection X-lock, which causes stalls on an already overloaded server.

      In addition, all threads will redundantly process the new metadata.

      A complete fix would be to serialize collection refreshes on the shard, outside of the synchronization already happening through the catalog cache.

      A quick fix for the MODE_X aspect would be to add a check (under the collection IS lock) just before the X-lock is acquired: re-check whether the version obtained from the CatalogCache differs from the one the shard already has, and skip acquiring the X-lock if it does not.
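
      The quick fix described above is a check-then-lock pattern. The following is only a minimal sketch of that idea, not the server's actual code: `CollectionShardingState`, `installRefreshedVersion`, and the integer version are hypothetical stand-ins, and a `std::shared_mutex` only approximates the collection lock (shared mode standing in for IS, exclusive mode for X).

```cpp
#include <cstdint>
#include <mutex>
#include <shared_mutex>

// Hypothetical stand-in for the metadata a shard caches for one collection.
struct CollectionShardingState {
    std::shared_mutex lock;              // approximates the collection lock
    std::uint64_t installedVersion = 0;  // version currently installed on the shard
    int exclusiveAcquisitions = 0;       // bookkeeping only; modified under the exclusive lock
};

// Installs `refreshedVersion` (assumed to come from the catalog cache) unless
// the shard already has it. Returns true if this call installed it.
bool installRefreshedVersion(CollectionShardingState& css,
                             std::uint64_t refreshedVersion) {
    {
        // Cheap check under the shared lock: most late-arriving threads
        // find the metadata already fresh and return here, never
        // touching the exclusive lock.
        std::shared_lock<std::shared_mutex> sharedLock(css.lock);
        if (css.installedVersion == refreshedVersion) {
            return false;
        }
    }

    // The version differed, so take the exclusive lock.
    std::unique_lock<std::shared_mutex> exclusiveLock(css.lock);
    ++css.exclusiveAcquisitions;

    // Re-check: another thread may have installed the same version between
    // releasing the shared lock and acquiring the exclusive one.
    if (css.installedVersion == refreshedVersion) {
        return false;
    }

    css.installedVersion = refreshedVersion;
    return true;
}
```

      In this sketch, threads that arrive before the first install can still each take the exclusive lock once, so it narrows the window rather than serializing refreshes. The equality comparison mirrors the wording of the ticket; a real chunk version is more than a single integer.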

            People

              kevin.pulo@mongodb.com Kevin Pulo
              kaloian.manassiev@mongodb.com Kaloian Manassiev