Core Server / SERVER-31428

Poor performance when many concurrent ops refresh sharding metadata

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 3.4.9, 3.6.0-rc0
    • Fix Version/s: 3.4.10, 3.6.0-rc1
    • Component/s: Sharding
    • Backwards Compatibility:
      Fully Compatible
    • Operating System:
      ALL
    • Backport Requested:
      v3.4
      Description

      Consider a shard node, which just started and/or became primary and does not have any sharding metadata cached.

      If many threads running sharded operations (i.e., operations containing a non-UNSHARDED version) arrive at the same time, they will all get StaleConfigException and enter the refresh code. Only one of these threads will actually do the refresh from the config server, but every one of them will eventually reach the same follow-up call. That call does nothing if the metadata is already fresh, but it still acquires the collection X-lock, so all of these threads take the X-lock in turn and cause stalls on an already overloaded server.

      In addition, all threads will redundantly process the new metadata.

      A complete fix would be to serialize collection refreshes on the shard, outside of the synchronization already happening through the catalog cache.
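
The serialization idea above could be sketched roughly as follows. This is a toy Python model, not the server's C++ code: `SerializedRefresher`, `on_stale_config`, and the `fetch_version` callable (standing in for the config server round trip) are all hypothetical names introduced for illustration. A per-collection mutex ensures that one caller fetches and processes the new metadata while the others wait and then reuse the result.

```python
import threading


class SerializedRefresher:
    """Toy model of serialized per-collection refreshes (hypothetical names)."""

    def __init__(self, fetch_version):
        # fetch_version is an assumed callable that stands in for the
        # round trip to the config server.
        self._fetch_version = fetch_version
        self._refresh_mutex = threading.Lock()  # serializes refreshes
        self.installed_version = None
        self.fetch_count = 0
        self.install_count = 0

    def on_stale_config(self, wanted_version):
        # Every stale operation funnels through the same mutex.
        with self._refresh_mutex:
            if self.installed_version == wanted_version:
                # An earlier caller already refreshed; nothing to redo.
                return self.installed_version
            self.fetch_count += 1
            new_version = self._fetch_version()
            # Processing and installing the metadata happens exactly once
            # per actual change, instead of once per waiting thread.
            self.install_count += 1
            self.installed_version = new_version
            return new_version


def run_concurrent_refresh(callers=16, version=7):
    refresher = SerializedRefresher(fetch_version=lambda: version)
    threads = [
        threading.Thread(target=refresher.on_stale_config, args=(version,))
        for _ in range(callers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return refresher
```

In this model, however many threads hit the stale version at once, only the first one through the mutex fetches and installs; the rest see the installed version and return immediately.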

      A quick solution to the MODE_X aspect would be to add a check (under the collection IS lock) just before the X-lock is acquired: re-check whether the version obtained from the CatalogCache differs from the version the shard already has, and skip acquiring the X-lock if it does not.
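
The quick fix described above amounts to a check-before-exclusive-lock pattern. The sketch below is a simplified Python model under stated assumptions, not the server's implementation: `CollectionState`, `install`, and `simulate` are hypothetical names, a plain mutex stands in for the collection X-lock, and a second small mutex stands in for the cheap read under the IS lock.

```python
import threading


class CollectionState:
    """Toy stand-in for a shard's per-collection sharding state (hypothetical)."""

    def __init__(self):
        self.installed_version = 0
        self._x_lock = threading.Lock()     # models the collection X-lock
        self._meta_lock = threading.Lock()  # models the cheap IS-lock read
        self.x_lock_acquisitions = 0

    def read_installed_version(self):
        # Models reading the installed version under the collection IS lock.
        with self._meta_lock:
            return self.installed_version

    def install(self, refreshed_version, recheck_first):
        # Quick fix: skip the X-lock when the version is not different.
        if recheck_first and self.read_installed_version() == refreshed_version:
            return False
        with self._x_lock:
            self.x_lock_acquisitions += 1
            # Re-check under the X-lock; another thread may have won the race.
            if self.installed_version == refreshed_version:
                return False
            with self._meta_lock:
                self.installed_version = refreshed_version
            return True


def simulate(recheck_first, callers=8, refreshed_version=5):
    """Run `callers` concurrent installs; return how many took the X-lock."""
    state = CollectionState()
    threads = [
        threading.Thread(target=state.install,
                         args=(refreshed_version, recheck_first))
        for _ in range(callers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state.x_lock_acquisitions
```

Without the re-check, every caller takes the X-lock even though only the first one changes anything. With the re-check, callers that arrive after the metadata is installed never touch the X-lock; callers that race before the first install may still take it, which is why this addresses only the MODE_X aspect and not the redundant processing.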

    People

    • Votes: 1
    • Watchers: 13