Core Server / SERVER-11332

Authentication requests delayed if first config server is unresponsive

    • Type: Improvement
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: 2.6.2, 2.7.1
    • Affects Version/s: 2.4.6
    • Component/s: Performance, Sharding
    • None
    • Environment:
      sharded cluster, 3 config servers, auth

      Issue Status as of May 14, 2014

      For MongoDB sharded clusters with authentication enabled, authentication requests on new connections can query the first config server if authentication data is not already cached. If this config server is unresponsive, there is a 30-second timeout, after which the next config server is contacted. These timeouts can delay new connections, manifesting as slow queries or other operations. An internal mongos server parameter, internalSCCAllowFastestAuthConfigReads, was added to enable reading authentication data from the first config server to respond.

      In authenticated environments, when the first config server becomes unresponsive (note: this is different from the config server shutting down as connections would then fail immediately) and authentication data is not cached, queries and other operations can be delayed by up to 30 seconds.
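
The difference between a shut-down server and an unresponsive one can be reproduced with plain TCP sockets. The sketch below is only an illustration, not MongoDB code: the `probe` and `demo` helpers are hypothetical, the "hung" server is a local listening socket that never replies, and the timeout is scaled down from 30 seconds to a fraction of a second.

```python
import socket
import time


def probe(host, port, timeout):
    """Connect, send one byte, wait for a reply; return (outcome, elapsed seconds)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"x")
            conn.recv(1)  # blocks until data arrives or the timeout fires
        outcome = "replied"
    except ConnectionRefusedError:
        outcome = "refused"    # shut-down server: the connection fails immediately
    except socket.timeout:
        outcome = "timed out"  # unresponsive server: the caller waits out the timeout
    return outcome, time.monotonic() - start


def demo(timeout=0.5):
    # "Hung" server: it listens, so the kernel completes the TCP handshake,
    # but nothing ever accepts the connection or writes a reply.
    hung = socket.socket()
    hung.bind(("127.0.0.1", 0))
    hung.listen(1)
    hung_port = hung.getsockname()[1]

    # "Down" server: a port that was bound and then released, so nothing listens on it.
    down = socket.socket()
    down.bind(("127.0.0.1", 0))
    down_port = down.getsockname()[1]
    down.close()

    try:
        return {
            "down": probe("127.0.0.1", down_port, timeout),
            "hung": probe("127.0.0.1", hung_port, timeout),
        }
    finally:
        hung.close()
```

Calling `demo()` returns one `(outcome, elapsed)` pair per case: the "down" probe is refused almost at once, while the "hung" probe only returns after the full timeout has elapsed.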

      The preferred workaround is to block the first config server using a firewall (e.g. with iptables) so that connections to it fail immediately. In this case, the second config server is contacted without the 30-second delay. If this is not possible, the internal mongos parameter internalSCCAllowFastestAuthConfigReads can be used to work around the issue.

      All previous versions are affected by this issue.

      The fix is included in the 2.6.2 production release.

      For authentication requests (and only for those), a parameter internalSCCAllowFastestAuthConfigReads was added to allow all three config servers to be queried concurrently. To ensure consistent reads of all other metadata, all other requests use the normal mechanism of contacting the first config server, with a 30-second timeout.
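
The routing described in this resolution can be sketched as a small simulation. This is not the mongos implementation: config servers are modelled as hypothetical `(name, delay)` pairs, `read_config` and its `allow_fastest_auth_reads` flag merely stand in for the behaviour of internalSCCAllowFastestAuthConfigReads, and timeouts are meant to be scaled down from 30 seconds when experimenting.

```python
import queue
import threading
import time


def ask(server, replies):
    """Simulated config-server read: answer with the server name after its delay."""
    name, delay = server
    time.sleep(delay)
    replies.put(name)


def read_in_order(servers, timeout):
    """Normal mechanism: try servers one at a time, waiting up to `timeout` on each."""
    for server in servers:
        replies = queue.Queue()
        threading.Thread(target=ask, args=(server, replies), daemon=True).start()
        try:
            return replies.get(timeout=timeout)
        except queue.Empty:
            continue  # this server was unresponsive, move on to the next one
    raise TimeoutError("no config server responded")


def read_fastest(servers, timeout):
    """Query every server concurrently and return the first reply."""
    replies = queue.Queue()
    for server in servers:
        threading.Thread(target=ask, args=(server, replies), daemon=True).start()
    try:
        return replies.get(timeout=timeout)
    except queue.Empty:
        raise TimeoutError("no config server responded") from None


def read_config(kind, servers, timeout, allow_fastest_auth_reads):
    """Only authentication reads may use the concurrent path; all else stays ordered."""
    if kind == "auth" and allow_fastest_auth_reads:
        return read_fastest(servers, timeout)
    return read_in_order(servers, timeout)
```

With a first server whose delay exceeds the timeout, an `"auth"` read with the flag enabled returns as soon as any other server answers, whereas every other kind of read (and an `"auth"` read with the flag disabled) first waits out the timeout on the first server.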

      Original description

      Normal collection operations do not touch the config server.
      But other things do.
      Some examples:

      • authentication
      • splits/balancer
      • listDatabases
      • creating database
      • creating collection

      Possible Solutions:

      • send reads to all (maybe with a tiny backoff), respond from first response (maybe with threshold) (preferred)
      • blacklist (a bit ugly + racy)

        1. SERVER-11332 mongos verbose log.txt
          9 kB
        2. SERVER-11332 reproduce notes.txt
          17 kB
        3. sync_hung_cmd.js
          2 kB

            greg_10gen Greg Studer
            alex.komyagin@mongodb.com Alexander Komyagin (Inactive)