Core Server / SERVER-9916

be smarter about config server retries in non-responsive situations

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Duplicate
    • Affects Version/s: 2.5.0
    • Fix Version/s: None
    • Component/s: Sharding
    • Labels: None

      Description

      In certain failure modes, retries against a failed config server can take several seconds and block queries to the secondary and tertiary config servers. When possible, we should be smarter about reading from other config servers when one server is unavailable. This especially impacts authenticated clusters: authentication data is not cached in mongos, so new authenticated connections are initially slow to respond.

      Example:
      1. The first config server goes down and becomes unresponsive on the network, but does not reject packets.
      2. A new authenticated connection is created to mongos.
      3. Mongos tries to read from the first config server and, before the read, tries to reconnect. The reconnect eventually fails, but only after the several-second timeout elapses.
      4. Mongos then successfully reads from the second config server, but the overall response time is poor.
      5. This repeats for every subsequent new connection: each one waits for the full timeout, even though the first config server is still unavailable.
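
One way to avoid paying the timeout on every new connection is to remember which config server failed recently and try the others first. The following is a minimal sketch of that idea, not the mongos implementation: the class `ConfigServerSelector`, its methods, the `do_read` callback, and the 30-second cool-down are all hypothetical names and values chosen for illustration.

```python
import time


class ConfigServerSelector:
    """Hypothetical helper: de-prioritizes config servers that failed recently."""

    def __init__(self, hosts, cooldown_secs=30.0, clock=time.monotonic):
        self.hosts = list(hosts)            # preference order, e.g. first/second/third
        self.cooldown_secs = cooldown_secs  # assumed value, not taken from the ticket
        self.clock = clock                  # injectable so the behavior can be tested
        self.failed_at = {}                 # host -> time of last observed failure

    def mark_failed(self, host):
        self.failed_at[host] = self.clock()

    def mark_ok(self, host):
        self.failed_at.pop(host, None)

    def candidates(self):
        """Return hosts with healthy ones first and recently failed ones last."""
        now = self.clock()
        healthy, suspect = [], []
        for host in self.hosts:
            failed = self.failed_at.get(host)
            if failed is not None and now - failed < self.cooldown_secs:
                suspect.append(host)
            else:
                healthy.append(host)
        return healthy + suspect

    def read(self, do_read):
        """Call do_read(host) on each candidate and return the first success."""
        last_error = None
        for host in self.candidates():
            try:
                result = do_read(host)
            except OSError as exc:  # includes TimeoutError and ConnectionError
                self.mark_failed(host)
                last_error = exc
                continue
            self.mark_ok(host)
            return result
        raise ConnectionError("no config server reachable") from last_error
```

With this approach only the first read after a failure waits for the timeout. Later reads go straight to the second config server until the cool-down expires, at which point the failed server is tried first again. A failed server is moved to the end of the list rather than dropped, so it is still attempted if every other server is also down.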

              People

              Assignee: Unassigned
              Reporter: Greg Studer (greg_10gen)
              Participants:
              Votes: 3
              Watchers: 8
