SERVER-16237 (Core Server)

Don't check the shard version if the primary server is down


    Details

    • Type: Task
    • Status: Closed
    • Priority: Critical - P2
    • Resolution: Fixed
    • Affects Version/s: 2.7.8
    • Fix Version/s: 2.6.7, 2.8.0-rc4
    • Component/s: Sharding, Stability
    • Labels:
      None
    • Backwards Compatibility:
      Fully Compatible
    • Backport Completed:

      Description

      The source code contains a check that bypasses the shard version check when the primary connection is in a bad state. However, this check appears to be incomplete.

      A connection is not recognized as bad until the OS detects the failure (e.g. via TCP keepalive) or until we try to use it. As a result, the primary can be unreachable while its connection still looks valid. In that case we still check the shard version, which triggers a replica set refresh, and every single query runs into the 5-second timeout inside that refresh. Queries can also be delayed by other threads that are performing the refresh, because the refresh is serialized.
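
      As a rough estimate of the worst case, assume (as a simplification, not a measured result) that every refresh runs into the full 5-second timeout and that refreshes are strictly serialized. A query whose thread is queued behind k other threads performing the refresh could then wait approximately:

```latex
T_{\text{wait}} \approx (k + 1) \cdot 5\,\text{s}
```

      For example, with three threads already queued (k = 3), a single query could stall for about 20 seconds.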

      If we already know that the primary is down (i.e. the replica set monitor has already detected it), we can probably skip the shard version check where possible.
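
      The proposed decision can be sketched as follows. This is a minimal, language-agnostic illustration written in Python, not the actual mongos code: `ConnectionState`, `MonitorView`, and `should_check_shard_version` are hypothetical names, and the sketch assumes the monitor's view of the primary can be read without blocking.

```python
from dataclasses import dataclass


@dataclass
class ConnectionState:
    # Hypothetical: becomes True only after the OS (e.g. via keepalive) or a
    # failed operation has reported that the connection is broken.
    failed: bool


@dataclass
class MonitorView:
    # Hypothetical: the replica set monitor's last known view of the primary.
    primary_reachable: bool


def should_check_shard_version(conn: ConnectionState, monitor: MonitorView) -> bool:
    """Decide whether to run the shard version check before a query."""
    # Existing bypass: the connection is already known to be bad.
    if conn.failed:
        return False
    # Proposed bypass: the monitor already knows the primary is down, so the
    # check would only trigger a replica set refresh and wait for its timeout.
    if not monitor.primary_reachable:
        return False
    return True


# A connection that still looks valid while the monitor knows the primary is down.
stale = ConnectionState(failed=False)
print(should_check_shard_version(stale, MonitorView(primary_reachable=True)))   # True
print(should_check_shard_version(stale, MonitorView(primary_reachable=False)))  # False
```

      The second call shows the gap described above: the connection object alone would still allow the check, and only the monitor's view lets the query skip it.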

    People

    • Votes: 0
    • Watchers: 12
