Core Server: SERVER-23192

mongos and shards will become unusable if contact is lost with all CSRS config server nodes for more than 30 consecutive failed attempts to contact


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Fixed
    • Affects Version/s: 3.2.4
    • Fix Version/s: 3.3.11
    • Component/s: Sharding
    • Labels:
      None
    • Backwards Compatibility:
      Minor Change
    • Operating System:
      ALL
    • Steps To Reproduce:
      See the comment in jstests/sharding/startup_with_all_configs_down.js.
    • Sprint:
      Sharding 16 (06/24/16), Sharding 18 (08/05/16)
    • Case:
    • Linked BF Score:
      0

      Description

      Issue Status as of Oct 07, 2016

      ISSUE DESCRIPTION AND IMPACT
      If mongos (or a shard mongod) loses network contact with all nodes of the CSRS config server replica set (both primary and secondaries) for more than 30 consecutive failed checks, the replica set monitor deems the set 'unusable' and stops monitoring it. As a result, all operations that need to access config server metadata fail.
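      The give-up behavior can be pictured as a counter of consecutive failed checks. The following is a minimal Node.js sketch under stated assumptions, not MongoDB's ReplicaSetMonitor code: the MonitorSketch class, its check() method, and the probe callback are hypothetical, and the default limit of 30 is taken from the ticket title and assumed to play the role of the replMonitorMaxFailedChecks parameter used in the workaround.

```javascript
// Hypothetical model of a replica set monitor that stops monitoring after too
// many consecutive failed checks. Illustrative only; not MongoDB's code.
class MonitorSketch {
  // probe: function returning true if at least one host answered this round.
  // maxFailedChecks: assumed to play the role of replMonitorMaxFailedChecks;
  // the default of 30 comes from the ticket title.
  constructor(probe, maxFailedChecks = 30) {
    this.probe = probe;
    this.maxFailedChecks = maxFailedChecks;
    this.consecutiveFailures = 0;
    this.usable = true;
  }

  // Runs one monitoring round and returns whether any host was reached.
  check() {
    // Once the set is marked unusable, no further probes are made, so the
    // monitor cannot notice that the hosts have come back.
    if (!this.usable) return false;
    if (this.probe()) {
      this.consecutiveFailures = 0; // any successful contact resets the streak
      return true;
    }
    this.consecutiveFailures += 1;
    // "More than 30" per the title, so reaching the limit itself is tolerated.
    if (this.consecutiveFailures > this.maxFailedChecks) {
      this.usable = false;
    }
    return false;
  }
}

// Simulate an outage of every config server for 31 rounds, then recovery.
let configServersUp = false;
const monitor = new MonitorSketch(() => configServersUp);
for (let i = 0; i < 31; i++) monitor.check();
configServersUp = true; // the network comes back...
console.log(monitor.check(), monitor.usable); // prints: false false
```

      The key point the model captures is that the unusable state is terminal: nothing inside the monitor re-arms probing once the limit is exceeded.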

      DIAGNOSIS AND AFFECTED VERSIONS
      This issue affects MongoDB 3.2.0 through 3.2.9.
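      To check a node against the affected range, a small helper like the one below could be fed the output of db.version(). It is a hypothetical sketch (the function name is invented) that encodes only the range stated in this ticket, 3.2.0 through 3.2.9; it is written in ES5 style on the assumption that it may be pasted into the legacy mongo shell.

```javascript
// Hypothetical helper: true if a version string falls in the range this
// ticket lists as affected (3.2.0 through 3.2.9). ES5 style for the legacy shell.
function isAffectedVersion(version) {
  // Keep only the leading "major.minor.patch" digits, e.g. "3.2.9-rc1" -> 3, 2, 9.
  var match = /^(\d+)\.(\d+)\.(\d+)/.exec(version);
  if (match === null) {
    throw new Error("unrecognized version string: " + version);
  }
  var major = Number(match[1]);
  var minor = Number(match[2]);
  var patch = Number(match[3]);
  return major === 3 && minor === 2 && patch <= 9;
}
```

      For example, isAffectedVersion(db.version()) would return true on a 3.2.9 node and false on 3.2.10 or 3.4.0.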

      Operations that require access to config server metadata will begin failing with the following error:

      > db.foo.find().itcount();
       
      2016-03-16T16:59:20.941-0400 E QUERY    [thread1] Error: error: {
              "code" : 71,
              "ok" : 0,
              "errmsg" : "None of the hosts for replica set test-configRS could be contacted."
      } :
      _getErrorWithCode@src/mongo/shell/utils.js:25:13
      DBCommandCursor@src/mongo/shell/query.js:694:1
      DBQuery.prototype._exec@src/mongo/shell/query.js:118:28
      DBQuery.prototype.hasNext@src/mongo/shell/query.js:281:5
      DBQuery.prototype.itcount@src/mongo/shell/query.js:407:12
      @(shell):1:16
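      Scripts that watch for this condition might classify command responses by the error code and message shown above. The helper below is a hypothetical heuristic: its name is invented, and code 71 may also be returned in other situations, so a match suggests rather than proves that the monitor has given up.

```javascript
// Hypothetical heuristic: does a command response look like the failure in
// this ticket (ok: 0, code 71, "None of the hosts for replica set ... could be
// contacted.")? ES5 style so it can also run in the legacy mongo shell.
function looksLikeUnusableConfigMonitor(response) {
  if (response === null || typeof response !== "object") {
    return false;
  }
  return response.ok === 0 &&
    response.code === 71 &&
    typeof response.errmsg === "string" &&
    response.errmsg.indexOf("None of the hosts for replica set") === 0;
}
```

      In the mongo shell it could be applied to a raw command response, for example looksLikeUnusableConfigMonitor(db.runCommand({ find: "foo", limit: 1 })), since runCommand returns the error document rather than throwing as itcount() does.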
      

      REMEDIATION AND WORKAROUNDS
      To resolve this issue, restart the affected mongos or mongod.

      This issue has been fixed in MongoDB 3.4.0 and 3.2.10:

      • MongoDB 3.4.0 contains the fix described in this ticket.
      • MongoDB 3.2.10 contains the fix described by SERVER-25516.

      On versions prior to MongoDB 3.2.10, this issue can be avoided by executing the following command at runtime on all mongos and mongod nodes:

      db.adminCommand( {setParameter: 1, 'replMonitorMaxFailedChecks': 2147483647} )
      

      Please note that this parameter does not persist and must be set each time the node restarts.
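      When there are many mongos and mongod nodes, the workaround can be scripted. This is a sketch under assumptions: applyReplMonitorWorkaround and the host list are hypothetical, the connect function is injected so the logic can be exercised without a server (in the mongo shell one might pass function (h) { return new Mongo(h); }), and the value is read back with getParameter to confirm it was applied on each node.

```javascript
// Hypothetical sketch: apply the runtime workaround to each host and verify it.
// "connect" is injected; it must return an object with getDB(name).
var WORKAROUND_VALUE = 2147483647; // value from this ticket (largest 32-bit signed int)

function applyReplMonitorWorkaround(hosts, connect) {
  var results = [];
  for (var i = 0; i < hosts.length; i++) {
    var admin = connect(hosts[i]).getDB("admin");
    // Same command as the documented workaround, issued against the admin DB.
    var setRes = admin.runCommand({ setParameter: 1, replMonitorMaxFailedChecks: WORKAROUND_VALUE });
    // Read the parameter back to confirm it took effect on this node.
    var getRes = admin.runCommand({ getParameter: 1, replMonitorMaxFailedChecks: 1 });
    results.push({
      host: hosts[i],
      applied: setRes.ok === 1 && getRes.replMonitorMaxFailedChecks === WORKAROUND_VALUE
    });
  }
  return results;
}
```

      Presumably the very large value works because the monitor's failed-check limit is then never reached in practice. Because the parameter does not persist, a script like this would have to be re-run after every restart of a node.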

      Original description

      If mongos loses network contact with all nodes of the CSRS config server set (both primary and secondaries), the replica set monitor will deem this set 'unusable' and will stop monitoring it.

      From this point onward, all operations that need to access config server metadata will begin failing with the following error:

      > db.foo.find().itcount();
       
      2016-03-16T16:59:20.941-0400 E QUERY    [thread1] Error: error: {
              "code" : 71,
              "ok" : 0,
              "errmsg" : "None of the hosts for replica set test-configRS could be contacted."
      } :
      _getErrorWithCode@src/mongo/shell/utils.js:25:13
      DBCommandCursor@src/mongo/shell/query.js:694:1
      DBQuery.prototype._exec@src/mongo/shell/query.js:118:28
      DBQuery.prototype.hasNext@src/mongo/shell/query.js:281:5
      DBQuery.prototype.itcount@src/mongo/shell/query.js:407:12
      @(shell):1:16
      

      This includes refreshing the list of shards, which must be read from the config server metadata. Therefore, there is currently no way to restart or retry monitoring of the config server set, and the only recourse is to restart mongos.
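      The dependency described above can be modeled in a few lines. This is a hypothetical Node.js sketch, not MongoDB internals: RouterModel and its methods are invented names, and it assumes, as the paragraph states, that every config metadata read, including the shard-list refresh, goes through the config server monitor.

```javascript
// Hypothetical model (not MongoDB code): the shard-list refresh needs config
// server metadata, which is reached through the config server monitor. Once the
// monitor has given up, nothing inside the process re-creates it.
class RouterModel {
  // configReachable: function reporting whether any config server answers.
  constructor(configReachable) {
    this.configReachable = configReachable;
    this.configMonitorUsable = true; // a fresh monitor is created at startup
  }

  // Called when the monitor exhausts its failed-check limit.
  giveUpOnConfigServers() {
    this.configMonitorUsable = false;
  }

  // Any metadata read, including the shard-list refresh, goes through the monitor.
  refreshShardList() {
    if (!this.configMonitorUsable || !this.configReachable()) {
      return { ok: 0, code: 71 };
    }
    return { ok: 1 };
  }

  // Restarting the process is the only step in this model that resets the monitor.
  restart() {
    this.configMonitorUsable = true;
  }
}

let configUp = false;
const router = new RouterModel(() => configUp);
router.giveUpOnConfigServers(); // outage long enough to hit the limit
configUp = true; // config servers come back
console.log(router.refreshShardList().ok); // prints 0: still failing
router.restart();
console.log(router.refreshShardList().ok); // prints 1
```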


              People

              Assignee:
              misha.tyulenev Misha Tyulenev
              Reporter:
              kaloian.manassiev Kaloian Manassiev
              Votes:
              5
              Watchers:
              32
