ISSUE DESCRIPTION AND IMPACT
If mongos loses network contact with all nodes of the CSRS config server replica set (both primary and secondaries), the replica set monitor deems the set 'unusable' and stops monitoring it. As a result, all operations that need to access config server metadata will fail.
DIAGNOSIS AND AFFECTED VERSIONS
This issue is present on MongoDB 3.2.0 to 3.2.9.
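As a rough illustration, the affected range can be checked from a version string. This is a minimal sketch: `isAffectedVersion` is a hypothetical helper name, not part of MongoDB, and it only encodes the 3.2.0 to 3.2.9 range stated above.

```javascript
// Hypothetical helper (not a MongoDB API): returns true when a version
// string such as "3.2.7" falls in the affected range 3.2.0 through 3.2.9.
function isAffectedVersion(version) {
    // Use only the leading "major.minor.patch" part; suffixes such as "-rc0" are ignored.
    var match = /^(\d+)\.(\d+)\.(\d+)/.exec(version);
    if (match === null) {
        throw new Error("Unrecognized version string: " + version);
    }
    var major = Number(match[1]);
    var minor = Number(match[2]);
    var patch = Number(match[3]);
    return major === 3 && minor === 2 && patch <= 9;
}
```

In the mongo shell, the string returned by `db.version()` could be passed to this function while connected to the node being checked.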
Operations that require access to config server metadata will begin failing with the following error:
> db.foo.find().itcount();
2016-03-16T16:59:20.941-0400 E QUERY [thread1] Error: error: {
    "code" : 71,
    "ok" : 0,
    "errmsg" : "None of the hosts for replica set test-configRS could be contacted."
} :
_getErrorWithCode@src/mongo/shell/utils.js:25:13
DBCommandCursor@src/mongo/shell/query.js:694:1
DBQuery.prototype._exec@src/mongo/shell/query.js:118:28
DBQuery.prototype.hasNext@src/mongo/shell/query.js:281:5
DBQuery.prototype.itcount@src/mongo/shell/query.js:407:12
@(shell):1:16
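A monitoring script could recognize this failure from the error document shown above. The sketch below assumes the error is available as an object with `code` and `errmsg` fields, as in the shell output. `unreachableReplicaSetName` is a hypothetical helper name, and the message pattern is taken from this one example, so it may not match other server versions. The same message can be reported for any replica set, so the caller still has to compare the returned name with its config server set name.

```javascript
// Hypothetical helper (not a MongoDB API): given an error document like the
// one above, returns the name of the replica set that could not be
// contacted, or null if the document does not look like this error.
function unreachableReplicaSetName(err) {
    if (!err || typeof err !== "object" || typeof err.errmsg !== "string") {
        return null;
    }
    // Code 71 and the message wording are copied from the example output.
    if (err.code !== 71) {
        return null;
    }
    var match = /^None of the hosts for replica set (\S+) could be contacted\.?$/.exec(err.errmsg);
    return match === null ? null : match[1];
}
```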
REMEDIATION AND WORKAROUNDS
To resolve this issue, restart the affected mongos or mongod.
This issue has been fixed in MongoDB 3.4.0 and 3.2.10:
- MongoDB 3.4.0 contains the fix described in this ticket.
- MongoDB 3.2.10 contains the fix described by SERVER-25516.
On versions prior to MongoDB 3.2.10, this issue can be avoided by executing the following command at runtime on all mongos and mongod nodes:
db.adminCommand( {setParameter: 1, 'replMonitorMaxFailedChecks': 2147483647} )
Please note that this parameter does not persist and must be set each time the node restarts.
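Because the parameter has to be set on every mongos and mongod and again after each restart, it may help to script the step. The following is a minimal sketch, not an official procedure: `buildWorkaroundCommand` and `applyWorkaround` are hypothetical helper names, and `runAdminCommand(host, cmd)` is a caller-supplied function (an assumption of this sketch) that runs a command against the admin database of one node and returns the response document.

```javascript
// Value taken from the workaround command above.
var REPL_MONITOR_MAX_FAILED_CHECKS = 2147483647;

// Builds the same command document as the workaround above.
function buildWorkaroundCommand() {
    return { setParameter: 1, replMonitorMaxFailedChecks: REPL_MONITOR_MAX_FAILED_CHECKS };
}

// Applies the workaround to every host in `hosts` and returns the list of
// hosts where the command failed, so they can be retried.
// `runAdminCommand` is hypothetical and must be supplied by the caller.
function applyWorkaround(hosts, runAdminCommand) {
    var failed = [];
    hosts.forEach(function (host) {
        var res;
        try {
            res = runAdminCommand(host, buildWorkaroundCommand());
        } catch (e) {
            res = null; // treat a thrown error (e.g. connection failure) as a failure
        }
        if (!res || res.ok !== 1) {
            failed.push(host);
        }
    });
    return failed;
}
```

In the mongo shell, one possible `runAdminCommand` is `function (host, cmd) { return new Mongo(host).getDB("admin").runCommand(cmd); }`, with `hosts` given as "host:port" strings. The script would need to be rerun whenever a node restarts.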
Original description
If mongos loses network contact with all nodes from the CSRS config server set (both primary and secondaries), the replica set monitor will deem this set as 'unusable' and will stop monitoring it.
From this point onward all operations which need to access some config server metadata will begin failing with the following error:
> db.foo.find().itcount();
2016-03-16T16:59:20.941-0400 E QUERY [thread1] Error: error: {
    "code" : 71,
    "ok" : 0,
    "errmsg" : "None of the hosts for replica set test-configRS could be contacted."
} :
_getErrorWithCode@src/mongo/shell/utils.js:25:13
DBCommandCursor@src/mongo/shell/query.js:694:1
DBQuery.prototype._exec@src/mongo/shell/query.js:118:28
DBQuery.prototype.hasNext@src/mongo/shell/query.js:281:5
DBQuery.prototype.itcount@src/mongo/shell/query.js:407:12
@(shell):1:16
This includes the refresh of the list of shards, which needs to be read from the config server metadata. Therefore, currently there is no procedure to restart or retry monitoring of the config server set and the only recourse is to restart mongos.
- is duplicated by
  - SERVER-22971 Operations on some sharded collections fail with bogus error (Closed)
- is related to
  - SERVER-23345 RAII semantics for ReplicaSetMonitor (Closed)
  - SERVER-25516 Add setParameter option to 3.2 to prevent the replica set monitor from ever giving up on monitoring a set (Closed)
  - SERVER-26719 Improve logging when config server does not support CSRS (Closed)
- related to
  - SERVER-22107 Improve error message when ReplicaSetMonitor cannot connect to a replSet node in mongos (Closed)