Details
-
Improvement
-
Resolution: Done
-
Major - P3
-
None
-
None
-
Minor Change
-
Sharding 12 (04/01/16), Sharding 2018-07-16
Description
The CSRS-equivalent follow-on from SERVER-15879.
If a mongos is started with an incorrect replica set name in the --configdb parameter, there is no clear message indicating that this is the problem.
$ /var/lib/mongodb-mms-automation/mongodb-linux-x86_64-3.3.3-dev/bin/mongos --port 36901 --configdb config/devkev-1.devkev.5431.mongodbdns.com:36800
|
2016-03-22T07:05:21.239+0000 W SHARDING [main] Running a sharded cluster with fewer than 3 config servers should only be done for testing purposes and is not recommended for production.
|
2016-03-22T07:05:21.243+0000 I CONTROL [main]
|
2016-03-22T07:05:21.243+0000 I CONTROL [main] ** NOTE: This is a development version (3.3.3) of MongoDB.
|
2016-03-22T07:05:21.243+0000 I CONTROL [main] ** Not recommended for production.
|
2016-03-22T07:05:21.243+0000 I CONTROL [main]
|
2016-03-22T07:05:21.243+0000 I SHARDING [mongosMain] MongoS version 3.3.3 starting: pid=22219 port=36901 64-bit host=devkev-1 (--help for usage)
|
2016-03-22T07:05:21.243+0000 I CONTROL [mongosMain] db version v3.3.3
|
2016-03-22T07:05:21.243+0000 I CONTROL [mongosMain] git version: 1eedf75bf5ac0098f212d931bad94bd8b54060c3
|
2016-03-22T07:05:21.243+0000 I CONTROL [mongosMain] allocator: tcmalloc
|
2016-03-22T07:05:21.243+0000 I CONTROL [mongosMain] modules: none
|
2016-03-22T07:05:21.243+0000 I CONTROL [mongosMain] build environment:
|
2016-03-22T07:05:21.243+0000 I CONTROL [mongosMain] distarch: x86_64
|
2016-03-22T07:05:21.243+0000 I CONTROL [mongosMain] target_arch: x86_64
|
2016-03-22T07:05:21.243+0000 I CONTROL [mongosMain] options: { net: { port: 36901 }, sharding: { configDB: "config/devkev-1.devkev.5431.mongodbdns.com:36800" } }
|
2016-03-22T07:05:21.243+0000 I SHARDING [mongosMain] Updating config server connection string to: config/devkev-1.devkev.5431.mongodbdns.com:36800
|
2016-03-22T07:05:21.243+0000 I NETWORK [mongosMain] Starting new replica set monitor for config/devkev-1.devkev.5431.mongodbdns.com:36800
|
2016-03-22T07:05:21.243+0000 I NETWORK [ReplicaSetMonitorWatcher] starting
|
2016-03-22T07:05:21.245+0000 I SHARDING [thread1] creating distributed lock ping thread for process devkev-1:36901:1458630321:439377670 (sleeping for 30000ms)
|
2016-03-22T07:05:21.246+0000 W NETWORK [mongosMain] node: devkev-1.devkev.5431.mongodbdns.com:36800 isn't a part of set: config ismaster: { hosts: [ "devkev-1.devkev.5431.mongodbdns.com:36800" ], setName: "configRS", setVersion: 1, ismaster: true, secondary: false, primary: "devkev-1.devkev.5431.mongodbdns.com:36800", me: "devkev-1.devkev.5431.mongodbdns.com:36800", electionId: ObjectId('7fffffff0000000000000001'), configsvr: 1, maxBsonObjectSize: 16777216, maxMessageSizeBytes: 48000000, maxWriteBatchSize: 1000, localTime: new Date(1458630321246), maxWireVersion: 4, minWireVersion: 0, readOnly: false, ok: 1.0 }
|
2016-03-22T07:05:21.246+0000 W NETWORK [mongosMain] No primary detected for set config
|
2016-03-22T07:05:21.246+0000 I NETWORK [mongosMain] All nodes for set config are down. This has happened for 1 checks in a row. Polling will stop after 29 more failed checks
|
2016-03-22T07:05:21.746+0000 W NETWORK [mongosMain] node: devkev-1.devkev.5431.mongodbdns.com:36800 isn't a part of set: config ismaster: { hosts: [ "devkev-1.devkev.5431.mongodbdns.com:36800" ], setName: "configRS", setVersion: 1, ismaster: true, secondary: false, primary: "devkev-1.devkev.5431.mongodbdns.com:36800", me: "devkev-1.devkev.5431.mongodbdns.com:36800", electionId: ObjectId('7fffffff0000000000000001'), configsvr: 1, maxBsonObjectSize: 16777216, maxMessageSizeBytes: 48000000, maxWriteBatchSize: 1000, localTime: new Date(1458630321746), maxWireVersion: 4, minWireVersion: 0, readOnly: false, ok: 1.0 }
|
2016-03-22T07:05:21.746+0000 W NETWORK [mongosMain] No primary detected for set config
|
2016-03-22T07:05:21.747+0000 I NETWORK [mongosMain] All nodes for set config are down. This has happened for 2 checks in a row. Polling will stop after 28 more failed checks
|
2016-03-22T07:05:22.247+0000 W NETWORK [mongosMain] node: devkev-1.devkev.5431.mongodbdns.com:36800 isn't a part of set: config ismaster: { hosts: [ "devkev-1.devkev.5431.mongodbdns.com:36800" ], setName: "configRS", setVersion: 1, ismaster: true, secondary: false, primary: "devkev-1.devkev.5431.mongodbdns.com:36800", me: "devkev-1.devkev.5431.mongodbdns.com:36800", electionId: ObjectId('7fffffff0000000000000001'), configsvr: 1, maxBsonObjectSize: 16777216, maxMessageSizeBytes: 48000000, maxWriteBatchSize: 1000, localTime: new Date(1458630322247), maxWireVersion: 4, minWireVersion: 0, readOnly: false, ok: 1.0 }
|
2016-03-22T07:05:22.247+0000 W NETWORK [mongosMain] No primary detected for set config
|
2016-03-22T07:05:22.247+0000 I NETWORK [mongosMain] All nodes for set config are down. This has happened for 3 checks in a row. Polling will stop after 27 more failed checks
|
...
|
The mongos is not yet accepting connections, and continues to retry forever. This retry behaviour is legit, because it's appropriate for transient errors (eg. config servers temporarily down). However, passing the wrong replica set name to --configdb is not a transient error.
The predominant error messages are all confusing:
-
node: devkev-1.devkev.5431.mongodbdns.com:36800 isn't a part of set: config ismaster: { hosts: [ "devkev-1.devkev.5431.mongodbdns.com:36800" ], setName: "configRS", setVersion: 1, ismaster: true, secondary: false, primary: "devkev-1.devkev.5431.mongodbdns.com:36800", me: "devkev-1.devkev.5431.mongodbdns.com:36800", electionId: ObjectId('7fffffff0000000000000001'), configsvr: 1, maxBsonObjectSize: 16777216, maxMessageSizeBytes: 48000000, maxWriteBatchSize: 1000, localTime: new Date(1458630321246), maxWireVersion: 4, minWireVersion: 0, readOnly: false, ok: 1.0 }Since the hostname which appears after "node:" is correct, this is confusing for users because the host clearly is a member of the replica set — it shows up in rs.status() on the CSRS, and even shows up later on this line itself in the "hosts" field of ismaster.
-
No primary detected for set configThis is confusing because there clearly is a primary in the set — again, as indicated in rs.status() on the CSRS, and in the "primary" field of the ismaster section on the above line.
-
All nodes for set config are down.Again, this is confusing because the nodes in the set are clearly up, and the mongos is not reporting any trouble with connecting to them (apart from the also mysterious inability to locate a primary).
In the case where the setName returned by ismaster doesn't match the replset name given in --configdb, there should be a clear message stating this.
For example, the above logs would be much clearer if they were something like:
2016-03-22T07:05:21.243+0000 I SHARDING [mongosMain] Updating config server connection string to: config/devkev-1.devkev.5431.mongodbdns.com:36800
|
2016-03-22T07:05:21.243+0000 I NETWORK [mongosMain] Starting new replica set monitor for config/devkev-1.devkev.5431.mongodbdns.com:36800
|
2016-03-22T07:05:21.243+0000 I NETWORK [ReplicaSetMonitorWatcher] starting
|
2016-03-22T07:05:21.245+0000 I SHARDING [thread1] creating distributed lock ping thread for process devkev-1:36901:1458630321:439377670 (sleeping for 30000ms)
|
+2016-03-22T07:05:21.246+0000 E NETWORK [mongosMain] Error: Remote config server replica set name "configRS" does not match expected replica set name "config"
|
2016-03-22T07:05:21.246+0000 W NETWORK [mongosMain] node: devkev-1.devkev.5431.mongodbdns.com:36800 isn't a part of set: config ismaster: { hosts: [ "devkev-1.devkev.5431.mongodbdns.com:36800" ], setName: "configRS", setVersion: 1, ismaster: true, secondary: false, primary: "devkev-1.devkev.5431.mongodbdns.com:36800", me: "devkev-1.devkev.5431.mongodbdns.com:36800", electionId: ObjectId('7fffffff0000000000000001'), configsvr: 1, maxBsonObjectSize: 16777216, maxMessageSizeBytes: 48000000, maxWriteBatchSize: 1000, localTime: new Date(1458630321246), maxWireVersion: 4, minWireVersion: 0, readOnly: false, ok: 1.0 }
|
2016-03-22T07:05:21.246+0000 W NETWORK [mongosMain] No primary detected for set config
|
2016-03-22T07:05:21.246+0000 I NETWORK [mongosMain] All nodes for set config are down. This has happened for 1 checks in a row. Polling will stop after 29 more failed checks |
Attachments
Issue Links
- related to
-
SERVER-15879 Better error reporting when the mongos configdb string does not match with cached one
-
- Closed
-