Uploaded image for project: 'Core Server'
  1. Core Server
  2. SERVER-23282

Provide diagnostic log message when CSRS replset name does not match

    • Type: Icon: Improvement Improvement
    • Resolution: Done
    • Priority: Icon: Major - P3 Major - P3
    • 4.1.1
    • Affects Version/s: None
    • Component/s: Diagnostics, Sharding
    • Labels:
      None
    • Minor Change
    • Sharding 12 (04/01/16), Sharding 2018-07-16

      The CSRS-equivalent follow-on from SERVER-15879.

      If a mongos is started with an incorrect replica set name in the --configdb parameter, there is no clear message indicating that this is the problem.

      $ /var/lib/mongodb-mms-automation/mongodb-linux-x86_64-3.3.3-dev/bin/mongos --port 36901 --configdb config/devkev-1.devkev.5431.mongodbdns.com:36800
      2016-03-22T07:05:21.239+0000 W SHARDING [main] Running a sharded cluster with fewer than 3 config servers should only be done for testing purposes and is not recommended for production.
      2016-03-22T07:05:21.243+0000 I CONTROL  [main]
      2016-03-22T07:05:21.243+0000 I CONTROL  [main] ** NOTE: This is a development version (3.3.3) of MongoDB.
      2016-03-22T07:05:21.243+0000 I CONTROL  [main] **       Not recommended for production.
      2016-03-22T07:05:21.243+0000 I CONTROL  [main]
      2016-03-22T07:05:21.243+0000 I SHARDING [mongosMain] MongoS version 3.3.3 starting: pid=22219 port=36901 64-bit host=devkev-1 (--help for usage)
      2016-03-22T07:05:21.243+0000 I CONTROL  [mongosMain] db version v3.3.3
      2016-03-22T07:05:21.243+0000 I CONTROL  [mongosMain] git version: 1eedf75bf5ac0098f212d931bad94bd8b54060c3
      2016-03-22T07:05:21.243+0000 I CONTROL  [mongosMain] allocator: tcmalloc
      2016-03-22T07:05:21.243+0000 I CONTROL  [mongosMain] modules: none
      2016-03-22T07:05:21.243+0000 I CONTROL  [mongosMain] build environment:
      2016-03-22T07:05:21.243+0000 I CONTROL  [mongosMain]     distarch: x86_64
      2016-03-22T07:05:21.243+0000 I CONTROL  [mongosMain]     target_arch: x86_64
      2016-03-22T07:05:21.243+0000 I CONTROL  [mongosMain] options: { net: { port: 36901 }, sharding: { configDB: "config/devkev-1.devkev.5431.mongodbdns.com:36800" } }
      2016-03-22T07:05:21.243+0000 I SHARDING [mongosMain] Updating config server connection string to: config/devkev-1.devkev.5431.mongodbdns.com:36800
      2016-03-22T07:05:21.243+0000 I NETWORK  [mongosMain] Starting new replica set monitor for config/devkev-1.devkev.5431.mongodbdns.com:36800
      2016-03-22T07:05:21.243+0000 I NETWORK  [ReplicaSetMonitorWatcher] starting
      2016-03-22T07:05:21.245+0000 I SHARDING [thread1] creating distributed lock ping thread for process devkev-1:36901:1458630321:439377670 (sleeping for 30000ms)
      2016-03-22T07:05:21.246+0000 W NETWORK  [mongosMain] node: devkev-1.devkev.5431.mongodbdns.com:36800 isn't a part of set: config ismaster: { hosts: [ "devkev-1.devkev.5431.mongodbdns.com:36800" ], setName: "configRS", setVersion: 1, ismaster: true, secondary: false, primary: "devkev-1.devkev.5431.mongodbdns.com:36800", me: "devkev-1.devkev.5431.mongodbdns.com:36800", electionId: ObjectId('7fffffff0000000000000001'), configsvr: 1, maxBsonObjectSize: 16777216, maxMessageSizeBytes: 48000000, maxWriteBatchSize: 1000, localTime: new Date(1458630321246), maxWireVersion: 4, minWireVersion: 0, readOnly: false, ok: 1.0 }
      2016-03-22T07:05:21.246+0000 W NETWORK  [mongosMain] No primary detected for set config
      2016-03-22T07:05:21.246+0000 I NETWORK  [mongosMain] All nodes for set config are down. This has happened for 1 checks in a row. Polling will stop after 29 more failed checks
      2016-03-22T07:05:21.746+0000 W NETWORK  [mongosMain] node: devkev-1.devkev.5431.mongodbdns.com:36800 isn't a part of set: config ismaster: { hosts: [ "devkev-1.devkev.5431.mongodbdns.com:36800" ], setName: "configRS", setVersion: 1, ismaster: true, secondary: false, primary: "devkev-1.devkev.5431.mongodbdns.com:36800", me: "devkev-1.devkev.5431.mongodbdns.com:36800", electionId: ObjectId('7fffffff0000000000000001'), configsvr: 1, maxBsonObjectSize: 16777216, maxMessageSizeBytes: 48000000, maxWriteBatchSize: 1000, localTime: new Date(1458630321746), maxWireVersion: 4, minWireVersion: 0, readOnly: false, ok: 1.0 }
      2016-03-22T07:05:21.746+0000 W NETWORK  [mongosMain] No primary detected for set config
      2016-03-22T07:05:21.747+0000 I NETWORK  [mongosMain] All nodes for set config are down. This has happened for 2 checks in a row. Polling will stop after 28 more failed checks
      2016-03-22T07:05:22.247+0000 W NETWORK  [mongosMain] node: devkev-1.devkev.5431.mongodbdns.com:36800 isn't a part of set: config ismaster: { hosts: [ "devkev-1.devkev.5431.mongodbdns.com:36800" ], setName: "configRS", setVersion: 1, ismaster: true, secondary: false, primary: "devkev-1.devkev.5431.mongodbdns.com:36800", me: "devkev-1.devkev.5431.mongodbdns.com:36800", electionId: ObjectId('7fffffff0000000000000001'), configsvr: 1, maxBsonObjectSize: 16777216, maxMessageSizeBytes: 48000000, maxWriteBatchSize: 1000, localTime: new Date(1458630322247), maxWireVersion: 4, minWireVersion: 0, readOnly: false, ok: 1.0 }
      2016-03-22T07:05:22.247+0000 W NETWORK  [mongosMain] No primary detected for set config
      2016-03-22T07:05:22.247+0000 I NETWORK  [mongosMain] All nodes for set config are down. This has happened for 3 checks in a row. Polling will stop after 27 more failed checks
      ...
      

      The mongos is not yet accepting connections, and continues to retry forever. This retry behaviour is legit, because it's appropriate for transient errors (eg. config servers temporarily down). However, passing the wrong replica set name to --configdb is not a transient error.

      The predominant error messages are all confusing:

      1. node: devkev-1.devkev.5431.mongodbdns.com:36800 isn't a part of set: config ismaster: { hosts: [ "devkev-1.devkev.5431.mongodbdns.com:36800" ], setName: "configRS", setVersion: 1, ismaster: true, secondary: false, primary: "devkev-1.devkev.5431.mongodbdns.com:36800", me: "devkev-1.devkev.5431.mongodbdns.com:36800", electionId: ObjectId('7fffffff0000000000000001'), configsvr: 1, maxBsonObjectSize: 16777216, maxMessageSizeBytes: 48000000, maxWriteBatchSize: 1000, localTime: new Date(1458630321246), maxWireVersion: 4, minWireVersion: 0, readOnly: false, ok: 1.0 }

        Since the hostname which appears after "node:" is correct, this is confusing for users because the host clearly is a member of the replica set — it shows up in rs.status() on the CSRS, and even shows up later on this line itself in the "hosts" field of ismaster.

      2. No primary detected for set config

        This is confusing because there clearly is a primary in the set — again, as indicated in rs.status() on the CSRS, and in the "primary" field of the ismaster section on the above line.

      3. All nodes for set config are down.

        Again, this is confusing because the nodes in the set are clearly up, and the mongos is not reporting any trouble with connecting to them (apart from the also mysterious inability to locate a primary).

      In the case where the setName returned by ismaster doesn't match the replset name given in --configdb, there should be a clear message stating this.

      For example, the above logs would be much clearer if they were something like:

      Unable to find source-code formatter for language: diff. Available languages are: actionscript, ada, applescript, bash, c, c#, c++, cpp, css, erlang, go, groovy, haskell, html, java, javascript, js, json, lua, none, nyan, objc, perl, php, python, r, rainbow, ruby, scala, sh, sql, swift, visualbasic, xml, yaml
       2016-03-22T07:05:21.243+0000 I SHARDING [mongosMain] Updating config server connection string to: config/devkev-1.devkev.5431.mongodbdns.com:36800
       2016-03-22T07:05:21.243+0000 I NETWORK  [mongosMain] Starting new replica set monitor for config/devkev-1.devkev.5431.mongodbdns.com:36800
       2016-03-22T07:05:21.243+0000 I NETWORK  [ReplicaSetMonitorWatcher] starting
       2016-03-22T07:05:21.245+0000 I SHARDING [thread1] creating distributed lock ping thread for process devkev-1:36901:1458630321:439377670 (sleeping for 30000ms)
      +2016-03-22T07:05:21.246+0000 E NETWORK  [mongosMain] Error: Remote config server replica set name "configRS" does not match expected replica set name "config"
       2016-03-22T07:05:21.246+0000 W NETWORK  [mongosMain] node: devkev-1.devkev.5431.mongodbdns.com:36800 isn't a part of set: config ismaster: { hosts: [ "devkev-1.devkev.5431.mongodbdns.com:36800" ], setName: "configRS", setVersion: 1, ismaster: true, secondary: false, primary: "devkev-1.devkev.5431.mongodbdns.com:36800", me: "devkev-1.devkev.5431.mongodbdns.com:36800", electionId: ObjectId('7fffffff0000000000000001'), configsvr: 1, maxBsonObjectSize: 16777216, maxMessageSizeBytes: 48000000, maxWriteBatchSize: 1000, localTime: new Date(1458630321246), maxWireVersion: 4, minWireVersion: 0, readOnly: false, ok: 1.0 }
       2016-03-22T07:05:21.246+0000 W NETWORK  [mongosMain] No primary detected for set config
       2016-03-22T07:05:21.246+0000 I NETWORK  [mongosMain] All nodes for set config are down. This has happened for 1 checks in a row. Polling will stop after 29 more failed checks
      

            Assignee:
            kevin.pulo@mongodb.com Kevin Pulo
            Reporter:
            kevin.pulo@mongodb.com Kevin Pulo
            Votes:
            2 Vote for this issue
            Watchers:
            4 Start watching this issue

              Created:
              Updated:
              Resolved: