  1. Core Server
  2. SERVER-15879

Better error reporting when the mongos configdb string does not match with cached one

    • Type: Improvement
    • Resolution: Won't Fix
    • Priority: Major - P3
    • None
    • Affects Version/s: 2.6.0
    • Component/s: Diagnostics, Sharding
    • Labels:
    • Fully Compatible
    • Sharding 2 04/24/15, Sharding 3 05/15/15, Sharding 4 06/05/15, Sharding 5 06/26/16, Sharding 6 07/17/15, Sharding 7 08/10/15, Sharding 8 08/28/15, Sharding 9 (09/18/15), Sharding A (10/09/15), Sharding B (10/30/15)

      If the config db string specified by mongos does not match the one cached in the config db, no errors or warnings appear in the logs at the default log level (in MongoDB 2.6 onwards). As a result, a partially impaired cluster goes undetected until a metadata write operation starts failing on it. The following log snippet, taken at the default log level, shows only a very vague error that can send the investigation in the wrong direction:

      21T22:00:21.460+0000 [conn2] addshard request { addShard: "testshard/examplehost" } failed: could not verify config servers were active and reachable before write
      2014-10-21T22:02:07.275+0000 [conn2] end connection 127.0.0.1:51706 (0 connections now open)
      2014-10-21T22:40:05.079+0000 [mongosMain] connection accepted from 127.0.0.1:53069 #3 (1 connection now open)
      

      Even when the impact is noticed, it is not clear what is causing it. The cause only becomes visible when a higher log level is enabled on mongos. The following logs were captured with a higher log level enabled on mongos (logLevel = 2):

      2014-10-25T04:46:59.871+0000 [Balancer] calling onCreate auth for host1.cluster1.example.com:27019 (10.10.10.10)
      2014-10-25T04:46:59.872+0000 [Balancer] initializing shard connection to host1.cluster1.example.com:27019 (10.10.10.10)
      2014-10-25T04:46:59.872+0000 [Balancer] initial sharding settings : { setShardVersion: "", init: true, configdb: "host1.cluster1.example.com:27019,host2.cluster1.example.com:27019,host3.cluster1.example.com:27019", serverID: ObjectId('544a936356680b7002b9a101'), authoritative: true }
      2014-10-25T04:46:59.872+0000 [Balancer] initial sharding result : { configdb: { stored: "cfg1.tmp.example.com:27019,cfg2.tmp.example.com:27019,cfg3.tmp.example.com:27019", given: "host1.cluster1.example.com:27019,host2.cluster1.example.com:27019,host3.cluster1.example.com:27019" }, ok: 0.0, errmsg: "mongos specified a different config database string : stored : cfg1.tmp.example.com:27019,cfg2.tmp.example.com:27019,cfg3.tmp.example.com:27019 vs given ..." }
      2014-10-25T04:46:59.872+0000 [Balancer] User Assertion: 15907:could not initialize sharding on connection host1.cluster1.example.com:27019 (10.10.10.10) :: caused by :: mongos specified a different config database string : stored : cfg1.tmp.example.com:27019,cfg2.tmp.example.com:27019,cfg3.tmp.example.com:27019 vs given : host1.cluster1.example.com:27019,host2.cluster1.example.com:27019,host3.cluster1.example.com:27019
      2014-10-25T04:46:59.872+0000 [Balancer] creating new connection to:host2.cluster1.example.com:27019
      2014-10-25T04:46:59.872+0000 [ConnectBG] BackgroundJob starting: ConnectBG
      2014-10-25T04:46:59.873+0000 [Balancer] connected to server host2.cluster1.example.com:27019 (10.10.10.11)
      

      Since this impacts the working of the cluster, the error / assertion should be logged at the default log level.
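
      Until that happens, the mismatch can be spotted by scanning a mongos log captured at logLevel = 2. The following is a minimal sketch in Python, not part of MongoDB or its tooling. It assumes the message format matches the "User Assertion: 15907" line quoted above; the names MISMATCH_RE, find_configdb_mismatch and scan_log are hypothetical helpers introduced for illustration only.

```python
import re

# Assumption: the mismatch message has the same shape as the
# "User Assertion: 15907" line quoted in this ticket.
MISMATCH_RE = re.compile(
    r"mongos specified a different config database string : "
    r"stored : (?P<stored>\S+) vs given : (?P<given>\S+)"
)


def find_configdb_mismatch(line):
    """Return (stored, given) if the line reports a configdb mismatch, else None."""
    match = MISMATCH_RE.search(line)
    if match is None:
        return None
    return match.group("stored"), match.group("given")


def scan_log(lines):
    """Yield (stored, given) for every line that reports a configdb mismatch."""
    for line in lines:
        result = find_configdb_mismatch(line)
        if result is not None:
            yield result
```

      Passing an open log file to scan_log yields one (stored, given) pair per matching line. Lines whose message is truncated with "..." (such as the "initial sharding result" line above) are deliberately not matched, because the given string cannot be recovered from them.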

            Assignee:
            kevin.pulo@mongodb.com Kevin Pulo
            Reporter:
            anil.kumar Anil Kumar
            Votes:
            10
            Watchers:
            9
