Core Server / SERVER-15879

Better error reporting when the mongos configdb string does not match the cached one


    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major - P3
    • Resolution: Won't Fix
    • Affects Version/s: 2.6.0
    • Fix Version/s: None
    • Component/s: Diagnostics, Sharding
    • Labels:
    • Backwards Compatibility:
      Fully Compatible
    • Sprint:
      Sharding 2 04/24/15, Sharding 3 05/15/15, Sharding 4 06/05/15, Sharding 5 06/26/15, Sharding 6 07/17/15, Sharding 7 08/10/15, Sharding 8 08/28/15, Sharding 9 (09/18/15), Sharding A (10/09/15), Sharding B (10/30/15)

      Description

      When the configdb string specified by a mongos does not match the one cached in the config database, no error or warning appears in the logs at the default log level (MongoDB 2.6 onwards). As a result, a partially impaired cluster can go undetected until a metadata write operation starts failing. The following snippet, captured at the default log level, contains only a vague error that can easily send troubleshooting in the wrong direction:

      2014-10-21T22:00:21.460+0000 [conn2] addshard request { addShard: "testshard/examplehost" } failed: could not verify config servers were active and reachable before write
      2014-10-21T22:02:07.275+0000 [conn2] end connection 127.0.0.1:51706 (0 connections now open)
      2014-10-21T22:40:05.079+0000 [mongosMain] connection accepted from 127.0.0.1:53069 #3 (1 connection now open)
      

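      Until the mismatch is surfaced at the default log level, it can be confirmed manually by comparing the configdb string the mongos was started with against the one a shard has cached. A minimal pymongo sketch, assuming placeholder host names and 2.6-era getCmdLineOpts / shardingState output (field names may differ in other versions):

      # Hedged diagnostic sketch: compare the configdb string given to a mongos
      # with the config server string a shard has cached. Hosts are placeholders;
      # the "sharding.configDB" / "configdb" option names and the shardingState
      # "configServer" field reflect 2.6-era servers and may differ elsewhere.
      from pymongo import MongoClient

      MONGOS_URI = "mongodb://mongos1.example.com:27017"   # hypothetical mongos
      SHARD_URI = "mongodb://shard1.example.com:27018"     # hypothetical shard primary

      mongos = MongoClient(MONGOS_URI)
      shard = MongoClient(SHARD_URI)

      # configdb string this mongos was started with (--configdb / sharding.configDB)
      parsed = mongos.admin.command("getCmdLineOpts")["parsed"]
      given = parsed.get("sharding", {}).get("configDB") or parsed.get("configdb")

      # config server string the shard cached from an earlier setShardVersion
      stored = shard.admin.command("shardingState").get("configServer")

      if given != stored:
          print("configdb mismatch: mongos uses %r, shard has cached %r" % (given, stored))
      else:
          print("configdb strings agree: %r" % given)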
      Even when the impact is noticed, it is not clear what is causing it; the mismatch only becomes visible when a higher log level is enabled on mongos. The following logs were captured with logLevel = 2:

      2014-10-25T04:46:59.871+0000 [Balancer] calling onCreate auth for host1.cluster1.example.com:27019 (10.10.10.10)
      2014-10-25T04:46:59.872+0000 [Balancer] initializing shard connection to host1.cluster1.example.com:27019 (10.10.10.10)
      2014-10-25T04:46:59.872+0000 [Balancer] initial sharding settings : { setShardVersion: "", init: true, configdb: "host1.cluster1.example.com:27019,host2.cluster1.example.com:27019,host3.cluster1.example.com:27019", serverID: ObjectId('544a936356680b7002b9a101'), authoritative: true }
      2014-10-25T04:46:59.872+0000 [Balancer] initial sharding result : { configdb: { stored: "cfg1.tmp.example.com:27019,cfg2.tmp.example.com:27019,cfg3.tmp.example.com:27019", given: "host1.cluster1.example.com:27019,host2.cluster1.example.com:27019,host3.cluster1.example.com:27019" }, ok: 0.0, errmsg: "mongos specified a different config database string : stored : cfg1.tmp.example.com:27019,cfg2.tmp.example.com:27019,cfg3.tmp.example.com:27019 vs given ..." }
      2014-10-25T04:46:59.872+0000 [Balancer] User Assertion: 15907:could not initialize sharding on connection host1.cluster1.example.com:27019 (10.10.10.10) :: caused by :: mongos specified a different config database string : stored : cfg1.tmp.example.com:27019,cfg2.tmp.example.com:27019,cfg3.tmp.example.com:27019 vs given : host1.cluster1.example.com:27019,host2.cluster1.example.com:27019,host3.cluster1.example.com:27019
      2014-10-25T04:46:59.872+0000 [Balancer] creating new connection to:host2.cluster1.example.com:27019
      2014-10-25T04:46:59.872+0000 [ConnectBG] BackgroundJob starting: ConnectBG
      2014-10-25T04:46:59.873+0000 [Balancer] connected to server host2.cluster1.example.com:27019 (10.10.10.11)
      

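      The verbose output above can also be captured on a running mongos without a restart by raising the log level temporarily. A small pymongo sketch (setParameter / logLevel are standard server parameters; the URI is a placeholder):

      # Temporarily raise the mongos log level so the configdb mismatch
      # assertion becomes visible, then restore the default afterwards.
      from pymongo import MongoClient

      mongos = MongoClient("mongodb://mongos1.example.com:27017")  # hypothetical URI

      mongos.admin.command("setParameter", 1, logLevel=2)  # verbose logging on
      # ... reproduce the failing operation and inspect the mongos log ...
      mongos.admin.command("setParameter", 1, logLevel=0)  # back to the default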
      Since this impacts the operation of the cluster, the error / assertion should be logged at the default log level.
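      As an operational stop-gap until then, the configdb strings of all routers can be audited from the cluster metadata; routers started with differing --configdb strings are one common source of the stored-vs-given assertion above, since the strings are compared verbatim (host order included). A hedged sketch, assuming the standard config.mongos collection and placeholder connection details:

      # Hedged audit sketch: list every mongos registered in config.mongos and
      # report any whose configdb string differs from the others. Connection
      # details are placeholders; getCmdLineOpts field names follow 2.6-era output.
      from pymongo import MongoClient

      ENTRY_URI = "mongodb://mongos1.example.com:27017"  # any reachable mongos (hypothetical)

      entry = MongoClient(ENTRY_URI)
      seen = {}

      for doc in entry.config.mongos.find({}, {"_id": 1}):
          host = doc["_id"]  # "host:port" of a registered mongos
          router = MongoClient("mongodb://" + host)
          parsed = router.admin.command("getCmdLineOpts")["parsed"]
          seen[host] = parsed.get("sharding", {}).get("configDB") or parsed.get("configdb")

      if len(set(seen.values())) > 1:
          print("mongos routers disagree on configdb:")
          for host, configdb in sorted(seen.items()):
              print("  %s -> %s" % (host, configdb))
      else:
          print("all %d mongos routers use the same configdb string" % len(seen))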


              People

              Assignee:
              kevin.pulo Kevin Pulo
              Reporter:
              anil.kumar Anil Kumar
              Participants:
              Votes:
              10
              Watchers:
              9

                Dates

                Created:
                Updated:
                Resolved: