SERVER-10780 (Core Server)

Clock skew and balancing in MONGOS


Details

    • Type: Question
    • Resolution: Done
    • Priority: Blocker - P1

    Description

      We run mongos on 3 app servers, which connect to 3 config servers.

      The following error appeared in the log of one of the mongos servers:

      1) "[Balancer] caught exception while doing balance: error checking clock skew of cluster CFG1.hma.com:30000, CFG2.hma.com:30000,CFG3.hma.com:30000 :: caused by :: 13650 clock skew of the cluster CFG1.hma.com:30000, CFG2.hma.com:30000, CFG3.hma.com:30000 is too far out of bounds to allow distributed locking."

      This is due to a time difference, but we have the ntpd service running. The time difference between a working mongos server and the non-working mongos server is around 10 sec, which I don't think should cause this issue.

      2) Mon Sep 16 08:07:47.049 [Balancer] distributed lock 'balancer/WEB002:27017:1374748868:1804289383' unlocked.
      Mon Sep 16 08:07:53.267 [Balancer] distributed lock 'balancer/WEB002:27017:1374748868:1804289383' acquired, ts : 5236f4998003d9842486ab03
      Mon Sep 16 08:07:53.372 [Balancer] distributed lock 'balancer/WEB002:27017:1374748868:1804289383' unlocked.

      These lines appear on one of the mongos servers. I wanted to confirm: does balancing run on only one of the mongos servers?

      Also, db.locks.find({ _id : "balancer" }).pretty() gave the following output:
      {
      "_id" : "balancer",
      "process" : "WEB001:27017:1374748869:1804289383",
      "state" : 0,
      "ts" : ObjectId("5236f315abd060ee92056a41"),
      "when" : ISODate("2013-09-16T12:01:25.938Z"),
      "who" : "T00AWSPWEB001.HMA.COM:27000:1374748869:1804289383:Balancer:846930886",
      "why" : "doing balance round"
      }
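
One way to cross-check the clocks involved is to decode the timestamp embedded in the lock's ObjectId values, since the first four bytes of an ObjectId hold seconds since the Unix epoch. The following is a minimal sketch in plain JavaScript (runnable in Node.js); the helper name objectIdToDate is made up for illustration, and the two hex strings are the ts values quoted above.

```javascript
// Hypothetical helper (not part of MongoDB): decode the creation time
// stored in the first 4 bytes (8 hex characters) of an ObjectId string.
function objectIdToDate(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected a 24-character hex ObjectId string");
  }
  var seconds = parseInt(hex.substring(0, 8), 16);
  return new Date(seconds * 1000);
}

// ts from the "acquired" log line on WEB002
var acquired = objectIdToDate("5236f4998003d9842486ab03");
// ts from the config.locks document for the balancer
var lockDoc = objectIdToDate("5236f315abd060ee92056a41");

console.log(acquired.toISOString()); // 2013-09-16T12:07:53.000Z
console.log(lockDoc.toISOString());  // 2013-09-16T12:01:25.000Z
```

The second value agrees to the second with the "when" field of the lock document. The first would line up with the 08:07:53 log line if that server logs in a UTC-4 time zone, which is an assumption here. In the mongo shell itself, ObjectId("...").getTimestamp() returns the same information.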

      So to summarize, the 3 mongos (A, B, C) each show a different status: on A nothing is logged (working fine), on B the clock skew error appears (not working correctly), and on C the distributed lock messages appear.
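
Since the error in item 1 names the config servers rather than the mongos hosts, it may help to compare the config servers' clocks directly. The sketch below only shows the arithmetic: given one clock reading per host, it reports the largest pairwise difference. The function name maxSkewMillis and the sample readings are made up for illustration and are not measurements from this cluster.

```javascript
// Hypothetical helper: given clock readings (milliseconds since the epoch)
// sampled from several hosts at roughly the same moment, return the largest
// difference between any two of them.
function maxSkewMillis(readings) {
  var hosts = Object.keys(readings);
  if (hosts.length < 2) {
    return 0; // nothing to compare against
  }
  var values = hosts.map(function (h) { return readings[h]; });
  return Math.max.apply(null, values) - Math.min.apply(null, values);
}

// Made-up sample readings, NOT taken from the cluster in this ticket.
var sample = {
  "CFG1.hma.com:30000": 1379333273000,
  "CFG2.hma.com:30000": 1379333274500,
  "CFG3.hma.com:30000": 1379333263000
};

console.log(maxSkewMillis(sample) + " ms"); // 11500 ms
```

In the mongo shell, a reading for each host could be taken from the localTime field returned by the serverStatus command. Readings sampled one after another also include network round-trip time, so small differences should not be read as real skew.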

          People

            Assignee: Unassigned
            Reporter: Somit Srivastava (somit)