Details
Question
Resolution: Done
Blocker - P1
Description
We have 3 app servers, each running a mongos that connects to 3 config servers.
The following error appeared in the log of one of the mongos servers:
1) "[Balancer] caught exception while doing balance: error checking clock skew of cluster CFG1.hma.com:30000, CFG2.hma.com:30000,CFG3.hma.com:30000 :: caused by :: 13650 clock skew of the cluster CFG1.hma.com:30000, CFG2.hma.com:30000, CFG3.hma.com:30000 is too far out of bounds to allow distributed locking."
This points to a time difference, but the ntpd service is running on our servers. The clock difference between a working mongos server and the non-working mongos server is around 10 seconds, which I don't think should cause this issue.
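The error message names the three config servers, so the skew that matters is presumably the one measured among them rather than between the mongos hosts. A minimal sketch of that comparison follows. The helper names, the 30-second limit, and the sample timestamps are all assumptions for illustration, not values taken from this cluster or from the MongoDB source. In the mongo shell, each server's clock could be read from the localTime field of db.serverStatus() while connected to that server, although network latency makes such readings approximate.

```javascript
// Hypothetical helpers for illustration; these are not part of MongoDB.

// Assumed limit: the real threshold depends on the MongoDB version.
const ASSUMED_MAX_SKEW_MS = 30 * 1000;

// Returns the largest difference (in ms) between any two server clocks.
function maxClockSkewMs(serverTimesMs) {
  const values = Object.values(serverTimesMs);
  return Math.max(...values) - Math.min(...values);
}

// Compares the measured skew against a limit supplied by the caller.
function skewReport(serverTimesMs, limitMs) {
  const skewMs = maxClockSkewMs(serverTimesMs);
  return { skewMs: skewMs, withinBounds: skewMs <= limitMs };
}

// Made-up sample readings (epoch milliseconds), one per config server.
const sampleTimes = {
  "CFG1.hma.com:30000": 1379318867000,
  "CFG2.hma.com:30000": 1379318877000,
  "CFG3.hma.com:30000": 1379318869000
};

const report = skewReport(sampleTimes, ASSUMED_MAX_SKEW_MS);
console.log(JSON.stringify(report));
```

With these sample values the spread is 10 seconds, which would pass a 30-second limit. If the real limit is similar, a 10-second gap between two mongos hosts alone would not explain error 13650, and the config servers' own clocks would be worth checking.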
2) Mon Sep 16 08:07:47.049 [Balancer] distributed lock 'balancer/WEB002:27017:1374748868:1804289383' unlocked.
Mon Sep 16 08:07:53.267 [Balancer] distributed lock 'balancer/WEB002:27017:1374748868:1804289383' acquired, ts : 5236f4998003d9842486ab03
Mon Sep 16 08:07:53.372 [Balancer] distributed lock 'balancer/WEB002:27017:1374748868:1804289383' unlocked.
These messages appear on one of the mongos servers. I wanted to confirm whether balancing runs on only one mongos server at a time.
Also, db.locks.find( { _id : "balancer" } ).pretty() gave the following output:
{
"_id" : "balancer",
"process" : "WEB001:27017:1374748869:1804289383",
"state" : 0,
"ts" : ObjectId("5236f315abd060ee92056a41"),
"when" : ISODate("2013-09-16T12:01:25.938Z"),
"who" : "T00AWSPWEB001.HMA.COM:27000:1374748869:1804289383:Balancer:846930886",
"why" : "doing balance round"
}
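The lock document above can be read field by field. The sketch below assumes that "state" uses 0 for unlocked, 1 for a lock that is being acquired, and 2 for a held lock, and that the "process" string is laid out as host:port:start time in epoch seconds:random number. Both are assumptions about the config.locks format for this MongoDB version, and the function names are hypothetical.

```javascript
// Hypothetical helpers for illustration; these are not part of MongoDB.

// Assumed meaning of the "state" field in config.locks.
const LOCK_STATES = {
  0: "unlocked",
  1: "being acquired",
  2: "held"
};

// Splits a process id, assuming the layout host:port:startEpochSeconds:random.
function parseProcessId(processId) {
  const parts = processId.split(":");
  return {
    host: parts[0],
    port: Number(parts[1]),
    startedAt: new Date(Number(parts[2]) * 1000),
    random: parts[3]
  };
}

// Builds a one-line, human-readable summary of a lock document.
function describeLock(lockDoc) {
  const proc = parseProcessId(lockDoc.process);
  const state = LOCK_STATES[lockDoc.state] || "unknown";
  return lockDoc._id + " lock is " + state +
    "; last taken by " + proc.host + ":" + proc.port +
    " (" + lockDoc.why + ")";
}

// Fields copied from the output above.
const lockDoc = {
  _id: "balancer",
  process: "WEB001:27017:1374748869:1804289383",
  state: 0,
  why: "doing balance round"
};

console.log(describeLock(lockDoc));
```

Under these assumptions, the document says that the balancer lock is currently free and that the mongos process on WEB001 was the last one to take it for a balancing round.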
To summarize, the 3 mongos servers (A, B, C) each show a different status: A logs nothing (working fine), B logs the clock skew error (not working correctly), and C logs the distributed lock messages.