[SERVER-10780] Clock skew and balancing in MONGOS Created: 16/Sep/13  Updated: 10/Dec/14  Resolved: 18/Mar/14

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Question Priority: Blocker - P1
Reporter: Somit Srivastava Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Participants:

 Description   

We run mongos on 3 app servers, all of which connect to 3 config servers.

The following error appeared in the log of one of the mongos servers:

1) "[Balancer] caught exception while doing balance: error checking clock skew of cluster CFG1.hma.com:30000, CFG2.hma.com:30000,CFG3.hma.com:30000 :: caused by :: 13650 clock skew of the cluster CFG1.hma.com:30000, CFG2.hma.com:30000, CFG3.hma.com:30000 is too far out of bounds to allow distributed locking."

This points to a time difference, but we have the ntpd service running. The time difference between a working mongos server and the non-working mongos server is around 10 seconds, which I don't think should cause this issue.
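
The error message names the three config servers, so it may be worth comparing the clocks on those hosts as well as on the mongos hosts. A minimal sketch of that comparison follows, assuming one clock reading per host taken at roughly the same instant. The helper name `maxClockSkewMs` and the sample readings are hypothetical and not taken from this cluster; the skew limit that mongos enforces is not stated in this ticket, so no threshold is hard-coded.

```javascript
// Hypothetical helper: largest pairwise difference between clock readings.
// `readings` maps a host name to the time (ms since epoch) that host reported
// at roughly the same instant. How the readings are collected is left open.
function maxClockSkewMs(readings) {
  const times = Object.values(readings);
  if (times.length < 2) return 0;
  return Math.max(...times) - Math.min(...times);
}

// Illustrative values only (not measured on the servers in this ticket).
const sample = {
  "CFG1.hma.com:30000": Date.parse("2013-09-16T12:07:47Z"),
  "CFG2.hma.com:30000": Date.parse("2013-09-16T12:07:49Z"),
  "CFG3.hma.com:30000": Date.parse("2013-09-16T12:07:57Z"),
};

console.log(maxClockSkewMs(sample) / 1000 + " s"); // prints "10 s"
```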

2) Mon Sep 16 08:07:47.049 [Balancer] distributed lock 'balancer/WEB002:27017:1374748868:1804289383' unlocked.
Mon Sep 16 08:07:53.267 [Balancer] distributed lock 'balancer/WEB002:27017:1374748868:1804289383' acquired, ts : 5236f4998003d9842486ab03
Mon Sep 16 08:07:53.372 [Balancer] distributed lock 'balancer/WEB002:27017:1374748868:1804289383' unlocked.

This appears on one of the mongos servers. I wanted to confirm whether balancing runs on only one of the mongos servers.

Also, db.locks.find({ _id : "balancer" }).pretty() gave the following output:
{
"_id" : "balancer",
"process" : "WEB001:27017:1374748869:1804289383",
"state" : 0,
"ts" : ObjectId("5236f315abd060ee92056a41"),
"when" : ISODate("2013-09-16T12:01:25.938Z"),
"who" : "T00AWSPWEB001.HMA.COM:27000:1374748869:1804289383:Balancer:846930886",
"why" : "doing balance round"
}
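
For reading a lock document like this one, a small sketch in plain JavaScript may help. It relies on the ObjectId layout, in which the first 4 bytes are seconds since the Unix epoch. The state names in `LOCK_STATES` reflect my understanding of `config.locks` for servers of this era and should be treated as an assumption. The helper names are hypothetical, and `ts` is passed as a hex string so the code runs outside the mongo shell.

```javascript
// Decode the creation time embedded in an ObjectId: the first 4 bytes
// (8 hex characters) hold seconds since the Unix epoch.
function objectIdToDate(hex) {
  return new Date(parseInt(hex.slice(0, 8), 16) * 1000);
}

// Assumed meaning of the `state` field in config.locks (not stated in this ticket).
const LOCK_STATES = { 0: "unlocked", 1: "contended", 2: "locked" };

// Hypothetical helper: one-line summary of a lock document.
function describeLock(doc) {
  const state = LOCK_STATES[doc.state] || "unknown";
  return doc._id + " is " + state + ", last ts " + objectIdToDate(doc.ts).toISOString();
}

// Values copied from the output above and from the "acquired" log line.
const lockDoc = { _id: "balancer", state: 0, ts: "5236f315abd060ee92056a41" };
console.log(describeLock(lockDoc));
// prints "balancer is unlocked, last ts 2013-09-16T12:01:25.000Z"
console.log(objectIdToDate("5236f4998003d9842486ab03").toISOString());
// prints "2013-09-16T12:07:53.000Z"
```

The first decoded value agrees with the "when" field to the second. The second matches the 08:07:53 "acquired" log line if that server writes its log in a UTC-4 local time zone, which is an inference rather than something the ticket states.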

To summarize, the 3 mongos instances (A, B, C) each show a different status: A logs nothing (working fine), B logs the clock skew error (not working correctly), and C logs the distributed lock messages.



 Comments   
Comment by Stennie Steneker (Inactive) [ 18/Mar/14 ]

Hi Somit,

Please be advised I'm closing this issue due to inactivity.

Large amounts of clock skew can cause unexpected issues for many programs, particularly if adjustments cause servers to skip back in time. MongoDB has some tolerance for clock skew, but as per the log message you encountered there are sanity checks to keep the skew within reason.

If you do encounter a warning on clock skew, the appropriate fix would be to synchronise the server times and ensure ntpd is working correctly.

Regards,
Stephen

Comment by Eliot Horowitz (Inactive) [ 30/Nov/13 ]

Is this still an issue?
If so, can you send the times on all servers and a mongodump of the config server?
