[SERVER-17524] MongoDB sharding problem Created: 10/Mar/15  Updated: 15/May/15  Resolved: 15/May/15

Status: Closed
Project: Core Server
Component/s: Internal Code, Sharding
Affects Version/s: 2.2.7
Fix Version/s: None

Type: Question
Priority: Major - P3
Reporter: Girish Bhat
Assignee: Randolph Tan
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Participants:

 Description   

I have a MongoDB cluster with 3 replica sets.

I am getting this error:

id_ObjectId('54f9ed416aad853b66d0c21f')", configdb: "172.31.12.107:27019,172.31.12.93:27019,172.31.12.43:27019" } result: { ok: 0.0, errmsg: "Error locking distributed lock for split. :: caused by :: 13651 error checking clock skew of cluster 172.31.12.107:27019,172.31.12.93:27019,172.31.12..

NTP time is synced on all of these servers.

Output of the date command on all servers:

mongo-s3:
Tue Mar 10 18:47:41 IST 2015
mongo-s2:
Tue Mar 10 18:47:41 IST 2015
mongo-rt1:
Tue Mar 10 18:47:41 IST 2015
mongo-p3:
Tue Mar 10 18:47:41 IST 2015
mongo-s1:
Tue Mar 10 18:47:41 IST 2015
mongo-p1:
Tue Mar 10 18:47:41 IST 2015
mongo-p2:
Tue Mar 10 18:47:41 IST 2015

I restarted the cluster with NTP kept up to date, but I still get the same error.
What could be the problem?



 Comments   
Comment by Ramon Fernandez Marina [ 15/May/15 ]

Looks like the issue went away so we're resolving this ticket. If the issue reappears please feel free to reopen.

Comment by Girish Bhat [ 11/Mar/15 ]

Hi,

When I posted the logs it was not working; I had the same clock skew error.
When I checked a few minutes ago, the errors were no longer there. Strange.

Comment by Randolph Tan [ 11/Mar/15 ]

Hi,

I don't see the errors in the mongod logs you posted (both logs show successful distributed lock acquisition). Are these the log level 1 logs from when the skew exception happened?

Thanks!

Comment by Girish Bhat [ 11/Mar/15 ]

Hi,

I added logs from the primary of replica set rs0.
Only the lines for "conn159":
http://pastebin.com/QTNXgMdv

For "conn202":
http://pastebin.com/xwXQAZmi

FYI, there are 4 shards in the cluster. "rs0" has data and the rest do not. I enabled sharding for the collection in rs0.
Please let me know if you want more logs.

Comment by Randolph Tan [ 10/Mar/15 ]

Sorry, I meant to ask for the more verbose logs from the primaries, and in particular from the primary the split was sent to. For example, in the case of the pastebin logs, that would be the verbose logs from the primary of rs0 at the time the error occurred.

Thanks!

Comment by Girish Bhat [ 10/Mar/15 ]


FYI, hwclock is also in sync.

Comment by Girish Bhat [ 10/Mar/15 ]

Hi Randolph Tan,

The time is the same on all config servers. Please look into it.

OS: CentOS 7.5 (hosted on Amazon EC2)

mongo-s2 ~]$ echo "db.serverStatus()" | mongo --port 27019 | grep localTime
	"localTime" : ISODate("2015-03-10T16:08:11.772Z")
 
 
mongo-s1 ~]$ echo "db.serverStatus()" | mongo --port 27019 | grep localTime
	"localTime" : ISODate("2015-03-10T16:08:11.788Z"),
 
mongo-s3 ~]$ echo "db.serverStatus()" | mongo --port 27019 | grep localTime
	"localTime" : ISODate("2015-03-10T16:08:11.821Z"),

Logs are pasted below:

http://pastebin.com/qrJuPMp2
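
For reference, the spread between the three localTime values quoted in this comment can be worked out directly. The following is a minimal Python sketch that uses only the timestamps shown above; the variable and function names are illustrative. Because the three commands were not necessarily issued at the same instant, the figure mixes any real clock offset with the delay between invocations.

```python
from datetime import datetime, timedelta

# localTime values copied from the serverStatus output in this comment.
local_times = {
    "mongo-s1": "2015-03-10T16:08:11.788Z",
    "mongo-s2": "2015-03-10T16:08:11.772Z",
    "mongo-s3": "2015-03-10T16:08:11.821Z",
}

def parse(ts):
    # The timestamps above are UTC ("Z" suffix) with millisecond precision.
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")

parsed = {host: parse(ts) for host, ts in local_times.items()}

# Difference between the latest and earliest reading, in whole milliseconds.
spread_ms = (max(parsed.values()) - min(parsed.values())) // timedelta(milliseconds=1)
print(f"spread across config servers: {spread_ms} ms")
```

For the values above the spread is 49 ms, so the readings agree to well under a second.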

Comment by Randolph Tan [ 10/Mar/15 ]

What platform and OS are you running this on? The clock skew checking code uses the localTime value returned by the serverStatus command on each of the config servers for this check. Can you also try increasing the log level to 1?

Thanks!
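
To illustrate the idea behind a check of this kind, here is a rough Python sketch. It is not MongoDB's actual implementation: it only assumes that each server's clock can be read remotely (for example via localTime from serverStatus) and that network delay is roughly symmetric. The names estimate_offset_ms, cluster_skew_ms, and fetch_remote_ms are hypothetical.

```python
import time

def estimate_offset_ms(fetch_remote_ms, now_ms=lambda: time.time() * 1000.0):
    """Estimate (remote clock - local clock) in milliseconds.

    fetch_remote_ms is a hypothetical callable that returns the remote
    server's current time in ms, e.g. by reading localTime from serverStatus.
    """
    sent = now_ms()
    remote = fetch_remote_ms()
    received = now_ms()
    # Assume the remote clock was read halfway through the round trip.
    return remote - (sent + received) / 2.0

def cluster_skew_ms(offsets_ms):
    """Largest disagreement between any two servers, given per-server offsets."""
    return max(offsets_ms) - min(offsets_ms)

# Deterministic demonstration with a fake local clock and a fake remote server.
ticks = iter([1000.0, 1020.0])  # local time when the request is sent / answered
offset = estimate_offset_ms(lambda: 1015.0, now_ms=lambda: next(ticks))
print(offset)                               # remote is 5 ms ahead of the midpoint
print(cluster_skew_ms([offset, -3.0, 0.5]))
```

Compensating for the round trip matters here because a slow or retried request can otherwise look like clock skew even when NTP reports the hosts as synchronized.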
