[SERVER-16010] monogs waited XXs for distributed lock configUpgrade for upgrading config database to new format v5 Created: 07/Nov/14  Updated: 04/Dec/14  Resolved: 04/Dec/14

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 2.6.5
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Pierre Lemordant Assignee: Ramon Fernandez Marina
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Steps To Reproduce:

follow documentation "deploy a sharded cluster" : http://docs.mongodb.org/manual/tutorial/deploy-shard-cluster/

Participants:

 Description   

I'm currently trying to deploy a "mongod 2.6.5" sharded cluster on centOS 7 VMs where firewalls are disabled and time synchronized with ntp.
I'm following the "Deploy a sharded Cluster" tutorial from : http://docs.mongodb.org/manual/tutorial/deploy-shard-cluster/

Starting config servers

On 3 VM, I run the following command (as indicated in the doc)

mongod --configsvr --dbpath /data/configdb --port 27019

They started correctly and I can observe following output logs

2014-11-06T12:25:46.604+0100 [initandlisten] waiting for connections on port 27019

Starting mongos instance

On another VM, I run the following command

mongos --configdb 192.168.8.75:27019,192.168.8.82:27019,192.168.8.62:27019 --verbose --logpath /var/log/mongodb/mongos.log --logappend

Of course the provided IP are the (previously started) config server ones.

I can observe the mongos could establish the connection to the config server as shown below

2014-11-06T15:09:11.851+0100 ***** SERVER RESTARTED *****
2014-11-06T15:09:11.852+0100 [mongosMain] MongoS version 2.6.5 starting: pid=11544 port=27017 64-bit host=192.168.8.86 (--help for usage)
2014-11-06T15:09:11.852+0100 [mongosMain] db version v2.6.5
2014-11-06T15:09:11.852+0100 [mongosMain] git version: e99d4fcb4279c0279796f237aa92fe3b64560bf6
2014-11-06T15:09:11.852+0100 [mongosMain] build info: Linux build8.nj1.10gen.cc 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3 21:39:27 UTC 2014 x86_64 BOOST_LIB_VERSION=1_49
2014-11-06T15:09:11.852+0100 [mongosMain] allocator: tcmalloc
2014-11-06T15:09:11.852+0100 [mongosMain] options: { sharding: { configDB: "192.168.8.75:27019,192.168.8.82:27019,192.168.8.62:27019" }, systemLog: { destination: "file", logAppend: true, path: "/var/log/mongodb/mongos.log", verbosity: 1 }, upgrade: true }
2014-11-06T15:09:11.852+0100 [mongosMain]  config string : 192.168.8.75:27019,192.168.8.82:27019,192.168.8.62:27019
2014-11-06T15:09:11.852+0100 [mongosMain] creating new connection to:192.168.8.75:27019
2014-11-06T15:09:11.853+0100 [ConnectBG] BackgroundJob starting: ConnectBG
2014-11-06T15:09:11.853+0100 [mongosMain] connected to server 192.168.8.75:27019 (192.168.8.75)
2014-11-06T15:09:11.853+0100 [mongosMain] connected connection!
2014-11-06T15:09:11.854+0100 [mongosMain] creating new connection to:192.168.8.82:27019
2014-11-06T15:09:11.854+0100 [ConnectBG] BackgroundJob starting: ConnectBG
2014-11-06T15:09:11.854+0100 [mongosMain] connected to server 192.168.8.82:27019 (192.168.8.82)
2014-11-06T15:09:11.854+0100 [mongosMain] connected connection!
2014-11-06T15:09:11.855+0100 [mongosMain] creating new connection to:192.168.8.62:27019
2014-11-06T15:09:11.855+0100 [ConnectBG] BackgroundJob starting: ConnectBG
2014-11-06T15:09:11.856+0100 [mongosMain] connected to server 192.168.8.62:27019 (192.168.8.62)
2014-11-06T15:09:11.856+0100 [mongosMain] connected connection!
2014-11-06T15:09:11.857+0100 [mongosMain] SyncClusterConnection connecting to [192.168.8.75:27019]
2014-11-06T15:09:11.857+0100 [ConnectBG] BackgroundJob starting: ConnectBG
2014-11-06T15:09:11.858+0100 [mongosMain] connected to server 192.168.8.75:27019 (192.168.8.75)
2014-11-06T15:09:11.858+0100 [mongosMain] SyncClusterConnection connecting to [192.168.8.82:27019]
2014-11-06T15:09:11.858+0100 [ConnectBG] BackgroundJob starting: ConnectBG
2014-11-06T15:09:11.858+0100 [mongosMain] connected to server 192.168.8.82:27019 (192.168.8.82)
2014-11-06T15:09:11.858+0100 [mongosMain] SyncClusterConnection connecting to [192.168.8.62:27019]
2014-11-06T15:09:11.858+0100 [ConnectBG] BackgroundJob starting: ConnectBG
2014-11-06T15:09:11.859+0100 [mongosMain] connected to server 192.168.8.62:27019 (192.168.8.62)
2014-11-06T15:09:11.968+0100 [mongosMain] scoped connection to 192.168.8.75:27019,192.168.8.82:27019,192.168.8.62:27019 not being returned to the pool
2014-11-06T15:09:11.969+0100 [mongosMain] created new distributed lock for configUpgrade on 192.168.8.75:27019,192.168.8.82:27019,192.168.8.62:27019 ( lock timeout : 900000, ping interval : 30000, process : 0 )
2014-11-06T15:09:11.969+0100 [mongosMain] creating new connection to:192.168.8.75:27019
2014-11-06T15:09:11.970+0100 [ConnectBG] BackgroundJob starting: ConnectBG
2014-11-06T15:09:11.970+0100 [mongosMain] connected to server 192.168.8.75:27019 (192.168.8.75)
2014-11-06T15:09:11.970+0100 [mongosMain] connected connection!
2014-11-06T15:09:11.972+0100 [mongosMain] creating new connection to:192.168.8.82:27019
2014-11-06T15:09:11.972+0100 [ConnectBG] BackgroundJob starting: ConnectBG
2014-11-06T15:09:11.972+0100 [mongosMain] connected to server 192.168.8.82:27019 (192.168.8.82)
2014-11-06T15:09:11.972+0100 [mongosMain] connected connection!
2014-11-06T15:09:11.974+0100 [mongosMain] creating new connection to:192.168.8.62:27019
2014-11-06T15:09:11.974+0100 [ConnectBG] BackgroundJob starting: ConnectBG
2014-11-06T15:09:11.975+0100 [mongosMain] connected to server 192.168.8.62:27019 (192.168.8.62)
2014-11-06T15:09:11.975+0100 [mongosMain] connected connection!
2014-11-06T15:09:23.115+0100 [mongosMain] waited 11s for distributed lock configUpgrade for upgrading config database to new format v5
2014-11-06T15:09:34.244+0100 [mongosMain] waited 22s for distributed lock configUpgrade for upgrading config database to new format v5
2014-11-06T15:09:45.373+0100 [mongosMain] waited 33s for distributed lock configUpgrade for upgrading config database to new format v5
2014-11-06T15:09:56.503+0100 [mongosMain] waited 44s for distributed lock configUpgrade for upgrading config database to new format v5
2014-11-06T15:10:07.632+0100 [mongosMain] waited 55s for distributed lock configUpgrade for upgrading config database to new format v5

But as you can see the log says that the mongos is waiting "for distributed lock configUpgrade ..." until it reaches 1200s and stops.

These are the logs from a config server (there are similar logs for each config server) when i start the mongos instance,
It seems the connection succeed :

2014-11-06T12:27:02.944+0100 [initandlisten] connection accepted from 192.168.8.80:56474 #1 (1 connection now open)
2014-11-06T12:27:02.948+0100 [initandlisten] connection accepted from 192.168.8.80:56477 #2 (2 connections now open)
2014-11-06T12:27:02.994+0100 [conn2] end connection 192.168.8.80:56477 (1 connection now open)
2014-11-06T12:27:03.000+0100 [initandlisten] connection accepted from 192.168.8.80:56480 #3 (2 connections now open)
2014-11-06T12:31:46.608+0100 [clientcursormon] mem (MB) res:36 virt:371
2014-11-06T12:31:46.608+0100 [clientcursormon]  mapped (incl journal view):64
2014-11-06T12:31:46.608+0100 [clientcursormon]  connections:2
2014-11-06T12:36:46.645+0100 [clientcursormon] mem (MB) res:36 virt:371
2014-11-06T12:36:46.645+0100 [clientcursormon]  mapped (incl journal view):64
2014-11-06T12:36:46.645+0100 [clientcursormon]  connections:2

As I followed each step of the doc [1], please could you tell me what I missed here ?

[1] http://docs.mongodb.org/manual/tutorial/deploy-shard-cluster/



 Comments   
Comment by Ramon Fernandez Marina [ 04/Dec/14 ]

Hi pierre.lemordant_mongodb@fastconnect.fr, thanks for letting us know. Glad to hear your cluster is up and running.

Comment by FastConnect Team [ 04/Dec/14 ]

Hi Ramon,

Thank you for your response.
It was indeed a misconfiguration. Now the three config servers are weel sync, it works fine.

Regards,

Pierre

Comment by Ramon Fernandez Marina [ 03/Dec/14 ]

pierre.lemordant_mongodb@fastconnect.fr, apologies for the late reply. The only time I've seen the "waited XXs for distributed lock configUpgrade for upgrading config database to new format v5" message is when the clocks on the config servers were too far apart.

You do mention using ntp, so I'd like to rule out a possible glitch or misconfiguration (or an issue with the virtualization software). Can you please check the date on all three config servers? If they're in sync, can you please post the full logs of all three config servers?

Generated at Thu Feb 08 03:39:40 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.