[SERVER-9036] Config server meta-data upgrade from 2.2.3 to 2.4.0 fails with "could not create new collection"
Created: 20/Mar/13  Updated: 10/Dec/14  Resolved: 27/Mar/13

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 2.4.0
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Michael Dickey
Assignee: Greg Studer
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

CentOS 6/64-bit. Three virtual machines (mongo-a1, mongo-a2, mongo-a3) each running a shard instance, config server instance, and mongos instance.


Issue Links:
Related
is related to SERVER-9076 Duplicate configdb entries should not... Closed
Operating System: Linux
Steps To Reproduce:

Start with a 2.2.3 mongodb cluster. Try running "mongos --upgrade" using a 2.4.0 version of mongos.

 Description   

I'm trying to upgrade a 2.2.3 cluster to 2.4.0 (final) following the "Upgrade a Sharded Cluster from MongoDB 2.2 to MongoDB 2.4" instructions here: http://docs.mongodb.org/manual/release-notes/2.4-upgrade/

It's failing at step #4 of the "Meta-Data Upgrade procedure." I'm running "mongos --upgrade --configdb mongo-a1:27019,mongo-a3:27019,mongo-a3:2701" with a 2.4.0 version of mongos and it gives me the following:

Wed Mar 20 18:41:25.135 [mongosMain] MongoS version 2.4.0 starting: pid=8800 port=27017 64-bit host=mgmt.test.cloudmeter.com (--help for usage)
Wed Mar 20 18:41:25.135 [mongosMain] git version: ce2d666c04b4a80af58e8bbb3388b0680e8cfeb6
Wed Mar 20 18:41:25.135 [mongosMain] build info: Linux ip-10-2-29-40 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_49
Wed Mar 20 18:41:25.135 [mongosMain] options: { configdb: "mongo-a1:27019,mongo-a3:27019,mongo-a3:27019", upgrade: true }
Wed Mar 20 18:41:25.143 [mongosMain] SyncClusterConnection connecting to [mongo-a1:27019]
Wed Mar 20 18:41:25.144 [mongosMain] SyncClusterConnection connecting to [mongo-a3:27019]
Wed Mar 20 18:41:25.144 [mongosMain] SyncClusterConnection connecting to [mongo-a3:27019]
Wed Mar 20 18:41:25.149 [mongosMain] scoped connection to mongo-a1:27019,mongo-a3:27019,mongo-a3:27019 not being returned to the pool
Wed Mar 20 18:41:25.149 [mongosMain] SyncClusterConnection connecting to [mongo-a1:27019]
Wed Mar 20 18:41:25.150 [mongosMain] SyncClusterConnection connecting to [mongo-a3:27019]
Wed Mar 20 18:41:25.150 [mongosMain] SyncClusterConnection connecting to [mongo-a3:27019]
Wed Mar 20 18:41:25.153 [LockPinger] creating distributed lock ping thread for mongo-a1:27019,mongo-a3:27019,mongo-a3:27019 and process mgmt.test.cloudmeter.com:27017:1363804885:1804289383 (sleeping for 30000ms)
Wed Mar 20 18:41:25.153 [LockPinger] SyncClusterConnection connecting to [mongo-a1:27019]
Wed Mar 20 18:41:25.155 [LockPinger] SyncClusterConnection connecting to [mongo-a3:27019]
Wed Mar 20 18:41:25.156 [LockPinger] SyncClusterConnection connecting to [mongo-a3:27019]
Wed Mar 20 18:41:25.166 [mongosMain] warning: distributed lock 'configUpgrade/mgmt.test.cloudmeter.com:27017:1363804885:1804289383 did not propagate properly. :: caused by :: 8017 update not consistent  ns: config.locks query: { _id: "configUpgrade", state: 0, ts: ObjectId('5149fdc7f2463a08200cb8bd') } update: { $set: { state: 1, who: "mgmt.test.cloudmeter.com:27017:1363804885:1804289383:mongosMain:846930886", process: "mgmt.test.cloudmeter.com:27017:1363804885:1804289383", when: new Date(1363804885155), why: "upgrading config database to new format v4", ts: ObjectId('514a02d53fcf004f1fda73a8') } } gle1: { updatedExisting: true, n: 1, connectionId: 110, waited: 1, err: null, ok: 1.0 } gle2: { updatedExisting: false, n: 0, connectionId: 175, waited: 2, err: null, ok: 1.0 }
Wed Mar 20 18:41:25.167 [mongosMain] lock update won, completing lock propagation for 'configUpgrade/mgmt.test.cloudmeter.com:27017:1363804885:1804289383'
Wed Mar 20 18:41:25.194 [mongosMain] distributed lock 'configUpgrade/mgmt.test.cloudmeter.com:27017:1363804885:1804289383' acquired, ts : 514a02d53fcf004f1fda73a8
Wed Mar 20 18:41:25.195 [mongosMain] starting upgrade of config server from v3 to v4
Wed Mar 20 18:41:25.195 [mongosMain] starting next upgrade step from v3 to v4
Wed Mar 20 18:41:25.195 [mongosMain] about to log new metadata event: { _id: "mgmt.test.cloudmeter.com-2013-03-20T18:41:25-514a02d53fcf004f1fda73a9", server: "mgmt.test.cloudmeter.com", clientAddr: "N/A", time: new Date(1363804885195), what: "starting upgrade of config database", ns: "config.version", details: { from: 3, to: 4 } }
Wed Mar 20 18:41:25.417 [mongosMain] scoped connection to mongo-a1:27019,mongo-a3:27019,mongo-a3:27019 not being returned to the pool
Wed Mar 20 18:41:25.417 [mongosMain] warning: could not cleanup previous upgrade state :: caused by :: could not drop collections during cleanup of upgrade 5149fdc7f2463a08200cb8bf :: caused by :: 13105 write $cmd failed on a node: { "errmsg" : "ns not found", "ok" : 0 } mongo-a3:27019 ns: config.$cmd cmd: { drop: "collections-upgrade-5149fdc7f2463a08200cb8bf" }
Wed Mar 20 18:41:25.417 [mongosMain] SyncClusterConnection connecting to [mongo-a1:27019]
Wed Mar 20 18:41:25.417 [mongosMain] SyncClusterConnection connecting to [mongo-a3:27019]
Wed Mar 20 18:41:25.418 [mongosMain] SyncClusterConnection connecting to [mongo-a3:27019]
Wed Mar 20 18:41:25.420 [mongosMain] checking that version of host mongo-a1:27018 is compatible with 2.2
Wed Mar 20 18:41:25.421 [mongosMain] checking that version of host mongo-a2:27018 is compatible with 2.2
Wed Mar 20 18:41:25.426 [mongosMain] checking that version of host mongo-a3:27018 is compatible with 2.2
Wed Mar 20 18:41:25.522 [mongosMain] acquiring locks for 0 sharded collections...
Wed Mar 20 18:41:25.522 [mongosMain] copying collection and chunk metadata to working and backup collections...
Wed Mar 20 18:41:25.591 [mongosMain] scoped connection to mongo-a1:27019,mongo-a3:27019,mongo-a3:27019 not being returned to the pool
Wed Mar 20 18:41:25.591 [mongosMain] SyncClusterConnection connecting to [mongo-a1:27019]
Wed Mar 20 18:41:25.591 [mongosMain] SyncClusterConnection connecting to [mongo-a3:27019]
Wed Mar 20 18:41:25.592 [mongosMain] SyncClusterConnection connecting to [mongo-a3:27019]
Wed Mar 20 18:41:25.694 [mongosMain] distributed lock 'configUpgrade/mgmt.test.cloudmeter.com:27017:1363804885:1804289383' unlocked. 
Wed Mar 20 18:41:25.694 [mongosMain] ERROR: error upgrading config database to v4 :: caused by :: error upgrading config database from v3 to v4 :: caused by :: could not copy config.collections to config.collections-upgrade-514a02d53fcf004f1fda73aa :: caused by :: could not create new collection :: caused by :: 13105 write $cmd failed on a node: { "errmsg" : "collection already exists", "ok" : 0 } mongo-a3:27019 ns: config.$cmd cmd: { create: "collections-upgrade-514a02d53fcf004f1fda73aa" }

Running a 2.2.3 mongos shows me the following:

[root@mongo-a1 ~]# mongo
MongoDB shell version: 2.2.3
connecting to: test
mongos> use config
switched to db config
mongos> show collections
changelog
chunks
collections-upgrade-514a02d53fcf004f1fda73aa
databases
lockpings
locks
mongos
settings
shards
system.indexes
version

So it actually appears to be creating that collection; it just doesn't seem to realize it.
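
The node and command that failed can be pulled out of the "13105 write $cmd failed on a node" messages above. The following is a minimal Python sketch; the regular expression and the helper name `parse_failed_command` are assumptions based only on the two messages in this report, not on a documented mongos log format.

```python
import re

# Pattern is an assumption derived from the log lines in this report,
# not from any documented mongos log format.
FAILED_CMD = re.compile(
    r'\{ "errmsg" : "(?P<errmsg>[^"]+)", "ok" : 0 \} '
    r'(?P<host>\S+) ns: (?P<ns>\S+) cmd: \{ (?P<cmd>\w+): "(?P<target>[^"]+)" \}'
)


def parse_failed_command(line):
    """Return errmsg, host, ns, cmd and target from a failed-write message, or None."""
    match = FAILED_CMD.search(line)
    return match.groupdict() if match else None


# Tail of the final ERROR line from the log above.
line = (
    'could not create new collection :: caused by :: '
    '13105 write $cmd failed on a node: '
    '{ "errmsg" : "collection already exists", "ok" : 0 } mongo-a3:27019 '
    'ns: config.$cmd cmd: { create: "collections-upgrade-514a02d53fcf004f1fda73aa" }'
)

info = parse_failed_command(line)
print(info["host"], info["cmd"], info["errmsg"])
```

Applied to both the cleanup warning and the final error, this reports mongo-a3:27019 as the node on which the drop and the create failed.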



 Comments   
Comment by Michael Dickey [ 22/Mar/13 ]

Yes, this was primarily caused by user error. My mongos command line had a typo: it specified "mongo-a3" twice instead of including "mongo-a2", which caused my config databases to get out of sync. Thanks!
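
A typo of this kind can be caught before running the upgrade by checking the --configdb value for repeated hosts. The following is a minimal Python sketch; the helper name `duplicate_config_hosts` is hypothetical, and the check is a plain string comparison that does not resolve hostnames or aliases.

```python
from collections import Counter


def duplicate_config_hosts(configdb):
    """Return the hosts listed more than once in a comma-separated --configdb value."""
    # Normalise whitespace and case only; no DNS resolution is attempted.
    hosts = [h.strip().lower() for h in configdb.split(",") if h.strip()]
    counts = Counter(hosts)
    return sorted(host for host, n in counts.items() if n > 1)


# The value logged in the "options:" line of this report.
print(duplicate_config_hosts("mongo-a1:27019,mongo-a3:27019,mongo-a3:27019"))
# prints ['mongo-a3:27019']
```

An empty list means every entry is distinct as written; it does not prove that the entries point at three different servers.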

Comment by Greg Studer [ 22/Mar/13 ]

It seems like SUPPORT-507 has resolved your issue? I also opened SERVER-9076 to make this mistake harder to make in the future.

Comment by Michael Dickey [ 21/Mar/13 ]

Uploaded to SUPPORT-507.

Also I accidentally created a duplicate: SERVER-9044. Please delete this one for me.

Comment by Andre de Frere [ 21/Mar/13 ]

Hi Michael,

We would like to investigate the config database from your environment. The SERVER Jira project is public, and therefore may not be the best location to upload a dump of your config database (made with mongodump). Are you able to open a ticket in the SUPPORT project and attach a dump of your config database? If you can link the SUPPORT issue to this one we can continue investigation.

You can create a ticket in SUPPORT (Community Private) at this link.

Regards,
André
