[SERVER-12145] authSchemaUpgradeStep attempts to proceed forever if config server is not upgraded yet Created: 17/Dec/13  Updated: 11/Jul/16  Resolved: 18/Dec/13

Status: Closed
Project: Core Server
Component/s: Security
Affects Version/s: None
Fix Version/s: 2.5.5

Type: Task Priority: Major - P3
Reporter: Michael O'Brien Assignee: Unassigned
Resolution: Done Votes: 0
Labels: 26qa
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Participants:

 Description   

I tried running the auth upgrade process with one of the shards still on 2.4, to see what happens if someone proceeds with the upgrade steps but has forgotten (or missed) a binary upgrade of one of the mongods.

It looks like in some cases, the upgrade fails but the command still returns

{"ok":1}

so running the command in a while loop (as recommended here http://docs.mongodb.org/master/release-notes/2.6-upgrade/#upgrade-from-mongodb-2-4-user-authorization-model-to-mongodb-2-5-x-model) will make it spin forever until killed.

Script to reproduce:

var st1 = new ShardingTest({
    shards: 1,
    mongos: 2,
    other: {
        mongosOptions: {binVersion: "2.5"},
        shardOptions: {binVersion: MongoRunner.versionIterator(["2.5", "2.4"])},
        configOptions: {binVersion: "2.5"}
    }
});
print("STOPPING BALANCER");
st1.stopBalancer();
print("STOPPED BALANCER, DOING UPGRADE");
MongoRunner.stopMongos(st1.s0);
st1.s0 = MongoRunner.runMongos({restart: st1.s0, binVersion: "2.5", upgrade: ""});

// This spins forever: res.ok is always 1 and res.done stays false,
// even though the upgrade step actually seems to be failing.
var res;
do {
    res = st1.s0.getDB("admin").runCommand({authSchemaUpgradeStep: 1});
    print(tojson(res));
} while (res.ok && !res.done);

The logs say:

 m30999| 2013-12-17T15:02:32.748-0500 [conn1] distributed lock 'authorizationData/Michaels-MacBook-Pro.local:30999:1387310551:16807' acquired, ts : 52b0add83ffc79855ae4c201
 m30999| 2013-12-17T15:02:32.748-0500 [conn1] Auth schema upgrade erasing contents of admin.system.backup_users
 m30999| 2013-12-17T15:02:32.749-0500 [conn1] Auth schema upgrade backing up admin.system.users into admin.system.backup_users
 m30999| 2013-12-17T15:02:32.749-0500 [conn1] scoped connection to localhost:30000 not being returned to the pool
 m30999| 2013-12-17T15:02:32.749-0500 [conn1] Auth schema upgrade dropping indexes from admin.system.new_users
 m30000| Tue Dec 17 15:02:32.749 [conn1014] end connection 127.0.0.1:58454 (7 connections now open)
 m30000| Tue Dec 17 15:02:32.749 [initandlisten] connection accepted from 127.0.0.1:58455 #1015 (8 connections now open)
 m30000| Tue Dec 17 15:02:32.749 [conn1015] CMD: dropIndexes admin.system.new_users
 m30999| 2013-12-17T15:02:32.749-0500 [conn1] warning: Auth schema upgrade failed to drop indexes on admin.system.new_users (UnknownError can't drop system ns)
 m30999| 2013-12-17T15:02:32.749-0500 [conn1] Auth schema upgrade erasing contents of admin.system.new_users
 m30999| 2013-12-17T15:02:32.749-0500 [conn1] Auth schema upgrade creating needed indexes of admin.system.new_users
 m30000| Tue Dec 17 15:02:32.749 [conn1015] build index admin.system.new_users { user: 1, db: 1 }
 m30000| Tue Dec 17 15:02:32.750 [conn1015] build index done.  scanned 0 total records. 0 secs
 m30999| 2013-12-17T15:02:32.750-0500 [conn1] Auth schema upgrade processing schema version 1 users from database admin
 m30999| 2013-12-17T15:02:32.750-0500 [conn1] scoped connection to localhost:30000 not being returned to the pool
 m30999| 2013-12-17T15:02:32.750-0500 [conn1] Auth schema upgrade processing schema version 1 users from database config
 m30000| Tue Dec 17 15:02:32.750 [conn1015] end connection 127.0.0.1:58455 (7 connections now open)
 m30000| Tue Dec 17 15:02:32.750 [initandlisten] connection accepted from 127.0.0.1:58456 #1016 (8 connections now open)
 m30999| 2013-12-17T15:02:32.750-0500 [conn1] scoped connection to localhost:30000 not being returned to the pool
 m30000| Tue Dec 17 15:02:32.751 [conn1016] end connection 127.0.0.1:58456 (7 connections now open)
 m30000| Tue Dec 17 15:02:32.751 [initandlisten] connection accepted from 127.0.0.1:58457 #1017 (8 connections now open)
 m30999| 2013-12-17T15:02:32.751-0500 [conn1] distributed lock 'authorizationData/Michaels-MacBook-Pro.local:30999:1387310551:16807' unlocked.
{ "done" : false, "ok" : 1 }

This repeats with each iteration of the while loop, ad infinitum.
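
A possible way to keep the loop from spinning while this is investigated is to cap the number of iterations. The sketch below is only an illustration: the helper name runUpgradeWithLimit and its maxSteps parameter are hypothetical and not part of the shell, and runStep stands for any function that issues one authSchemaUpgradeStep command and returns its result document.

```javascript
// Hypothetical helper (not a shell built-in): runs upgrade steps until the
// result reports done, a step reports failure, or maxSteps is reached.
function runUpgradeWithLimit(runStep, maxSteps) {
    var res;
    for (var i = 0; i < maxSteps; i++) {
        res = runStep();
        if (!res.ok) {
            throw new Error("upgrade step failed: " + JSON.stringify(res));
        }
        if (res.done) {
            return {steps: i + 1, result: res};
        }
    }
    // Covers the case in this ticket: ok is 1 but done never becomes true.
    throw new Error("upgrade not done after " + maxSteps +
                    " steps; last result: " + JSON.stringify(res));
}

// Assumed usage in the repro above (st1 comes from the ShardingTest script):
// runUpgradeWithLimit(function () {
//     return st1.s0.getDB("admin").runCommand({authSchemaUpgradeStep: 1});
// }, 100);
```

With a cap in place, the situation described here surfaces as an error carrying the last result instead of an endless loop. The value 100 is arbitrary.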



 Comments   
Comment by Michael O'Brien [ 18/Dec/13 ]

Looks like this isn't happening anymore with the latest nightly build.

Comment by Michael O'Brien [ 17/Dec/13 ]

I had forgotten to add the "separateConfig:true" option to ShardingTest.
So this actually happens when the config server hasn't been upgraded yet; the version of the shards themselves doesn't matter.
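
Given that, a script could check the config server's binary version before starting the upgrade loop. The following is a rough sketch, not something the upgrade procedure prescribes: versionAtLeast is a hypothetical helper, and the "2.5" minimum is an assumption taken from the repro script. The version string would come from the "version" field of the buildInfo command run against the config server.

```javascript
// Hypothetical helper: compares dotted version strings numerically,
// e.g. the "version" field returned by the buildInfo command.
function versionAtLeast(version, minimum) {
    function parts(v) {
        return v.split(".").map(function (p) {
            // parseInt stops at the first non-digit, so "5-pre-" parses as 5.
            return parseInt(p, 10) || 0;
        });
    }
    var a = parts(version);
    var b = parts(minimum);
    for (var i = 0; i < Math.max(a.length, b.length); i++) {
        var x = a[i] || 0;
        var y = b[i] || 0;
        if (x !== y) {
            return x > y;
        }
    }
    return true; // all compared components are equal
}

// Assumed usage, with configConn being a connection to the config server:
// var v = configConn.getDB("admin").runCommand({buildInfo: 1}).version;
// if (!versionAtLeast(v, "2.5")) { throw new Error("config server is on " + v); }
```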

Comment by Michael O'Brien [ 17/Dec/13 ]

Possibly worth noting: if I run the test after reversing the order of the versions in the shardOptions, so that it reads:

shardOptions:{binVersion:MongoRunner.versionIterator(["2.4", "2.5"])}

it succeeds.
