Type: Question
Resolution: Done
Priority: Major - P3
Affects Version/s: 3.2.21
Component/s: Sharding
Server Triage
Context: I'm in the process of upgrading a sharded cluster from 3.2 to 3.4. Before that, I need to convert the config servers from SCCC to a replica set (CSRS).[1] To do that, I first need to stop the balancer.[2]
When I try to stop the balancer, I get this:
configsvr> Waiting for active host 9e6d128f5e63:27017 to recognize new settings... (ping : Wed Mar 31 2021 13:27:04 GMT+0000 (UTC))
Waited for active ping to change for host 9e6d128f5e63:27017, a migration may be in progress or the host may be down.
Waiting for the balancer lock...
assert.soon failed, msg:Waited too long for lock balancer to unlock
doassert@src/mongo/shell/assert.js:18:14
assert.soon@src/mongo/shell/assert.js:202:13
sh.waitForDLock@src/mongo/shell/utils_sh.js:198:1
sh.waitForBalancerOff@src/mongo/shell/utils_sh.js:264:9
sh.waitForBalancer@src/mongo/shell/utils_sh.js:294:9
sh.stopBalancer@src/mongo/shell/utils_sh.js:161:5
@(shell):1:1
Balancer still may be active, you must manually verify this is not the case using the config.changelog collection.
2021-04-09T00:22:21.478+0000 E QUERY [thread1] Error: Error: assert.soon failed, msg:Waited too long for lock balancer to unlock :
sh.waitForBalancerOff@src/mongo/shell/utils_sh.js:268:15
sh.waitForBalancer@src/mongo/shell/utils_sh.js:294:9
sh.stopBalancer@src/mongo/shell/utils_sh.js:161:5
@(shell):1:1
mongos> sh.isBalancerRunning()
true
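In the 3.2 shell, sh.stopBalancer() appears to do two things: it marks the balancer as stopped in config.settings, and then it waits for the 'balancer' lock to be released (the sh.waitForBalancerOff and sh.waitForDLock frames in the trace above). The assert.soon timeout comes from the waiting step, so the first step has probably already taken effect, which sh.getBalancerState() returning false would confirm. The sketch below is plain JavaScript (runnable under Node.js) and assumes the 3.2 settings layout; balancerEnabled is a hypothetical name, not a shell function.

```javascript
// Hypothetical helper (not a shell function): reads a config.settings
// {_id: "balancer"} document the way the 3.2 shell's sh.getBalancerState()
// is assumed to, where a missing document or a falsy `stopped` means "enabled".
function balancerEnabled(settingsDoc) {
  if (settingsDoc == null) {
    return true; // no settings document: balancer enabled by default
  }
  return !settingsDoc.stopped;
}

// In the mongo shell the document would come from:
//   db.getSiblingDB("config").settings.findOne({ _id: "balancer" })
// Here the helper is exercised on plain objects.
console.log(balancerEnabled({ _id: "balancer", stopped: true })); // false
console.log(balancerEnabled(null)); // true
```

The point of separating "enabled" from "running" is that they come from different collections: the setting lives in config.settings, while the running check looks at config.locks.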
After that, I checked the changelog, and the balancer has in fact stopped; I even waited a few days. Right now the changelog only receives multi-split entries, and the most recent moveChunk.to entry is a few days old.
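The changelog check described above can be sketched the same way. This assumes config.changelog entries carry a what string (for example moveChunk.to or multi-split) and a time date; lastMigrationEvent and daysSince are hypothetical helpers, and the sample entries and reference time are illustrative, not taken from this cluster.

```javascript
// Hypothetical helper: returns the most recent chunk-migration entry
// ("moveChunk.*") from a list of config.changelog-like documents, or null.
// Other events such as "multi-split" are ignored.
function lastMigrationEvent(entries) {
  let latest = null;
  for (const entry of entries) {
    if (typeof entry.what !== "string" || !entry.what.startsWith("moveChunk")) {
      continue; // not a migration event
    }
    if (latest === null || entry.time > latest.time) {
      latest = entry;
    }
  }
  return latest;
}

// Whole days elapsed between two Date values.
function daysSince(eventTime, now) {
  const msPerDay = 24 * 60 * 60 * 1000;
  return Math.floor((now - eventTime) / msPerDay);
}

// In the mongo shell the entries could come from something like:
//   db.getSiblingDB("config").changelog.find({ what: /^moveChunk/ }).sort({ time: -1 }).limit(10).toArray()
// Illustrative sample data (not from this cluster):
const sampleChangelog = [
  { what: "moveChunk.to", time: new Date("2021-03-31T13:30:00Z") },
  { what: "multi-split", time: new Date("2021-04-12T13:54:30Z") },
];
const lastMove = lastMigrationEvent(sampleChangelog);
console.log(lastMove.what, daysSince(lastMove.time, new Date("2021-04-12T14:00:00Z"))); // moveChunk.to 12
```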
I need to let MongoDB know that the balancer is stopped, so that sh.isBalancerRunning() returns false and I can move on.
I think I need to edit the 'state' field of the 'balancer' document in the 'locks' collection of the config database, but I'm not really sure. Right now it has the value 2, which is the only value mentioned in the documentation [3].
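The value 2 is consistent with how the legacy (SCCC-era) distributed lock is generally understood to work: 0 means unlocked, 1 means the lock is being acquired, and 2 means it is held. In the 3.2 shell, sh.isBalancerRunning() appears to simply read the 'balancer' document in config.locks and report true whenever its state is greater than 0, which would explain why it keeps returning true even though no migrations are happening. The sketch below encodes that assumption; describeLockState and balancerLockHeld are hypothetical names.

```javascript
// Legacy distributed-lock states, as assumed here:
// 0 = unlocked, 1 = being acquired, 2 = locked.
const LOCK_STATE_NAMES = { 0: "unlocked", 1: "being acquired", 2: "locked" };

// Hypothetical helper: human-readable state of a config.locks document.
function describeLockState(lockDoc) {
  if (lockDoc == null) {
    return "missing";
  }
  return LOCK_STATE_NAMES[lockDoc.state] || "unknown";
}

// Hypothetical helper mirroring the assumed 3.2 sh.isBalancerRunning() logic:
// any non-zero state on the "balancer" lock is reported as "running".
function balancerLockHeld(lockDoc) {
  return lockDoc != null && lockDoc.state > 0;
}

// The "balancer" document from this ticket, trimmed to the relevant fields.
const balancerLock = { _id: "balancer", state: 2, why: "doing balance round" };
console.log(describeLockState(balancerLock), balancerLockHeld(balancerLock)); // locked true
```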
This is what the locks collection looks like:
configsvr> db.locks.find()
{ "_id" : "configUpgrade", "state" : 0, "who" : "a258014d8fed:27017:1473202024:1194521542:mongosMain:1804289383", "ts" : ObjectId("57cf4768b7dc19454fe95602"), "process" : "a258014d8fed:27017:1473202024:1194521542", "when" : ISODate("2016-09-06T22:47:04.820Z"), "why" : "initializing config database to new format v6" }
{ "_id" : "DB_files", "state" : 0, "who" : "a258014d8fed:27017:1473202024:1194521542:conn1:596516649", "ts" : ObjectId("57cf47adb7dc19454fe9560d"), "process" : "a258014d8fed:27017:1473202024:1194521542", "when" : ISODate("2016-09-06T22:48:13.861Z"), "why" : "enableSharding" }
{ "_id" : "DB_files.fs.chunks", "state" : 0, "who" : "ubuntumal:37017:1618234782:1939117970:conn6:691977887", "ts" : ObjectId("60745116a0119faa257ea5f7"), "process" : "ubuntumal:37017:1618234782:1939117970", "when" : ISODate("2021-04-12T13:54:30.894Z"), "why" : "splitting chunk [{ files_id: ObjectId('60744cb301a808a4c578bddb'), n: 37 }, { files_id: MaxKey, n: MaxKey }) in DB_files.fs.chunks" }
{ "_id" : "DB_files-movePrimary", "state" : 0, "who" : "9e6d128f5e63:27017:1552409205:279181987:conn10953768:312715989", "ts" : ObjectId("5cdf23d3e226ea53560379fb"), "process" : "9e6d128f5e63:27017:1552409205:279181987", "when" : ISODate("2019-05-17T21:12:51.602Z"), "why" : "Moving primary shard of DB_files" }
{ "_id" : "balancer", "state" : 2, "who" : "9e6d128f5e63:27017:1582238937:234309158:Balancer:2025600939", "ts" : ObjectId("60647abde2785e16702f6ef4"), "process" : "9e6d128f5e63:27017:1582238937:234309158", "when" : ISODate("2021-03-31T13:35:57.979Z"), "why" : "doing balance round" }
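In the listing above, only the 'balancer' lock is not in state 0. It was taken on 2021-03-31, and its process field names host 9e6d128f5e63:27017, the same host the shell reported it was waiting on, so checking whether that process is still alive seems a sensible next step. A hypothetical helper that pulls such locks out of a find() result might look like this; it uses a subset of the documents above, and the reference time passed as now is illustrative.

```javascript
// Hypothetical helper: returns the config.locks documents that are not in
// state 0, with the whole number of days since each was taken (`when`).
function heldLocks(lockDocs, now) {
  const msPerDay = 24 * 60 * 60 * 1000;
  return lockDocs
    .filter((doc) => doc.state !== 0)
    .map((doc) => ({
      id: doc._id,
      state: doc.state,
      process: doc.process,
      daysHeld: Math.floor((now - doc.when) / msPerDay),
    }));
}

// A subset of the documents listed above, trimmed to the fields used here.
const lockDocs = [
  { _id: "configUpgrade", state: 0, process: "a258014d8fed:27017:1473202024:1194521542", when: new Date("2016-09-06T22:47:04.820Z") },
  { _id: "DB_files.fs.chunks", state: 0, process: "ubuntumal:37017:1618234782:1939117970", when: new Date("2021-04-12T13:54:30.894Z") },
  { _id: "balancer", state: 2, process: "9e6d128f5e63:27017:1582238937:234309158", when: new Date("2021-03-31T13:35:57.979Z") },
];

// Illustrative reference time; in the shell, new Date() and
// db.getSiblingDB("config").locks.find().toArray() would be used instead.
const stillHeld = heldLocks(lockDocs, new Date("2021-04-12T14:00:00Z"));
console.log(stillHeld);
```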
I know that during normal operation I'm not supposed to edit the config database. I have a backup just in case, and I can temporarily stop the shards if necessary.
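If the lock does turn out to be orphaned, the edit proposed above could at least be made conditional, so that it changes nothing if the lock was released or re-acquired in the meantime. The sketch below only builds the filter and update documents. It assumes (not confirmed here) that releasing a legacy lock amounts to setting state back to 0 on the document whose ts matches, and it refuses unless config.settings already has stopped: true. buildBalancerLockReset is a hypothetical helper, and this is not a documented MongoDB procedure.

```javascript
// Hypothetical helper: builds a guarded update for the "balancer" lock.
// Assumption: releasing a legacy lock means setting state back to 0 on the
// document whose ts matches the lock instance that was observed.
function buildBalancerLockReset(settingsDoc, lockDoc) {
  if (settingsDoc == null || settingsDoc.stopped !== true) {
    throw new Error("balancer is not disabled in config.settings; refusing to build a reset");
  }
  if (lockDoc == null || lockDoc._id !== "balancer" || lockDoc.state === 0) {
    return null; // nothing to release
  }
  return {
    // Matching on ts and state makes the update a no-op if the lock changed.
    filter: { _id: "balancer", state: lockDoc.state, ts: lockDoc.ts },
    update: { $set: { state: 0 } },
  };
}

// In the shell, lockDoc.ts would be ObjectId("60647abde2785e16702f6ef4");
// a plain string stands in for it here.
const reset = buildBalancerLockReset(
  { _id: "balancer", stopped: true },
  { _id: "balancer", state: 2, ts: "60647abde2785e16702f6ef4" }
);
console.log(JSON.stringify(reset));
// The resulting documents would then be passed to something like:
//   db.getSiblingDB("config").locks.update(reset.filter, reset.update)
```

Matching on ts rather than on _id alone is the design choice that keeps the update from clobbering a lock that a live balancer may have re-taken since it was inspected.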
[1] https://docs.mongodb.com/manual/release-notes/3.4-upgrade-sharded-cluster/
[2] https://docs.mongodb.com/v3.4/tutorial/upgrade-config-servers-to-replica-set/
[3] https://docs.mongodb.com/v3.2/reference/config-database/