[SERVER-56026] state key on config.locks Created: 12/Apr/21  Updated: 06/Dec/22  Resolved: 12/Apr/21

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.2.21
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: Javier Bassi Assignee: Backlog - Triage Team
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Server Triage
Participants:

 Description   

Context: I'm in the process of upgrading a sharded cluster from 3.2 to 3.4. Before that I need to convert the config servers from SCCC to a replica set (CSRS).[1] To do that, I first need to stop the balancer.[2]

 

When I try to stop the balancer I get this:

configsvr>
Waiting for active host 9e6d128f5e63:27017 to recognize new settings... (ping : Wed Mar 31 2021 13:27:04 GMT+0000 (UTC))
Waited for active ping to change for host 9e6d128f5e63:27017, a migration may be in progress or the host may be down.
Waiting for the balancer lock...
assert.soon failed, msg:Waited too long for lock balancer to unlock
doassert@src/mongo/shell/assert.js:18:14
assert.soon@src/mongo/shell/assert.js:202:13
sh.waitForDLock@src/mongo/shell/utils_sh.js:198:1
sh.waitForBalancerOff@src/mongo/shell/utils_sh.js:264:9
sh.waitForBalancer@src/mongo/shell/utils_sh.js:294:9
sh.stopBalancer@src/mongo/shell/utils_sh.js:161:5
@(shell):1:1
Balancer still may be active, you must manually verify this is not the case using the config.changelog collection.
2021-04-09T00:22:21.478+0000 E QUERY    [thread1] Error: Error: assert.soon failed, msg:Waited too long for lock balancer to unlock :
sh.waitForBalancerOff@src/mongo/shell/utils_sh.js:268:15
sh.waitForBalancer@src/mongo/shell/utils_sh.js:294:9
sh.stopBalancer@src/mongo/shell/utils_sh.js:161:5
@(shell):1:1
mongos> sh.isBalancerRunning()
true

After that I checked the changelog, and the balancer had indeed stopped. I even waited a few days. Right now the changelog is only growing with multi-split entries, and the last moveChunk.to entry is a few days old.
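The changelog check can be sketched as a small helper (a sketch only: it assumes the standard config.changelog schema with `what` and `time` fields and a shell-style cursor API):

```javascript
// Sketch only: returns a cursor over the n most recent migration-related
// changelog entries, newest first. Run against the config database.
function latestMigrations(configDb, n) {
  return configDb.changelog
    .find({ what: /^moveChunk/ })   // moveChunk.start / .to / .from / .commit
    .sort({ time: -1 })             // newest first
    .limit(n || 5);
}
// In the mongo shell: latestMigrations(db.getSiblingDB("config"), 5)
```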

 

I need to let mongo know that the balancer is stopped, so that isBalancerRunning returns false and I can move on.

I think I need to edit the 'state' key on the 'balancer' document in the 'locks' collection of the config database (but I'm not really sure). Right now it has the value 2, which is the only value described in the documentation.[3]

 

This is what the locks collection looks like:

configsvr> db.locks.find()
{ "_id" : "configUpgrade", "state" : 0, "who" : "a258014d8fed:27017:1473202024:1194521542:mongosMain:1804289383", "ts" : ObjectId("57cf4768b7dc19454fe95602"), "process" : "a258014d8fed:27017:1473202024:1194521542", "when" : ISODate("2016-09-06T22:47:04.820Z"), "why" : "initializing config database to new format v6" }
{ "_id" : "DB_files", "state" : 0, "who" : "a258014d8fed:27017:1473202024:1194521542:conn1:596516649", "ts" : ObjectId("57cf47adb7dc19454fe9560d"), "process" : "a258014d8fed:27017:1473202024:1194521542", "when" : ISODate("2016-09-06T22:48:13.861Z"), "why" : "enableSharding" }
{ "_id" : "DB_files.fs.chunks", "state" : 0, "who" : "ubuntumal:37017:1618234782:1939117970:conn6:691977887", "ts" : ObjectId("60745116a0119faa257ea5f7"), "process" : "ubuntumal:37017:1618234782:1939117970", "when" : ISODate("2021-04-12T13:54:30.894Z"), "why" : "splitting chunk [{ files_id: ObjectId('60744cb301a808a4c578bddb'), n: 37 }, { files_id: MaxKey, n: MaxKey }) in DB_files.fs.chunks" }
{ "_id" : "DB_files-movePrimary", "state" : 0, "who" : "9e6d128f5e63:27017:1552409205:279181987:conn10953768:312715989", "ts" : ObjectId("5cdf23d3e226ea53560379fb"), "process" : "9e6d128f5e63:27017:1552409205:279181987", "when" : ISODate("2019-05-17T21:12:51.602Z"), "why" : "Moving primary shard of DB_files" }
{ "_id" : "balancer", "state" : 2, "who" : "9e6d128f5e63:27017:1582238937:234309158:Balancer:2025600939", "ts" : ObjectId("60647abde2785e16702f6ef4"), "process" : "9e6d128f5e63:27017:1582238937:234309158", "when" : ISODate("2021-03-31T13:35:57.979Z"), "why" : "doing balance round" }

I know that during normal operation I'm not supposed to edit the config database. I have a backup just in case, and I can temporarily stop the shards if necessary.
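For reference, the edit I have in mind would look something like this (a sketch only, and an assumption on my part: it builds the filter/update pair for forcing the balancer lock back to state 0, to be run against the config database only after config.changelog confirms no migration is active):

```javascript
// Sketch only (not verified): build the filter/update pair for clearing a
// stale balancer lock. state 0 = unlocked, state 2 = taken.
function buildBalancerUnlock() {
  return {
    filter: { _id: "balancer", state: 2 }, // match only if still marked taken
    update: { $set: { state: 0 } }
  };
}
// In the configsvr shell, after verifying no migration is active:
//   var op = buildBalancerUnlock();
//   db.getSiblingDB("config").locks.update(op.filter, op.update);
```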

[1] https://docs.mongodb.com/manual/release-notes/3.4-upgrade-sharded-cluster/

[2] https://docs.mongodb.com/v3.4/tutorial/upgrade-config-servers-to-replica-set/

[3] https://docs.mongodb.com/v3.2/reference/config-database/



 Comments   
Comment by Javier Bassi [ 12/Apr/21 ]

Thanks for the fast reply. balancerStatus was introduced in 3.4, I think. I'm trying to get there.

mongos> use admin
switched to db admin
mongos> db.adminCommand({balancerStatus:1}) 
{ "ok" : 0, "errmsg" : "no such cmd: balancerStatus", "code" : 59 }

Comment by Dmitry Agranat [ 12/Apr/21 ]

Hi jbassi@deloitte.com,

As MongoDB 3.2 and MongoDB 3.4 have both reached end of life, we'd like to encourage you to start by asking our community for help by posting on the MongoDB Developer Community Forums.

A short note about your issue. It looks like the output you posted is from the sh.isBalancerRunning() helper. The isBalancerRunning function is defined by the mongo shell, *not* the mongos router. To verify the shell is reading the correct information, you can run:

db.adminCommand({balancerStatus:1}) 

The returned inBalancerRound boolean indicates whether the balancer is currently running. If inBalancerRound is false while sh.isBalancerRunning() is true, the mongo shell is using outdated code that checks for a lock instead of querying the server.
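A rough sketch of what that legacy shell check does (an approximation of the utils_sh.js behavior, not the exact source):

```javascript
// Approximate behavior of the 3.2 shell's sh.isBalancerRunning(): it reads
// the balancer lock document from config.locks instead of asking the server.
function isBalancerRunningLegacy(configDb) {
  var x = configDb.locks.findOne({ _id: "balancer" });
  if (x == null) return false;
  return x.state > 0; // a stale lock (state 2) therefore reads as "running"
}
```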

Dima

Generated at Thu Feb 08 05:38:06 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.