- Type: Bug
- Resolution: Works as Designed
- Priority: Major - P3
- Affects Version/s: 3.4.4
- Component/s: Sharding
- Environment: 3.4.4 sharded cluster with 18 shards, each consisting of 1 primary, 1 replica, and 1 hidden replica. 3 config servers (CSRS) and 5 mongos instances.
- Operating System: ALL
After upgrading our main MongoDB cluster from 3.2.12 to 3.4.4, we've noticed that the balancer never relinquishes its lock. sh.isBalancerRunning() and sh.getBalancerState() both return false, but the balancer lock still shows a state of 2.
Found using:
db.getSiblingDB("config").locks.findOne({_id: "balancer"}).state
I've checked the changelog collection and haven't found any evidence there that the balancer is still actually running.
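For anyone trying to reproduce these checks, the lock lookup and the changelog search can be combined into one small shell helper. This is only a sketch: `balancerLockReport` is a hypothetical name, not a built-in, and the field names it reads (`state` and `who` in config.locks, `stopped` in config.settings, `what`, `time` and `ns` in config.changelog) are assumptions about the 3.4 config database layout.

```javascript
// Sketch only: balancerLockReport is a hypothetical helper, not part of the shell.
// `db` is the mongo shell's database handle, passed in explicitly.
function balancerLockReport(db, limit) {
  var config = db.getSiblingDB("config");

  // Same lookup as above: the balancer's document in config.locks.
  var lock = config.locks.findOne({ _id: "balancer" });

  // Assumed layout: the balancer on/off setting is stored in config.settings.
  var settings = config.settings.findOne({ _id: "balancer" });

  // Most recent migration-related changelog entries, newest first.
  var recent = config.changelog
    .find({ what: /^moveChunk/ })
    .sort({ time: -1 })
    .limit(limit || 5)
    .toArray();

  return {
    lockState: lock ? lock.state : null,
    lockHolder: lock ? lock.who : null,
    balancerStopped: settings ? settings.stopped === true : false,
    recentMigrations: recent.map(function (e) {
      return { what: e.what, time: e.time, ns: e.ns };
    })
  };
}
```

From a shell connected to a mongos, `printjson(balancerLockReport(db))` prints the summary. If lockState is 2 while balancerStopped is true and recentMigrations shows nothing recent, that matches the behavior described here.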
For a while we have also had a problem moving chunks in this cluster because of mismatched index definitions across the shards. We are blocked from repairing those by another bug with dropping indexes, which I'll log separately and link to this ticket.
We turn off the balancer every night to do some system maintenance. For now we've had to manually free the balancer lock; otherwise the maintenance gets stuck waiting for the balancer to finish its migration.
On a possibly related note, I've had to fix this balancer lock a few times in the past few days, so either some process on our end keeps re-enabling the balancer, or the lock keeps getting re-established on its own.