[SERVER-29599] Balancer never relinquishes lock Created: 13/Jun/17 Updated: 27/Oct/23 Resolved: 13/Jun/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 3.4.4 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Scott Glajch | Assignee: | Kaloian Manassiev |
| Resolution: | Works as Designed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
3.4.4 sharded cluster with 18 shards, each a replica set consisting of 1 primary, 1 secondary, and 1 hidden secondary. 3 config servers (CSRS) and 5 mongos routers |
||
| Operating System: | ALL |
| Steps To Reproduce: |
Let me know if you need more information to help reproduce. I'm not sure what you need now but I'm sure you'll need something. |
| Participants: |
| Description |
|
After upgrading our main mongo cluster from 3.2.12 to 3.4.4, we've noticed a weird behavior where the balancer never relinquishes its lock. I can run sh.isBalancerRunning() and sh.getBalancerState(), both of which return false, but the balancer lock still shows a state of "2":

db.getSiblingDB("config").locks.findOne({_id: "balancer"}).state

I've checked the changelog collection and haven't found any evidence there that the balancer is still actually running.

We've also had a long-standing problem with moving chunks in this cluster due to mismatched index definitions on the various shards, which we are blocked from repairing due to another bug with dropping indexes that I'll log separately and link to this ticket.

We turn off the balancer every night to do some system maintenance, and for now we've been having to manually free the balancer lock, otherwise this maintenance gets stuck waiting for the balancer to finish its migration.

On a possibly related note, I've had to fix this balancer lock a few times in the past few days, so either some process on our end keeps re-enabling the balancer, or the lock keeps getting re-established on its own. |
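As later comments reveal, the reporter's own code was inferring balancer activity from the config.locks document. A minimal sketch of that kind of 3.2-era check (the function name and the sample document are hypothetical, for illustration only) shows why it misreports on 3.4, where the config server primary holds the balancer lock permanently:

```javascript
// Hypothetical helper mirroring a 3.2-era check: it treats the distributed
// lock document's state field as the source of truth for balancer activity.
// (state 0 = unlocked, 2 = locked, per the config.locks schema.)
function balancerLooksBusy(lockDoc) {
  return lockDoc !== null && lockDoc.state === 2;
}

// Simulated lock document as it appears on a 3.4 cluster with the balancer
// stopped: the lock is still held by the config server primary.
const lockOn34 = { _id: "balancer", state: 2, who: "ConfigServer:Balancer" };
console.log(balancerLooksBusy(lockOn34)); // true, despite the balancer being off
```

On 3.4 this check returns true even when sh.getBalancerState() returns false, which matches the symptom described above.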
| Comments |
| Comment by Scott Glajch [ 13/Jun/17 ] |
|
You're right, after looking into it, we had written direct code on our end to check for the lock state. I've updated our code and everything is fine now. Thank you for the quick response! |
| Comment by Kaloian Manassiev [ 13/Jun/17 ] |
|
I don't think the MongoDB Java driver has any means for controlling the balancer, only the shell helpers do. |
| Comment by Scott Glajch [ 13/Jun/17 ] |
|
Ok, thanks! I think perhaps the code on top of the MongoDB Java driver that we're using to determine whether the balancer is still running might just need an upgrade. Hopefully that fixes our issue. I'll get back to you shortly on that. |
| Comment by Kaloian Manassiev [ 13/Jun/17 ] |
|
Hi glajchs,

Starting in MongoDB version 3.4, the balancer runs on the primary of the config server replica set. As part of this change, the balancer lock is intentionally never released, in order to prevent any accidentally left-over 3.2 (or earlier) mongos nodes from taking it. This is documented here.

This indeed means that some of the older mongo shell utilities are not compatible with 3.4, so we recommend using the 3.4 shell. The implementation of sh.isBalancerRunning() now uses a new command called balancerStatus.

Hope this helps.

Best regards, |
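For reference, the 3.4 approach described above can be invoked directly from the shell. This snippet requires a live mongos connection and is shown for illustration; the reply shape below is what the 3.4 balancerStatus command reports (field values are examples, not output from this cluster):

```javascript
// Run from a 3.4 mongo shell connected to a mongos.
// This is what sh.isBalancerRunning() calls under the hood in 3.4.
db.adminCommand({ balancerStatus: 1 })
// Example reply:
// {
//   "mode": "off",              // balancer enabled/disabled state
//   "inBalancerRound": false,   // whether a balancing round is in progress
//   "numBalancerRounds": NumberLong(42),
//   "ok": 1
// }
```

Unlike inspecting config.locks directly, this reflects the actual balancer state regardless of who holds the lock.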