Type: Bug
Resolution: Works as Designed
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 3.4.4
Component/s: Sharding
Labels:
None
Environment:
3.4.4 sharded cluster with 18 shards, each consisting of 1 primary, 1 replica, and 1 hidden replica, plus 3 config servers (CSRS) and 5 mongos instances

Operating System:
ALL
Steps To Reproduce:
Stop the balancer

Wait for the balancer to finish its migration and stop

Check locks collection for the balancer lock

Let me know if you need more information to help reproduce. I'm not sure what you need now but I'm sure you'll need something.
Confidence Status:
None
Work Order:
3

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name:
None
Goal Link:
None

After upgrading our main MongoDB cluster from 3.2.12 to 3.4.4, we've noticed that the balancer never relinquishes its lock. sh.isBalancerRunning() and sh.getBalancerState() both return false, but the balancer lock still shows a state of 2.
Found using:

db.getSiblingDB("config").locks.findOne({_id: "balancer"}).state
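The three checks described above can be combined into one summary. The following is only a sketch: `describeBalancer` is a hypothetical helper, not part of the mongo shell, and it assumes that a `state` of 2 in `config.locks` means the lock is held and 0 means it is free.

```javascript
// Hypothetical helper (not a mongo shell built-in): summarizes the three checks
// from the report. Assumption: in config.locks, state 2 = held, state 0 = free.
function describeBalancer(isRunning, isEnabled, lockDoc) {
  // lockDoc is the result of findOne(), which is null when no document matches.
  const lockState = lockDoc ? lockDoc.state : null;
  return {
    running: isRunning,
    enabled: isEnabled,
    lockState: lockState,
    // The mismatch reported here: balancer idle and disabled, lock still held.
    lockHeldWhileIdle: !isRunning && !isEnabled && lockState === 2,
  };
}

// In the mongo shell, the inputs would come from the calls named in the report:
//   describeBalancer(
//     sh.isBalancerRunning(),
//     sh.getBalancerState(),
//     db.getSiblingDB("config").locks.findOne({_id: "balancer"}))
```

On the cluster described in this report, `lockHeldWhileIdle` would be expected to come back `true`, since both shell helpers return false while the lock state is 2.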

I've checked the changelog collection and haven't found any evidence there that the balancer is still actually running.

For a while we have also had a problem moving chunks in this cluster because of mismatched index definitions across the shards. We are blocked from repairing those by another bug with dropping indexes, which I'll log separately and link to this issue.

We turn off the balancer every night to do some system maintenance. For now we've had to free the balancer lock manually; otherwise this maintenance gets stuck waiting for the balancer to finish its migration.

On a possibly related note, I've had to fix this balancer lock a few times in the past few days, so either some process on our end keeps re-enabling the balancer, or the lock keeps getting re-established on its own.

Assignee: Kaloian Manassiev
Reporter: Scott Glajch
Participants: Kaloian Manassiev, Scott Glajch
Votes: 0
Watchers: 6

Created: Jun 13 2017 03:32:48 PM UTC
Updated: Oct 27 2023 01:54:24 PM UTC
Resolved: Jun 13 2017 03:49:28 PM UTC
