[SERVER-24035] balancer does not respect active window
Created: 03/May/16  Updated: 04/May/16  Resolved: 04/May/16

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.0.5
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Dai Shi
Assignee: Unassigned
Resolution: Duplicate
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-20557 Active window setting is not being pr... Closed
Operating System: ALL
Participants:

 Description   

We just turned the balancer back on for a cluster where it had been off for many months. Typically, we run the balancer 24/7, but in this case we can only run it for a few hours a day. I configured the balancer active window to be between 23:00 and 6:00, per the example given here: https://docs.mongodb.org/v3.0/tutorial/manage-sharded-cluster-balancer/#schedule-the-balancing-window

However, I noticed that the balancer was running even outside of these hours, and it actually caused severe issues for our site this morning. I've reproduced the steps here:

mongos> db.settings.find({ _id : "balancer" })
{ "_id" : "balancer", "_secondaryThrottle" : false, "_waitForDelete" : true, "activeWindow" : { "start" : "23:00", "stop" : "6:00" }, "stopped" : true }
mongos> db.locks.find({ _id : "balancer" }, { "state" : 1 })
{ "_id" : "balancer", "state" : 0 }
mongos> new Date().toLocaleString()
Tue May 03 2016 21:05:18 GMT+0000 (UTC)
mongos> db.settings.update({ _id : "balancer" }, { $set : { stopped : false } }, { upsert: true })
mongos> db.locks.find({ _id : "balancer" }, { "state" : 1 })
{ "_id" : "balancer", "state" : 2 }
mongos> new Date().toLocaleString()
Tue May 03 2016 21:06:25 GMT+0000 (UTC)
mongos> db.settings.update({ _id : "balancer" }, { $set : { stopped : true } }, { upsert: true })
mongos> new Date().toLocaleString()
Tue May 03 2016 21:06:56 GMT+0000 (UTC)
mongos> db.locks.find({ _id : "balancer" }, { "state" : 1 })
{ "_id" : "balancer", "state" : 0 }

As you can see, the time when I ran those commands was just after 21:00, which should not be inside the active window. However, after turning the balancer on, it immediately started migrating chunks. Is there something I'm missing?
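
For reference, the expectation above (that 21:05 falls outside a 23:00 to 6:00 window) can be checked with a small sketch. This is not the server's implementation; `toMinutes` and `inActiveWindow` are hypothetical helper names, and treating the start as inclusive and the stop as exclusive is an assumption made for illustration.

```javascript
// Hypothetical helpers, not part of MongoDB: a sketch of the check the
// reporter expected the balancer to perform.

// Convert "H:MM" or "HH:MM" into minutes since midnight.
function toMinutes(hhmm) {
  var parts = hhmm.split(":");
  return Number(parts[0]) * 60 + Number(parts[1]);
}

// Return true if `now` lies inside the window [start, stop).
// Assumption: start is inclusive and stop is exclusive.
function inActiveWindow(now, start, stop) {
  var t = toMinutes(now);
  var s = toMinutes(start);
  var e = toMinutes(stop);
  if (s <= e) {
    // Window that stays within one day, e.g. 9:00 to 17:00.
    return t >= s && t < e;
  }
  // Window that wraps past midnight, e.g. 23:00 to 6:00.
  return t >= s || t < e;
}

// The case from this report: 21:05 against a 23:00 to 6:00 window.
var shouldBalance = inActiveWindow("21:05", "23:00", "6:00");
```

Under these assumptions `shouldBalance` is false for the times shown in the transcript, which matches the expectation that no chunk migrations should have started.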



 Comments   
Comment by Dai Shi [ 04/May/16 ]

OK, thanks for looking into this. We will patch upgrade before turning the balancer back on.

Comment by Ramon Fernandez Marina [ 04/May/16 ]

dai@foursquare.com, this issue was reported earlier in SERVER-20557, and was fixed in 3.0.7.

Please consider upgrading to the latest 3.0 release (3.0.11 at the time of this writing) at your earliest convenience. Note also that 3.2 is not affected by this issue (3.2.6 is the latest release in that branch).

Thanks,
Ramón.

Comment by Ramon Fernandez Marina [ 04/May/16 ]

dai@foursquare.com, I'm able to reproduce what I think is the same behavior you describe in a test cluster. I also seem to have found a workaround, though. After you enable the balancer, disable it again and re-enable it:

mongos> db.settings.update({ _id : "balancer" }, { $set : { stopped : true } }, { upsert: true })
mongos> db.settings.update({ _id : "balancer" }, { $set : { stopped : false } }, { upsert: true })
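
The two commands above could also be wrapped in a small helper so they are always issued as a pair. This is only a sketch: `toggleBalancer` is a hypothetical name, and the `settings` argument is assumed to be the `config.settings` collection (or any object exposing a compatible `update(query, update, options)` method).

```javascript
// Hypothetical helper, not part of the mongo shell: applies the
// stop-then-start sequence described in the workaround above.
function toggleBalancer(settings) {
  var query = { _id: "balancer" };
  var options = { upsert: true };
  // Step 1: disable the balancer again.
  settings.update(query, { $set: { stopped: true } }, options);
  // Step 2: re-enable it.
  settings.update(query, { $set: { stopped: false } }, options);
}
```

In a mongos shell this might be called as `toggleBalancer(db.getSiblingDB("config").settings)`. It only repeats the manual steps and does not by itself confirm that the active window is then respected.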

We're investigating and we'll post updates on this ticket when we have them.

Thanks,
Ramón.
