[SERVER-27368] Balancer not working on 3.4 Created: 10/Dec/16 Updated: 14/Dec/16 Resolved: 14/Dec/16
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 3.4.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Brandon Tomblinson | Assignee: | Unassigned |
| Resolution: | Done | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Attachments: | |
| Issue Links: | |
| Operating System: | ALL |
| Steps To Reproduce: | Had four 3.2.10 instances on a Windows Server 2012 development server (1 mongos, 2 shard instances, 1 config server); all mongods were set up as replica sets, even though they ran as standalone single-member instances in the development environment. |
| Participants: | |
| Description |
Hello, I recently tried to upgrade my sharded cluster from 3.2.10 to MongoDB 3.4. After a lot of trouble, including having to redeploy the setup, I got all components onto 3.4, but then discovered that the balancer wasn't working. I was getting repeated errors about the config server timing out, warnings about read/write concerns, and the other components reporting the config server as down when it wasn't. Either way, I couldn't get the cluster to balance; even adding another shard would not start moving chunks over, so I reverted to 3.2.10 and it works normally. I have attached log files for each component from the period when 3.4 was deployed (look from the 12:30 timestamps onward to see the relevant entries from when the 3.4 instances were started). I can provide other material such as config files and diagnostic data if needed, but I believe this is a bug, because after reverting to 3.2.10 the cluster was balancing and working as expected. |
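For reference, a minimal sketch of how balancer state and per-shard chunk counts can be checked from a mongo shell connected to the mongos, using the standard sh helpers; the output comments are illustrative, not taken from the attached logs:

```javascript
// Connected to the mongos via the mongo shell.
sh.getBalancerState()     // true  => balancer is enabled
sh.isBalancerRunning()    // true  => a balancing round is in progress

// Chunk counts per shard; with a healthy balancer these converge
// across shards for each sharded namespace over time.
var configDB = db.getSiblingDB("config");
configDB.chunks.aggregate([
  { $group: { _id: { ns: "$ns", shard: "$shard" }, chunks: { $sum: 1 } } },
  { $sort: { "_id.ns": 1, "_id.shard": 1 } }
]);
```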
| Comments |
| Comment by Brandon Tomblinson [ 13/Dec/16 ] |
Spencer, -Brandon |
| Comment by Spencer Brody (Inactive) [ 12/Dec/16 ] |
Hi btomblinson, for a little more background: the issue is that in 3.2 we were incorrectly marking writes as durable in certain cases on ephemeral storage engines, or when journaling was disabled. In 3.4 we fixed that behavior, but as a result the committed snapshot used for readConcern: majority reads never advances when journaling is off, unless you also set writeConcernMajorityJournalDefault to false in the replica set configuration. For config servers, however, turning off journaling should be forbidden entirely; it was only a bug in our option parsing that allowed it to happen. If you turn journaling on for your config server, everything should start working. Sorry for the confusion. |
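A minimal sketch of the two remedies described above, assuming a mongo shell session connected to the relevant replica set; the mongod invocation, dbpath, and replica-set name are illustrative, not taken from this cluster's configuration:

```javascript
// Remedy 1 (required for config servers): run the config server
// WITH journaling. Journaling is on by default with WiredTiger, so
// this simply means not passing --nojournal and not setting
// storage.journal.enabled: false, e.g. (names/paths hypothetical):
//   mongod --configsvr --replSet csReplSet --dbpath /data/configdb

// Remedy 2 (only for data-bearing replica sets that intentionally
// run without journaling): allow the majority-committed snapshot
// to advance by changing the replica set configuration:
var cfg = rs.conf();
cfg.writeConcernMajorityJournalDefault = false;
rs.reconfig(cfg);
```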
| Comment by Brandon Tomblinson [ 11/Dec/16 ] |
Ramon, |
| Comment by Ramon Fernandez Marina [ 11/Dec/16 ] |
btomblinson, I believe the behavior you're observing is related to having a one-node config server. We're investigating further and we'll post updates on this ticket. Regards, |