[SERVER-27368] Balancer not working on 3.4 Created: 10/Dec/16  Updated: 14/Dec/16  Resolved: 14/Dec/16

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.4.0
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Brandon Tomblinson Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File mongodConfig.log     Text File mongodRouter.log     Text File mongodShard1.log     Text File mongodShard2.log    
Issue Links:
Related
is related to SERVER-23956 Inconsistent behavior between 3.2.6-r... Closed
is related to SERVER-24551 Version 3.2 incorrect 'nojournal is n... Closed
Operating System: ALL
Steps To Reproduce:

Had four 3.2.10 instances on a Windows Server 2012 development server (1 mongos, 2 shard instances, 1 config server). All mongods were set up as replica sets, even though they ran as standalone instances in the development environment.
Tried to deploy 3.4 following the upgrade guidelines from the website. After finally getting the cluster back up, found log errors about being unable to connect to the config server; this caused the balancer to stop working, and the database became unstable.

Participants:

 Description   

Hello, I recently tried to deploy MongoDB 3.4 to my sharded cluster running 3.2.10. After a lot of trouble, including having to re-deploy the setup, I was able to get all components upgraded, but then discovered that the balancer wasn't working. I was getting multiple errors of the config server timing out, read/write concerns throwing warnings, and even the other components reporting the config server as down when it wasn't. Either way, I still couldn't get it to balance; even adding another shard would not start moving chunks over, so I reverted back to 3.2.10 and it works normally. I have attached log files for each component from the time when 3.4 was deployed (look from around the 12:30 timestamps onward to see the relevant info when the 3.4 instances were started up). I can provide other things such as config files and diagnostic data if needed, but I believe this is a bug, because after reverting back to 3.2.10 it was balancing and working as expected.



 Comments   
Comment by Brandon Tomblinson [ 13/Dec/16 ]

Spencer,
Thank you for your help! I enabled journaling for the config server and re-deployed 3.4 and it works as expected! You can mark this ticket as closed. Thanks again.

-Brandon

Comment by Spencer Brody (Inactive) [ 12/Dec/16 ]

Hi btomblinson,
This is due to a combination of SERVER-24551 and SERVER-23956. The short version is that you need to enable journaling on the config server.

For a little more background, the issue is basically that in 3.2 we were incorrectly marking writes as durable in certain cases on ephemeral storage engines or when journaling was disabled. In 3.4 we fixed that behavior, but that means the committed snapshot used for readConcern: majority reads now never advances when journaling is off, unless you also set writeConcernMajorityJournalDefault to false in the replica set config.
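For data-bearing replica sets that deliberately run without journaling, the workaround Spencer mentions can be sketched as a replica set reconfiguration run against the primary. This is a hedged sketch, not a command from the ticket: the host name is a placeholder, and writeConcernMajorityJournalDefault is only honored in 3.4+.

```shell
# Sketch (assumptions: 3.4 replica set, host is a placeholder).
# Sets writeConcernMajorityJournalDefault=false so the majority-committed
# snapshot can advance on a replica set that runs without journaling.
mongo --host shard1.example.net:27017 --eval '
  var cfg = rs.conf();
  cfg.writeConcernMajorityJournalDefault = false;
  rs.reconfig(cfg);
'
```

As noted in the comment, this knob is for ordinary shards; it is not a substitute for enabling journaling on config servers.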

In the case of config servers, however, it should be forbidden to ever turn off journaling; it was only a bug in our option parsing that allowed this to happen. If you turn journaling on for your config server, everything should start working.
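A minimal sketch of starting the config server with journaling enabled (the dbpath, port, and replica set name below are illustrative placeholders, not taken from the ticket's setup):

```shell
# Sketch: config server startup WITH journaling. --journal is the default
# for 64-bit builds and is shown explicitly for emphasis; the important
# part is that --nojournal (or storage.journal.enabled: false) is absent.
# In 3.4 config servers must run as a replica set (CSRS), hence --replSet.
mongod --configsvr --replSet configReplSet \
       --dbpath /data/configdb --port 27019 --journal
```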

Sorry for the confusion.
-Spencer

Comment by Brandon Tomblinson [ 11/Dec/16 ]

Ramon,
I thought that too, so I tried creating and adding another config server and it did the exact same thing on both.

Comment by Ramon Fernandez Marina [ 11/Dec/16 ]

btomblinson, I believe the behavior you're observing is related to having a one-node config server. We're investigating further and we'll post updates on this ticket.

Regards,
Ramón.

Generated at Thu Feb 08 04:14:58 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.