[SERVER-21190] Bring back --nojournal for config servers Created: 28/Oct/15  Updated: 30/Nov/16  Resolved: 30/Nov/16

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.2.0-rc1
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Gustavo Niemeyer Assignee: Andy Schwerin
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-22500 Do not require journaling for config ... Closed
Related
is related to SERVER-21198 Using --nojournal on 3.2 WiredTiger s... Closed
Sprint: Sharding D (12/11/15)
Participants:

 Description   

Hello,

Would it be possible to bring back the --nojournal option to the config servers, or is there a critical reason why this is now forbidden other than being a good practice to not use it?

Although that may be a poor practice in production, it is a very useful option for test suites that require bringing up and down several servers, and care about the speed of operations and not at all about the safeguards brought by journaling.

Thanks for considering it.



 Comments   
Comment by Eric Milkie [ 30/Nov/16 ]

In version 3.4, it is no longer possible to run config servers using the MMAPv1 storage engine, so it no longer makes sense to bring back --nojournal. (Note that for WiredTiger, running without journaling enabled does not make things faster.)

In version 3.2 and prior, it is possible to use the --nopreallocj option to avoid preallocating journal files for mirrored config servers at startup, which may improve the speed of tests using the MMAPv1 storage engine. We use this option in our own test suites.
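As a rough illustration of the --nopreallocj suggestion above, a test harness could assemble the config server command line along these lines. This is only a sketch: the helper name, the data directory, and the port are hypothetical, and it assumes a 3.0 or 3.2 mongod binary where --storageEngine is available.

```python
def config_server_argv(dbpath, port, prealloc_journal=False):
    """Build a mongod command line for a throwaway MMAPv1 config server.

    Hypothetical helper for a test harness; dbpath and port are whatever
    the harness chooses.
    """
    argv = [
        "mongod",
        "--configsvr",
        "--storageEngine", "mmapv1",
        "--dbpath", dbpath,
        "--port", str(port),
    ]
    if not prealloc_journal:
        # Journaling stays enabled; only the up-front journal file
        # preallocation at startup is skipped.
        argv.append("--nopreallocj")
    return argv


if __name__ == "__main__":
    # Example values only.
    print(" ".join(config_server_argv("/tmp/cfg1", 40101)))
```

The resulting list can be handed to subprocess.Popen by the harness. Journaling itself remains on, so this does not reproduce what --nojournal did.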

Comment by Gustavo Niemeyer [ 06/Jan/16 ]

I'm not doing anything funky here that I'm aware of, and I'm using SSDs. If you have magic tricks to make twelve MongoDB servers not take an awful lot of time to be brought up as a few replica sets and become healthy, I'm all ears. Gabriel Russel has the setup locally at the office to show you, if you'd like to have a look.

Comment by Andy Schwerin [ 06/Jan/16 ]

OK. I'm still surprised. We bring up sharded clusters and shut them down dozens of times per hour on single machines as part of automated testing. Anyhow, send a pull request and I'll merge it.

Comment by Gustavo Niemeyer [ 06/Jan/16 ]

It's not only the test times that are affected, but the bringing up of the test cluster at all. Every time the cluster has to be brought up, these three MongoDB config servers alone will thrash the disk with 1.3GB when being started. It was already slow before, and there are already delayed checks to wait until it all comes online and becomes healthy, but this is now driving the server to a standstill. I could add further delayed checks everywhere, and even space out the starting of each of the servers, but this doesn't feel very reasonable or productive. One of the great features of MongoDB to me has always been that it was a snappy database to deal with. Having to sit down for a long time to bring up a handful of servers every time I want to switch over to a different version of the database would be unfortunate.

Comment by Andy Schwerin [ 06/Jan/16 ]

We don't want to test config servers with this combination of options, so we disabled it. If you want to, send a pull request that allows this combination of options only when the enableTestCommands server parameter is set to true. I'll merge that, since it should prevent casual users from combining those options.
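The rule proposed above could look roughly like the following check. This is a sketch of the logic only, written in Python for brevity; it is not the server's actual startup validation code, and the function name and error text are illustrative assumptions.

```python
def validate_options(configsvr, nojournal, enable_test_commands):
    """Return an error message if the option combination is rejected, else None.

    Hypothetical model of the proposal: a config server may run without
    journaling only when the enableTestCommands parameter is true.
    """
    if configsvr and nojournal and not enable_test_commands:
        # Illustrative message, not necessarily what mongod prints.
        return "journaling cannot be turned off on a config server"
    return None


if __name__ == "__main__":
    print(validate_options(configsvr=True, nojournal=True, enable_test_commands=False))
    print(validate_options(configsvr=True, nojournal=True, enable_test_commands=True))
```

Ordinary deployments would keep hitting the error, while a test suite that already sets enableTestCommands would be let through.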

I'm surprised that your test time is affected by config server performance. Are you testing a lot of collection creation/drop behavior?

Comment by Gustavo Niemeyer [ 06/Jan/16 ]

I am using one config server per replica set. Three replica sets, three config servers. It's not about testing mmap config servers as part of the Go driver. It's about testing the Go driver at all, as explained above. WT doesn't even work with the suite right now, due to SERVER-21198. Even with that aside, the Go test suite has caught actual server bugs multiple times (SERVER-21403 being the most recent). So I don't understand the dismissal above.

Then, besides all of that, it's unclear why there's so much resistance to allowing an option to be used, one which has always been allowed until now, rather than artificially restricting it. Right now I'm manually hacking the server code so I can run tests. Is there a good reason to make life harder over here? I don't think I've bothered you with requests over these years, so the resistance to this trivial one-liner surprises me.

I honestly don't understand.

Comment by Andy Schwerin [ 06/Jan/16 ]

If you're only doing testing, you could just run a single config server instead of three. Additionally, it doesn't seem important to test mmap config servers as part of the Go driver; you could just always test using WT config servers.

Comment by Gustavo Niemeyer [ 13/Nov/15 ]

Pretty please? This is currently the only way I can run the full test suite in a reasonable amount of time. I expect SERVER-21198 to fix the problem with WiredTiger, but even then fixing the bug here would still be the only way to get the suite to pass with MMAPv1, which is definitely helpful in terms of ensuring things are working properly (SERVER-21403 came out of that, for example).

Comment by Gustavo Niemeyer [ 04/Nov/15 ]

I have changed the server code locally to allow me to provide --nojournal to config servers in 3.2, and I can confirm that with mmapv1 the amount of disk use and activity in the config servers becomes the same as it used to be in 3.0, which is what enables me to test 3.2 at all.

There's still a problem with WiredTiger worth fixing before 3.2 is out, as tracked in SERVER-21198, but that seems like an independent issue from this flag. I would appreciate it if we could drop that artificial constraint so I can abandon my local hack.

Comment by Gustavo Niemeyer [ 03/Nov/15 ]

As another data point, these are the directory sizes for several databases out of my test suite:

I was running it with the mmapv1 storage engine for testing purposes.

% du -sh *
419M cfg1
419M cfg2
419M cfg3
18M db1
18M db2
18M rs1a
18M rs1b
18M rs1c
18M rs2a
18M rs2b
18M rs2c
18M rs3a
18M rs3b
18M rs3c
18M rs4a

I would appreciate it if we could get back the ability to not have these 400MB directories in such a scenario for 3.2.
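To put the listing above in numbers, the sizes can be totalled per directory family with a small script. This is a minimal sketch: the grouping by leading letters (cfg, db, rs) is an assumption about the naming scheme, and only the "M" suffix that appears in the listing is handled.

```python
import re
from collections import defaultdict

# The `du -sh *` output quoted above.
DU_OUTPUT = """\
419M cfg1
419M cfg2
419M cfg3
18M db1
18M db2
18M rs1a
18M rs1b
18M rs1c
18M rs2a
18M rs2b
18M rs2c
18M rs3a
18M rs3b
18M rs3c
18M rs4a
"""


def total_mb_by_prefix(du_text):
    """Sum sizes (in du's 'M' units) grouped by the leading letters of each name."""
    totals = defaultdict(int)
    for line in du_text.splitlines():
        size, name = line.split()
        if not size.endswith("M"):
            raise ValueError("only sizes with an 'M' suffix are handled: " + size)
        prefix = re.match(r"[a-z]+", name).group(0)
        totals[prefix] += int(size[:-1])
    return dict(totals)


if __name__ == "__main__":
    for prefix, mb in sorted(total_mb_by_prefix(DU_OUTPUT).items()):
        print(prefix, mb)
```

For this listing the three config servers add up to 1257M, against 216M for the other twelve directories combined.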

Comment by Gustavo Niemeyer [ 03/Nov/15 ]

Current investigations in SERVER-21198 seem to indicate that --nojournal is in fact relevant in WiredTiger as well.

Can we please have the flag back for config servers too?

Comment by Daniel Pasette (Inactive) [ 29/Oct/15 ]

The journal is a reasonable requirement for config servers, and as Andy said, with WT it shouldn't have a huge impact on your test harness performance. I think SERVER-21198 is the real issue here.

Comment by Gustavo Niemeyer [ 29/Oct/15 ]

I'm using the default storage engine in the setup, which in my understanding should mean WiredTiger on 3.2.

Perhaps this is a red herring in the middle of a more important issue. I'm just observing significant slowdowns when running tests on 3.2, which Gabriel could reproduce yesterday. I will file a separate ticket about that.

Comment by Andy Schwerin [ 28/Oct/15 ]

Are these test suites running the WiredTiger storage engine? Could they be? The startup/teardown time with journaling enabled is significantly better with WiredTiger than with MMAPv1. It may be a non-issue if you could be using WT for your config servers.

