[SERVER-21198] Using --nojournal on 3.2 WiredTiger slows down tests by an order of magnitude Created: 29/Oct/15 Updated: 22/Apr/16 Resolved: 04/Apr/16
| Status: | Closed |
| Project: | Core Server |
| Component/s: | WiredTiger |
| Affects Version/s: | 3.2.0-rc1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Gustavo Niemeyer | Assignee: | David Hows |
| Resolution: | Done | Votes: | 1 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Operating System: | ALL |
| Description |
I've been trying to identify the root cause of a major slowdown in tests when running against 3.2, but have found nothing obvious so far, except for some pretty high disk activity during the tests. To reproduce, start the test suite databases (there are several of them, in varied setups, to exercise different aspects of the driver):

and then run a selection of tests; GridFS is a good example, as those tests all pass with 3.2 and most of them are very simple:
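The two snippets referenced here were attached to the ticket. Based on the mgo repository layout linked in the comments below, they were presumably along these lines (the gocheck filter flag name is an assumption):

```sh
# Start the test suite databases (several mongod/mongos processes under
# supervisord); "make startdb" is the target mentioned later in this ticket.
make startdb

# Run only the GridFS tests; the filter flag assumes mgo's gocheck harness.
go test -gocheck.f "GridFS"
```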
On Gabriel Russell's machine, this was the result as of yesterday evening:

Same tests on the same system running 3.0:
| Comments |
| Comment by Michael Hudson-Doyle [ 22/Apr/16 ] |

I think I am seeing a related problem in a different test suite (juju's). The bug for juju is here: https://bugs.launchpad.net/juju-core/+bug/1573294, but to summarize: it seems to be index creation that's the bottleneck. Removing the --nojournal flag helps quite a bit, but not enough (3.2/nojournal: 20s), and 3.2 with mmapv1 is roughly the same as 2.4, so this is mmapv1 vs WT, nothing else as far as I can tell. I did some gdb and strace hackery that lends some weight to the suggestion that it's time spent in fdatasync that's slowing us down, although it's not reflected in an increase in system time reported by time(1) or top. mongod won't start for me under eatmydata for some reason (it just loops at startup), so I can't try that.
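For reference, the kind of strace hackery described above, measuring time spent in fdatasync from outside the process, can be done roughly like this (a sketch; assumes Linux strace and a single mongod process):

```sh
# Attach to a running mongod and all of its threads (-f), counting time per
# syscall (-c) but tracing only fdatasync; Ctrl-C prints the summary table.
sudo strace -f -c -e trace=fdatasync -p "$(pidof mongod)"
```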
| Comment by Gustavo Niemeyer [ 04/Apr/16 ] |

I don't have a follow-up yet. My last attempt still demonstrated the issues were in place, but I haven't yet changed the suite to use libeatmydata.
| Comment by David Hows [ 30/Mar/16 ] |

gustavo.niemeyer, do you have any follow-up here? If not, I'll mark this issue as resolved.
| Comment by David Hows [ 10/Mar/16 ] |

Hi Gustavo,

There are a few cases in the WiredTiger test suite where having these checkpoints and filesyncs running has caused delays that we felt could be avoided (such as in our Pull Request tester). In those cases we placed libeatmydata in front of WT, which had the desired effect. I have tested this with MongoDB and it has the same positive impact, working with both WT and MMAP. Below you can see that all the tests run at good speeds with eatmydata enabled. Can you test and see if libeatmydata has the same impact for you?
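libeatmydata works by preloading a shim that turns fsync/fdatasync (and friends) into no-ops, so it only needs to wrap the server process. A minimal sketch of launching mongod under it (the dbpath is a placeholder and the library path varies by distro):

```sh
# Option 1: the wrapper script shipped with the eatmydata package.
eatmydata mongod --dbpath /tmp/testdb --storageEngine wiredTiger --nojournal

# Option 2: preload the shim directly; adjust the path for your distro.
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libeatmydata.so \
    mongod --dbpath /tmp/testdb --storageEngine wiredTiger --nojournal
```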
| Comment by Gustavo Niemeyer [ 07/Mar/16 ] |

Thanks for the explanation, David. This explains the problem, but what are we going to do about it?

Every MongoDB release so far has worked in a very satisfactory way while dropping databases in between tests, and that's an excellent feature not only for driver development but for testing any application at all, as it allows giving each test a blank slate on which to build its setup with almost no work. So my real problem is not why this is happening, but that it is happening at all. The GridFS example above is just an easy demonstration; this affects the whole driver and every other major application I'm involved with that is based on MongoDB. I could enable the journal, but then the disk activity brings the testing system to a halt for a significant while.

As an example, just last week I put this test matrix in place on a very popular CI service: https://travis-ci.org/go-mgo/mgo. It runs tests with MongoDB 2.2, 2.4, 2.6, 3.0, and 3.2 (and Go 1.4, 1.5, and 1.6, but that's irrelevant to the point here). To make the tests pass on MongoDB 3.2, though, I had to patch the server so it would accept --nojournal for the config servers, and use MMAPv1. I couldn't make WT happy.
| Comment by David Hows [ 07/Mar/16 ] |

In addition to Bruce's explanation above, WT must also perform a metadata checkpoint whenever it creates a table (any collection or any index) when nojournal is configured. Remember there is an explicit drop of all databases after each test in the suite, meaning we are doing 5 metadata checkpoints each time a new test is started.

With this in mind, I went and modified Gustavo's GridFS suite to remove all data in all collections between runs rather than dropping all databases (along with a few other quick mods to solve newly failing tests). This showed that performance for the two storage engines is basically at parity when running with nojournal and not constantly creating new tables. The exception is the TestGridFSCloseConflict test, which explicitly creates (and, in my mod, later drops) a new index. I hope this gives enough of an explanation as to why we see these differences. gustavo.niemeyer, are you okay if I mark this as resolved?

Test 3.3.2 WT + no drops

Test 3.3.2 MMAP + no drops
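The change described above, emptying collections between tests instead of dropping whole databases so that WT never has to recreate tables (and thus never has to checkpoint its metadata), can be sketched like this. This is illustrative, not the actual patch; the database name is a placeholder:

```sh
# Clear every user collection in the test database between test cases,
# keeping the underlying WT tables (and their indexes) alive.
mongo --quiet --eval '
    var d = db.getSiblingDB("mydb");   // placeholder test database
    d.getCollectionNames().forEach(function (name) {
        if (name.indexOf("system.") !== 0) {
            d.getCollection(name).remove({});  // empty it, do not drop it
        }
    });
'
```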
| Comment by Gustavo Niemeyer [ 04/Nov/15 ] |

I have changed the server code to allow me to provide --nojournal to config servers as well in 3.2 (related to another ticket). That confirms that the issue raised in this ticket is entirely related to WiredTiger, somehow. The --nojournal flag is a red herring in that one test case, though: having the flag just enables the mmapv1 backend to run without a massive amount of disk use (>400MB per config server), which otherwise breaks the test suite.
| Comment by Gabriel Russell (Inactive) [ 03/Nov/15 ] |

The "order of magnitude" difference observed by Gustavo is the first column of the table that I posted earlier. It's the difference between running mmapv1 with --nojournal and running wiredTiger with --nojournal. For whatever reason, --nojournal appears to make mmapv1 perform faster and wiredTiger perform slower for this suite of tests. Initially it was assumed that the difference was between 3.0 and 3.2, but eventually someone (Dan, I think) guessed that we were actually looking at a WT vs mmapv1 issue.
| Comment by Bruce Lucas (Inactive) [ 03/Nov/15 ] |

I think the analysis above adequately explains why the --nojournal option is slower with WT, at least for the particular data that Gabriel posted to the ticket. What I don't know is whether it fully explains the subject of this ticket, the order-of-magnitude slowdown running tests in 3.2. Were all tests where this slowdown occurred run with --nojournal, and does the slowdown described in the opening comment disappear if the tests are not run with --nojournal? (Note: I have not reproduced anything nor run any of the tests; I just analyzed some data from a run that was posted to this ticket.)
| Comment by Gabriel Russell (Inactive) [ 03/Nov/15 ] |

Bruce, these are the exact flags in use for the slow and fast results.

Slow:

Fast:

I did run the tests against a stand-alone, and the results seemed fairly similar, but they ran so much faster that I didn't feel the numbers were very meaningful. I could try to increase the load and perform the tests with and without --nojournal against a stand-alone, if you think that's valuable.

Gabriel
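The flag snippets themselves were attached to the ticket. Going by the 29/Oct/15 comment below, the difference was presumably the extra startup flags; a reconstruction, with the dbpath as a placeholder, not the verbatim snippets:

```sh
# Slow: wiredTiger plus the legacy test flags, --nojournal in particular
# (the mmapv1-oriented flags are presumably inert under WT).
mongod --dbpath db1 --storageEngine wiredTiger \
       --noprealloc --nojournal --smallfiles --nssize=1

# Fast: the same server without those flags, i.e. with journaling enabled.
mongod --dbpath db1 --storageEngine wiredTiger
```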
| Comment by Gustavo Niemeyer [ 03/Nov/15 ] |

Have you managed to reproduce the 4-second timing per the above report? That would show it is on par.
| Comment by Bruce Lucas (Inactive) [ 03/Nov/15 ] |

I suspect it's related to this, which uses checkpoints in lieu of the journal when the journal is not enabled:

There are a few waitUntilDurable calls in the replication code related to the oplog, and in the metrics above we see replSetUpdatePosition commands being executed at about the same rate as the log syncs, so I suspect the cause of the slowdown with --nojournal may be the replication code using checkpoints in lieu of the journal when it is disabled. Question: are all cases of slowdown in 3.2 tests associated with --nojournal repl sets, or is there more to investigate separately?
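The log-sync rate mentioned here shows up in serverStatus on WiredTiger builds; a quick way to watch it while the tests run (a sketch, not the FTDC tooling actually used for the analysis):

```sh
# Dump WiredTiger's log/sync counters once a second; the "log" section of
# serverStatus() includes sync-related counters on WT storage engines.
while sleep 1; do
    mongo --quiet --eval 'printjson(db.serverStatus().wiredTiger.log)'
done
```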
| Comment by Bruce Lucas (Inactive) [ 03/Nov/15 ] |

I meant whatever it is you ran to generate the table in your comment above. Can you confirm the exact mongod flags you used for each of the diagnostic.data directories you uploaded?
| Comment by Gabriel Russell (Inactive) [ 02/Nov/15 ] |

Bruce,

It does appear to me that the --nojournal flag with wiredTiger slows down this one mgo GridFS test that I've been looking at. Which are the fast and slow cases you refer to? Are you referring to running the whole mgo test suite? Because I don't think it currently runs to completion. I'll upload the diagnostic.data folders for the run of this mgo GridFS test against WT with and without --nojournal.

Gabriel
| Comment by Bruce Lucas (Inactive) [ 30/Oct/15 ] |

gabriel.russell, can you confirm what I think your data is saying: WT tests are slower with the --nojournal flag? That is surprising. Can you do the experiments necessary to determine which of the four flags is making the difference? Also, can you run the fast and slow cases on WT and then tar up and upload the diagnostic.data directory, together with the exact times you ran the tests, so we can correlate them with the data? We might be able to spot something that accounts for the difference.
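For anyone reproducing this: the diagnostic.data directory sits under each mongod's dbpath, so capturing it is a one-liner (the dbpath below is a placeholder):

```sh
# Bundle the FTDC metrics for upload to the ticket.
tar czf diagnostic-data.tar.gz /path/to/dbpath/diagnostic.data
```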
| Comment by Gustavo Niemeyer [ 30/Oct/15 ] |

Thanks to everybody for the help around this. Do we see any light at the end of the tunnel yet? I'll be changing the suite to use mmapv1 for the time being, as I cannot be blocked on this any longer, but that means not testing the server with the default storage engine, which is of course not ideal. It would be great to be able to exercise WT in test suites with proper performance instead.
| Comment by Gabriel Russell (Inactive) [ 29/Oct/15 ] |

I timed the GridFS tests on 3.2, with and without the "--noprealloc --nojournal --smallfiles --nssize=1" flags, across both storage engines.

It looks like the storage engines behave fairly similarly without those flags, but move apart from each other with them.
| Comment by Daniel Pasette (Inactive) [ 29/Oct/15 ] |

gabriel.russell walked me through how the test framework is put together. Let's discuss when you're in today.
| Comment by Gustavo Niemeyer [ 29/Oct/15 ] |

These are all the servers that are run by make startdb: https://github.com/go-mgo/mgo/blob/v2-unstable/testdb/supervisord.conf

This is how they are started: https://github.com/go-mgo/mgo/blob/v2-unstable/testdb/setup.sh

After these servers are started, they are initialized with this JavaScript: https://github.com/go-mgo/mgo/blob/v2-unstable/testdb/init.js

This is run between tests to clear the databases (the servers are not shut down by the tests): https://github.com/go-mgo/mgo/blob/v2-unstable/testdb/dropall.js

This is the actual harness that does that: https://github.com/go-mgo/mgo/blob/v2-unstable/suite_test.go

Finally, these are the actual tests run in the mentioned case: https://github.com/go-mgo/mgo/blob/v2-unstable/gridfs_test.go

Since those are GridFS tests, there really isn't anything special about the write concerns in use.
| Comment by Scott Hernandez (Inactive) [ 29/Oct/15 ] |

What does "make startdb" do to set up the test database environment? I assume it sets up a replica set, and maybe sets a default write concern? Or maybe a sharded cluster? Also, can you please link to or describe the work that the test does, including any write concerns, read preferences, or read concerns used?