[SERVER-16724] Startup time under WiredTiger is extremely slow and gets worse as database grows Created: 05/Jan/15 Updated: 15/Jan/15 Resolved: 13/Jan/15 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Performance, Storage |
| Affects Version/s: | 2.8.0-rc3 |
| Fix Version/s: | 2.8.0-rc5 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Nathan Arthur | Assignee: | Mark Benvenuto |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Backwards Compatibility: | Fully Compatible |
| Operating System: | ALL |
| Participants: |
| Description |
|
I did a test of my company's analytics app, using the WiredTiger storage engine, on Windows Server 2012, on a HyperV VM. During the test, I noticed that startup time for MongoDB was getting excessive - 2 hours in one case, 2.5 hours in another. It had been very quick the first few times I started it. I didn't take a lot of time to log the details of what was wrong, but the difference between the various times was only in the amount of data in the database - the more data, the slower the startup time. (Roughly 6MM records -> 2 hours, roughly 7.5MM -> 2.5 hours.) I don't have stats from those instances, but one of the databases in the instance is a single collection with one record per incoming message, stored as binary data, with a few fields of metadata and a few indexes. I assume it's that database that's causing the problem. (During the entire startup period, the system shows lots of read activity for a single huge file from the mongodb data directory.) The other databases have much lower overall volume/size. Sometime in the next day or so I'll need to restart it again, with about 20MM records in it. I'll collect more detailed info about it, then. I've attached log snippets from the two slow startups. |
| Comments |
| Comment by Nathan Arthur [ 13/Jan/15 ] | ||||
|
Fantastic - thank you! | ||||
| Comment by Mark Benvenuto [ 13/Jan/15 ] | ||||
|
I believe I was able to create a repro of your bug by doing the following:
I fix has been pushed to master, and I believe this will resolve the issue you saw. | ||||
| Comment by Githook User [ 13/Jan/15 ] | ||||
|
Author: {u'username': u'markbenvenuto', u'name': u'Mark Benvenuto', u'email': u'mark.benvenuto@mongodb.com'}Message: | ||||
| Comment by Nathan Arthur [ 13/Jan/15 ] | ||||
|
This seems like a plausible explanation, but there shouldn't have been any new collections created during these runs. The server had been started and restarted multiple times, and was even restarted after it already had lots of data, multiple times. The only way I can think that this would make sense is if every single shutdown was failing, and the problem propagated across starts - but that seems unlikely. | ||||
| Comment by Mark Benvenuto [ 12/Jan/15 ] | ||||
|
In this scenario, due to If the server is abruptly terminated during a session when a collection has just been created, then the server does not go through a clean shutdown. On start up, WiredTigerRecordStore::WiredTigerRecordStore counts every record in the collection, and gets the size of each record. | ||||
| Comment by Nathan Arthur [ 06/Jan/15 ] | ||||
|
I attached a new file with a bunch of shutdown logs. I didn't see any of those messages you listed, in any of them. (Even ones where the subsequent startup was fast.) Are you sure about that logging, on Windows, on RC3? Note that in the slow startup scenarios, nothing was logged about it doing a recovery - it just said it was starting up, then hung for hours, then started accepting connections. I also just tried hard-killing the mongod.exe process, and it started up immediately again. | ||||
| Comment by Mark Benvenuto [ 06/Jan/15 ] | ||||
|
In your attachment "additional_startup_time_logs.txt", do you trim any of the output from "PRIOR SHUTDOWN". I expect something like the following:
If you did not trim the shutdown log output, this means that Windows completed the shutdown before MongoDB had time to finish cleanly shutting down. This means Windows killed mongod.exe, and WiredTiger then had to go through recovery on restart. I will work in | ||||
| Comment by Nathan Arthur [ 06/Jan/15 ] | ||||
|
...and of course now it starts up essentially instantly. Even after multiple restarts. I have no idea why. There have been NO configuration changes to this host; the only difference is a lot more data imported (via application code), and a lot more time passed. I found a third log (from before) where startup time was around 4 hours. I don't remember the database size at the time, but it would have been correspondingly larger then the earlier examples. I'll attach that, as well as the log snippit from the prior shutdown (which was perfectly normal). | ||||
| Comment by Nathan Arthur [ 06/Jan/15 ] | ||||
|
I'm about to restart the server again. I collected a bunch of stats, which I'll upload as attachments (one with commands, one with the output). I'll report back later with the total restart time. | ||||
| Comment by Daniel Pasette (Inactive) [ 06/Jan/15 ] | ||||
|
Hi Nathan, thanks for the report. Can you run db.stats() and db.collection.stats() (substituting your collection name here) and add to this ticket? Also, how many total databases and collections do you have on this instance? | ||||
| Comment by Nathan Arthur [ 05/Jan/15 ] | ||||
|
I just noticed from the logs - apparently this was actually RC3, not RC0 as I tagged it. |