[SERVER-5994] Server created 30GB files after restart in one minute interval
Created: 03/Jun/12 Updated: 15/Aug/12 Resolved: 04/Jun/12

| Status: | Closed |
| Project: | Core Server |
| Component/s: | Storage |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Patrick Reyes | Assignee: | Tad Marshall |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | Crash |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Environment: | Win 2008 |
| Attachments: | |
| Issue Links: | |
| Operating System: | Windows |
| Participants: | |
| Description |
|
After restarting our 75 GB server, mongo started creating new data files until the disk was full and it could not write more data. The size of the db has more than doubled in a couple of seconds. From there on we had to delete the last, corrupted data file. But each time we restart mongo it creates new files; we have reached 200 GB in a couple of hours. The |
| Comments |
| Comment by Tad Marshall [ 04/Jun/12 ] |
|
Glad to hear it! Thanks for letting us know! |
| Comment by Patrick Reyes [ 04/Jun/12 ] |
|
Just wanted to let you know that migrating to 2.0.5 solved the issue |
| Comment by Tad Marshall [ 03/Jun/12 ] |
|
I haven't traced through the code to see how the failure plays out, so I don't know whether the failed updates were lost. It seems like the existing document should not have been marked as deleted until the new one had been written, but if looking it up by object ID isn't working then it may be lost. The log fragment you posted seemed to have complete documents in it ... the update was trying to replace the entire document, and it looked as if the log had all the information needed to insert the document anew. It might be possible to use the logs to regenerate the missing documents if they really are missing.

It is also possible that the index is being used to find the document and it is the index that is wrong. You could try reindexing the collection and see if the old document is findable after reindexing.

It is definitely a good idea to take a backup before doing a repair. Since the state of things is a bit unknown, it makes sense to have a fallback plan in place: an exact copy of what you have right now, so that no data will be lost if the repair doesn't do what you need. If you have the hardware resources, copying the data directory to another disk and doing the repair there would be a fine plan. You of course don't want updates happening to the original files while you try this, so stop the server before making the copy and don't start it up again until you know whether the repair succeeded.

After shutting down mongod.exe, copy the entire directory tree to a new location and then start mongod.exe again with the repair options pointed at the copy (a sketch follows below) to run repair on the copy and create a log of the repair operation. When it is finished, you should be able to run db.stats() on it and compare the document count and data size to verify that all of your data is still there.

Let me know if this is unclear or if you have more questions. |
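A minimal sketch of the kind of invocation meant here, with placeholder paths (E:\mongo-copy\... is an assumption, not a path from this ticket):

```
rem Sketch only: point --dbpath at the copied data directory, not the original one.
rem --repair rewrites the data files under --dbpath; --logpath captures the repair output for review.
mongod.exe --dbpath "E:\mongo-copy\data" --repair --logpath "E:\mongo-copy\repair.log" --logappend
```

Because this runs against the copy, the original data directory stays untouched and remains available as a fallback if the repair does not produce the expected result.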
| Comment by Patrick Reyes [ 03/Jun/12 ] |
|
Thanks for the quick answer, we will try that. As I mentioned, the row that was handled just before the assertion message is gone from the collection (when we query the collection with the key, it's gone).

Is there a chance that the repair will get them back, or are they gone? Any chance we can find them in a log somewhere? Is there a risk that the db is still corrupted after the repair? If the repair decides to delete an element of a collection, can we identify it? Is there a chance that the repair will delete collection elements, or will it only repair the structure? If some elements are deleted, is there a way we can identify them after the repair? Can we do the repair on a different disk (i.e. we move all the data files to a different disk where there is ample space, then launch mongo with --dbpath pointing to the new location, repair it, and then move the files back to the original disk)? |
| Comment by Tad Marshall [ 03/Jun/12 ] |
|
Hi Patrick,

Thanks for all the information. I can see how this would look totally broken to you. Your db.stats() output says that you have around 65 GB of data with 71 GB of disk space allocated to it, but your data files are taking up 289 GB. Moreover, you have 85 extents (the divisions within a data file that hold data) but 135 files. At least 50 of those files are empty ... all zeros.

This is bug https://jira.mongodb.org/browse/SERVER-5754 , which unfortunately affects version 2.0.5-rc0, the version you are using. The fix was a one-line fix (actually a one-character fix) made in commit https://github.com/mongodb/mongo/commit/dc2f8cd3df44ea33f9813aadb27804e084abea11 . It is fixed in the final version 2.0.5, as well as in version 2.0.6-rc0. Your best bet is to upgrade as soon as possible to version 2.0.5 and then do a repair when you have a chance.

Repair makes your database unavailable while it is running and requires enough free disk space to hold a new copy of your data, so you would need at least 65 to 71 GB free, and likely a bit more. When repair finishes, it installs the new files and deletes the old ones, so it will free up your disk space. With 65 GB of data to be copied, it will probably take several hours to complete. You can read about running repair in the MongoDB documentation.

Let us know if this fixes it for you ... it should. Thanks!

Tad |
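To compare the numbers before and after the repair, db.stats() can also be run non-interactively; a minimal sketch, assuming the database is named DIM and mongod listens on the default port (both assumptions, not confirmed here):

```
rem Assumed db name (DIM) and default port; run before and after the repair and compare the figures.
mongo.exe localhost:27017/DIM --eval "var s = db.stats(); print('objects: ' + s.objects); print('dataSize: ' + s.dataSize); print('storageSize: ' + s.storageSize); print('fileSize: ' + s.fileSize);"
```

If objects and dataSize stay the same while fileSize drops back toward the allocated size, the repair has discarded the empty files without losing data.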
| Comment by Patrick Reyes [ 03/Jun/12 ] |
Mongo version
Directory content of data dir
|
| Comment by Patrick Reyes [ 03/Jun/12 ] |
|
Added Log file |
| Comment by Patrick Reyes [ 03/Jun/12 ] |
|
It looks like each time the files are created there is an assertion failure:

DIM.PaperCopy Assertion failure approxSize < Extent::maxSize() db\pdfile.cpp 437

Not 100% sure, however. I have added the log from 00:05 this morning, when mongo created 9 new files (DIM.101 to DIM.109), because I don't have access to the log dated 1 June (when it all started) before tomorrow. Since midnight it has created a total of 80 GB of files and my disks will be full again before this evening. |
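One way to check whether every burst of new files lines up with that assertion is to search the log for the message; a small sketch, assuming a log path of C:\data\log\mongod.log (the real path is not given in this ticket):

```
rem Assumed log location; /N prefixes each match with its line number, /C: searches for the literal string.
findstr /N /C:"approxSize < Extent::maxSize()" "C:\data\log\mongod.log"
```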
| Comment by Tad Marshall [ 03/Jun/12 ] |
|
Can you please post:

1) The version of mongod.exe you are using;

Thanks! |
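For reference, the version can be printed straight from the binary without starting the server; a minimal example (the install path is an assumption):

```
rem Prints the build/version string, e.g. "db version v2.0.x", without starting the server.
"C:\mongodb\bin\mongod.exe" --version
```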