[SERVER-5994] Server created 30GB files after restart in one minute interval Created: 03/Jun/12  Updated: 15/Aug/12  Resolved: 04/Jun/12

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Patrick Reyes Assignee: Tad Marshall
Resolution: Duplicate Votes: 0
Labels: Crash
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Win 2008


Attachments: File logs.txt.2012-06-02T22-05-45     Text File supportinfo.txt    
Issue Links:
Duplicate
duplicates SERVER-5754 Initial sync assertion error with 2.0... Closed
Operating System: Windows
Participants:

 Description   

After restarting our 75gb server mongo started creating new data files until the disk was full and it could not write more data. The size of the db has more than doubled in a couple of seconds. From there on we had to delete the last corrupted data file. But each time we restart mongo it creates new files we have reached 200 gb in a couple of hours. The
Db is running journaling and we don't see any error in the logs but the db is partially corrupted. When we save data from time to time the data is deleted from the collection instead of being saved.



 Comments   
Comment by Tad Marshall [ 04/Jun/12 ]

Glad to hear it! Thanks for letting us know!

Comment by Patrick Reyes [ 04/Jun/12 ]

Just wanted to let you know that migrating to 2.0.5 solved the issue

Comment by Tad Marshall [ 03/Jun/12 ]

I haven't traced through the code to see how the failure plays out, so I don't know if the failed updates were lost. It seems like the existing document should not have been marked as deleted until the new one had been written, but if trying to find it by object ID isn't working then it may be lost.

The log fragment you posted seemed to have complete documents in it ... the update was trying to replace the entire document, and it seemed as if the log had all the information needed to insert the document anew. It might be possible to use the logs to regenerate the missing documents if they are really missing.

It is also possible that the index is being used to find the document and it is the index that is wrong. You could try reindexing the collection and see if the old document is findable after reindexing.

It is definitely a good idea to do a backup before doing a repair. Since the state of things is a bit unknown, it makes sense to have a fallback plan in place: an exact copy of what you have right now so that no data will be lost if the repair doesn't do what you need.

If you have the hardware resources to do it, copying the data directory to another disk and doing the repair there would be a fine plan. You of course don't want updates happening to the original files while you try this, so you should stop the server before making the copy and not start it up again until you know if you have repaired the database.

After shutting down mongod.exe, you can copy the entire directory tree to a new location and then start mongod.exe up again with

mongod --dbpath "locationOfCopy" --repair --logpath "filenameOfLogFile"

to run repair on the copy and create a log of the repair operation. When it is finished, you should be able to run db.stats() on it and compare the document count and data size to verify that all of your data is still there.

Let me know if this is unclear or if you have more questions.

Comment by Patrick Reyes [ 03/Jun/12 ]

Thanks for the quick answer we will try that.

As I mentionned the row that was handled just before the assertion message is gone from the collection (when we query the collection with the key it's gone). Is there a chance that the repair will get them back or are they gone ?

Any chance we can find them in a log somewhere ?

Is there a risk that the db is still corrupted after the repair ? If the repair decides to delete an element of a collection can we identify it ?

IS there a chance that the repair will delete collection elements or will it only repair the structure ? If some elements are deleted is there a way we can identify them after the repair ?

Can we do the repair on a different disk (i.e. we move all the data files to a different disk where ther is ample space and then lounch mongo with --datafile to the new location, repair it and then move the files back to the original disk) ?

Comment by Tad Marshall [ 03/Jun/12 ]

Hi Patrick,

Thanks for all the information.

I can see how this would look totally broken to you. Your db.stats() say that you have around 65 GB of data with 71 GB of disk space allocated to it, but your data files are taking up 289 GB. Moreover, you have 85 extents (the divisions within a data file that hold data) but 135 files. At least 50 of those files are empty ... all zeros.

This is bug https://jira.mongodb.org/browse/SERVER-5754 , which unfortunately infected version 2.0.5-rc0 which you are using. The fix was a one line fix (actually a one character fix) made in commit https://github.com/mongodb/mongo/commit/dc2f8cd3df44ea33f9813aadb27804e084abea11 .

This is fixed in the final version 2.0.5, as well as in version 2.0.6-rc0.

Your best bet is to upgrade as soon as possible to version 2.0.5 and then do a repair when you have a chance. Repair makes your database unavailable while it is running and requires enough free disk space to hold a new copy of your data, so you would need at least 65 to 71 GB free and likely a bit more. When repair finishes, it installs the new files and deletes the old ones, so it will free up your disk space. With 65 GB of data to be copied, it will probably take several hours to complete.

You can read about running repair at
http://www.mongodb.org/display/DOCS/Excessive+Disk+Space
and at
http://www.mongodb.org/display/DOCS/Durability+and+Repair .

Let us know if this fixes it for you ... it should.

Thanks!

Tad

Comment by Patrick Reyes [ 03/Jun/12 ]

MongoDB shell version: 2.0.5-rc0
connecting to: test
> show dbs
DIM     269.822265625GB
local   (empty)
> db.getSiblingDB("DIM").stats()
{
        "db" : "DIM",
        "collections" : 13,
        "objects" : 738961,
        "avgObjSize" : 88679.50869396355,
        "dataSize" : 65530698424,
        "storageSize" : 71503273536,
        "numExtents" : 85,
        "indexes" : 30,
        "indexSize" : 5605130384,
        "fileSize" : 289702674432,
        "nsSizeMB" : 16,
        "ok" : 1
}

Mongo version
-------------

mongod --help for help and startup options
Sun Jun 03 15:34:35 [initandlisten] MongoDB starting : pid=4448 port=27017 dbpath=/data/db 64-bit host=strdpc5
Sun Jun 03 15:34:35 [initandlisten] db version v2.0.5-rc0, pdfile version 4.5
Sun Jun 03 15:34:35 [initandlisten] git version: 1c22afa5d09342d2cd2fd149dbde3ca85eb8e48f
Sun Jun 03 15:34:35 [initandlisten] build info: windows sys.getwindowsversion(major=6, minor=1, build=7601, platform=2, service_pack='Service Pack 1') BOOST_LIB_VERSION=1_42

Directory content of data dir
------------------------------

Volume in drive D has no label.
 Volume Serial Number is E0AA-A80C
 
 Directory of D:\DiM\Data\MongoDB\Data
 
03/06/2012  15:29    <DIR>          .
03/06/2012  15:29    <DIR>          ..
03/06/2012  15:29                 0 data_dir.txt
23/03/2012  10:47        67 108 864 DIM.0
23/03/2012  10:52       134 217 728 DIM.1
23/03/2012  14:21     2 146 435 072 DIM.10
02/06/2012  23:39     2 146 435 072 DIM.100
03/06/2012  00:05     2 146 435 072 DIM.101
02/06/2012  23:39                 0 DIM.101.renamedbypat
03/06/2012  00:05     2 146 435 072 DIM.102
03/06/2012  00:05     2 146 435 072 DIM.103
03/06/2012  00:05     2 146 435 072 DIM.104
03/06/2012  00:05     2 146 435 072 DIM.105
03/06/2012  00:05     2 146 435 072 DIM.106
03/06/2012  00:05     2 146 435 072 DIM.107
03/06/2012  00:05     2 146 435 072 DIM.108
03/06/2012  00:05     2 146 435 072 DIM.109
23/03/2012  15:00     2 146 435 072 DIM.11
03/06/2012  08:36     2 146 435 072 DIM.110
03/06/2012  12:20     2 146 435 072 DIM.111
03/06/2012  12:20     2 146 435 072 DIM.112
03/06/2012  12:29     2 146 435 072 DIM.113
03/06/2012  12:30     2 146 435 072 DIM.114
03/06/2012  12:30     2 146 435 072 DIM.115
03/06/2012  12:30     2 146 435 072 DIM.116
03/06/2012  12:31     2 146 435 072 DIM.117
03/06/2012  13:09     2 146 435 072 DIM.118
03/06/2012  13:09     2 146 435 072 DIM.119
23/03/2012  15:48     2 146 435 072 DIM.12
03/06/2012  13:10     2 146 435 072 DIM.120
03/06/2012  13:18     2 146 435 072 DIM.121
03/06/2012  13:18     2 146 435 072 DIM.122
03/06/2012  13:18     2 146 435 072 DIM.123
03/06/2012  13:19     2 146 435 072 DIM.124
03/06/2012  13:19     2 146 435 072 DIM.125
03/06/2012  13:31     2 146 435 072 DIM.126
03/06/2012  13:31     2 146 435 072 DIM.127
03/06/2012  13:31     2 146 435 072 DIM.128
03/06/2012  13:31     2 146 435 072 DIM.129
23/03/2012  16:36     2 146 435 072 DIM.13
03/06/2012  13:31     2 146 435 072 DIM.130
03/06/2012  13:38     2 146 435 072 DIM.131
03/06/2012  13:38     2 146 435 072 DIM.132
03/06/2012  13:38     2 146 435 072 DIM.133
03/06/2012  13:38     2 146 435 072 DIM.134
23/03/2012  17:26     2 146 435 072 DIM.14
23/03/2012  18:14     2 146 435 072 DIM.15
23/03/2012  19:02     2 146 435 072 DIM.16
23/03/2012  19:18     2 146 435 072 DIM.17
23/03/2012  19:53     2 146 435 072 DIM.18
23/03/2012  20:42     2 146 435 072 DIM.19
23/03/2012  10:55       268 435 456 DIM.2
23/03/2012  21:32     2 146 435 072 DIM.20
23/03/2012  22:20     2 146 435 072 DIM.21
23/03/2012  23:09     2 146 435 072 DIM.22
23/03/2012  23:59     2 146 435 072 DIM.23
24/03/2012  00:51     2 146 435 072 DIM.24
24/03/2012  01:47     2 146 435 072 DIM.25
24/03/2012  02:41     2 146 435 072 DIM.26
24/03/2012  03:38     2 146 435 072 DIM.27
24/03/2012  04:34     2 146 435 072 DIM.28
24/03/2012  05:17     2 146 435 072 DIM.29
23/03/2012  11:01       536 870 912 DIM.3
24/03/2012  05:30     2 146 435 072 DIM.30
24/03/2012  06:27     2 146 435 072 DIM.31
24/03/2012  07:19     2 146 435 072 DIM.32
24/03/2012  08:11     2 146 435 072 DIM.33
26/03/2012  12:03     2 146 435 072 DIM.34
26/03/2012  14:16     2 146 435 072 DIM.35
30/03/2012  20:02     2 146 435 072 DIM.36
02/04/2012  20:06     2 146 435 072 DIM.37
21/04/2012  11:19     2 146 435 072 DIM.38
04/05/2012  17:38     2 146 435 072 DIM.39
23/03/2012  11:27     1 073 741 824 DIM.4
22/05/2012  10:41     2 146 435 072 DIM.40
01/06/2012  18:38     2 146 435 072 DIM.41
01/06/2012  18:38     2 146 435 072 DIM.42
01/06/2012  18:39     2 146 435 072 DIM.43
01/06/2012  18:39     2 146 435 072 DIM.44
01/06/2012  18:40     2 146 435 072 DIM.45
01/06/2012  18:40     2 146 435 072 DIM.46
01/06/2012  18:40     2 146 435 072 DIM.47
01/06/2012  18:40     2 146 435 072 DIM.48
01/06/2012  18:40     2 146 435 072 DIM.49
23/03/2012  11:51     2 146 435 072 DIM.5
01/06/2012  18:40     2 146 435 072 DIM.50
01/06/2012  18:40     2 146 435 072 DIM.51
01/06/2012  18:40     2 146 435 072 DIM.52
01/06/2012  18:40     2 146 435 072 DIM.53
01/06/2012  18:40     2 146 435 072 DIM.54
01/06/2012  18:41     2 146 435 072 DIM.55
01/06/2012  18:41     2 146 435 072 DIM.56
01/06/2012  18:41     2 146 435 072 DIM.57
01/06/2012  18:41     2 146 435 072 DIM.58
01/06/2012  18:41     2 146 435 072 DIM.59
23/03/2012  12:14     2 146 435 072 DIM.6
01/06/2012  18:41     2 146 435 072 DIM.60
01/06/2012  18:41     2 146 435 072 DIM.61
01/06/2012  18:41     2 146 435 072 DIM.62
01/06/2012  18:41     2 146 435 072 DIM.63
01/06/2012  18:41     2 146 435 072 DIM.64
01/06/2012  18:42     2 146 435 072 DIM.65
01/06/2012  18:42     2 146 435 072 DIM.66
01/06/2012  18:42     2 146 435 072 DIM.67
01/06/2012  18:42     2 146 435 072 DIM.68
01/06/2012  18:42     2 146 435 072 DIM.69
23/03/2012  12:48     2 146 435 072 DIM.7
01/06/2012  18:42     2 146 435 072 DIM.70
01/06/2012  18:42     2 146 435 072 DIM.71
01/06/2012  18:46     2 146 435 072 DIM.72
01/06/2012  18:46     2 146 435 072 DIM.73
01/06/2012  18:46     2 146 435 072 DIM.74
01/06/2012  18:46     2 146 435 072 DIM.75
01/06/2012  18:46     2 146 435 072 DIM.76
01/06/2012  18:46     2 146 435 072 DIM.77
01/06/2012  18:49     2 146 435 072 DIM.78
01/06/2012  18:49     2 146 435 072 DIM.79
23/03/2012  13:24     2 146 435 072 DIM.8
01/06/2012  18:49     2 146 435 072 DIM.80
01/06/2012  18:49     2 146 435 072 DIM.81
01/06/2012  18:49     2 146 435 072 DIM.82
01/06/2012  18:50     2 146 435 072 DIM.83
01/06/2012  18:50     2 146 435 072 DIM.84
01/06/2012  18:50     2 146 435 072 DIM.85
01/06/2012  18:50     2 146 435 072 DIM.86
01/06/2012  18:50     2 146 435 072 DIM.87
01/06/2012  18:50     2 146 435 072 DIM.88
01/06/2012  18:50     2 146 435 072 DIM.89
23/03/2012  13:48     2 146 435 072 DIM.9
01/06/2012  18:50     2 146 435 072 DIM.90
01/06/2012  18:51     2 146 435 072 DIM.91
01/06/2012  18:51     2 146 435 072 DIM.92
01/06/2012  18:51     2 146 435 072 DIM.93
01/06/2012  18:51     2 146 435 072 DIM.94
01/06/2012  18:51     2 146 435 072 DIM.95
01/06/2012  20:06     2 146 435 072 DIM.96
02/06/2012  19:08     2 146 435 072 DIM.97
02/06/2012  23:39     2 146 435 072 DIM.98
02/06/2012  19:08                 0 DIM.98.renamedbypat
02/06/2012  23:39     2 146 435 072 DIM.99
23/03/2012  10:47        16 777 216 DIM.ns
03/06/2012  11:39    <DIR>          journal
03/06/2012  11:38                 5 mongod.lock

Comment by Patrick Reyes [ 03/Jun/12 ]

Added Log file

Comment by Patrick Reyes [ 03/Jun/12 ]

It looks like each time the files are created there is an assertion failure>

DIM.PaperCopy Assertion failure approxSize < Extent::maxSize() db\pdfile.cpp 437

Not 100% sure however. I have added the log from 00h05 today Morning when mongo has created 9 new files (DIM.101 to DIM.109) because I don't have access to the log dated 1 june when all started before tomorrow. Since midnight it has created a total of 80 GB FILES and my disks will be full again before this evening.

Comment by Tad Marshall [ 03/Jun/12 ]

Can you please post:

1) The version of mongod.exe you are using;
2) The log for when mongod.exe filled your disk;
3) The output of a "dir" command on the data directory;
4) From the mongo.exe shell, the output of "show dbs";
5) From the mongo.exe shell, the output of 'db.getSiblingDB("<databaseName>").stats()' where <databaseName> is the largest database shown by "show dbs".

Thanks!

Generated at Thu Feb 08 03:10:28 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.