[SERVER-3013] New DB file created every few seconds Created: 28/Apr/11 Updated: 29/May/12 Resolved: 02/Sep/11 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 1.8.1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Martin Chlupac | Assignee: | Dwight Merriman |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
Windows Server 2008, 8GB RAM, DB size: 240GB, number of collections: 2000, number of documents: 1 000 000 000 |
||
| Attachments: |
|
| Operating System: | ALL |
| Participants: |
| Description |
|
We started inserting new data into more or less random collections but the lock ratio was very high. (>100% with just 60 inserts/s) After restart of mongo we started with inserts again and after few minutes the following error occurred in the log and mongo started allocating new db files like crazy. I managed to stop it before it filled the whole disk but it allocated some 50 new files. The process was stopped correctly. log: Thu Apr 28 17:15:05 [conn4] insert rcware.values330 140ms |
| Comments |
| Comment by Martin Chlupac [ 02/Sep/11 ] |
|
We actually stopped testing on windows because it was taking too much time and had to move back to Postgresql. I cannot provide you any more information than what was said till now, sorry. |
| Comment by Eliot Horowitz (Inactive) [ 02/Sep/11 ] |
|
Please let us know if this is still happening |
| Comment by Martin Chlupac [ 11/May/11 ] |
|
I will turn off the journaling and try again... but it takes some time to restore the data. Interesting thing is that when I am importing data - the number of inserts per second is quite ok (limited by network speed) ~ several thousands records per second (but they all go into one collection). When I switch to regular data traffic, which consists mostly of "random" inserts/updates to "random" collections - the speed drops to numbers that you see in the log. |
| Comment by Martin Chlupac [ 11/May/11 ] |
|
See the last attachment - there is a complete mongo log and mongostat log. There are also some screenshots which seem to show that it is not memory allocation problem though... I have not recorded the journal file size but it was after fresh start and there were only few hundreds or thousands of inserts at most so I would expect it rather small. |
| Comment by Testo [ 11/May/11 ] |
|
Okay. Now we know that it can only allocate maximum 27.4 GB for some reason and MongoDB might have gone over that. Did you manage to capture db.serverStatus or mongostat before it starts throwing error? The total size of journal files would be helpful too. |
| Comment by Martin Chlupac [ 11/May/11 ] |
|
You did not specify what exactly should I test so picked two tests that seemed to make sense: C:\Programs\testLimit>testlimit64.exe -r Testlimit v5.04 - test Windows limits Reserving private bytes (MB)... C:\Programs\testLimit>testlimit64.exe -m 100 Testlimit v5.04 - test Windows limits Leaking private bytes 100 MB at a time... I also attached a complete mongo log, mongostat log and screenshots from this test. > db.stats() |
| Comment by Martin Chlupac [ 06/May/11 ] |
|
16GB is preallocated by OS by default and it depends on the size of RAM and number of disks.... It takes few hours to restore everything from backups and replicate the error.. anything else you would like to see when I am doing this? |
| Comment by Martin Chlupac [ 04/May/11 ] |
|
The paging file is managed by OS so it can grow freely. OS reports that it has allocated some 16GB right now but it's also default size so I would not say it is a swap size issue. What I have not said is that the server is master in master-slave pair and the same problem is replicated to the slave but without the error messages. The slave is real hw. |
| Comment by Dwight Merriman [ 04/May/11 ] |
|
I suspect that mongod is using too much page file reservation currently. I will investigate. In the meantime making your paging file bigger may mitigate. How big is it currently configured for? |
| Comment by Testo [ 02/May/11 ] |
|
It will help find out the problem if you can attach your os detail, db.stats() and mongostat when the problem starts happening. File mapping error is a bit hard to diagnose as the behavior is quite vary depending on OS and its configuration Sent via PocketJIRA - www.pocketjira.com |
| Comment by Martin Chlupac [ 30/Apr/11 ] |
|
swap is managed by OS so it does not have fixed size. I will add page files to all drives so there will be hundreds of GB just for swap... There was very high lock ratio before the issue occurred in the log - could that be related to small RAM? |
| Comment by Testo [ 30/Apr/11 ] |
|
If you have only 4gb available on the os disk, it's very likely that you run out of swap space. For each file mapping, windows needs to maintain mapping table in memory. It can use a lot of memory especially when the data size is several hundred GBs. Can you free up disk space and make sure the swap setting is set to dynamic instead of fixed size Sent via PocketJIRA - www.pocketjira.com |
| Comment by Martin Chlupac [ 29/Apr/11 ] |
|
Swap is managed by OS and it should be limited only by the free space on the disk. OS reports that there are currently 8GB allocated for page file and there are still some 4GB lefts on that disk. DB is on a different disk. DB was at about 240GB when the log I posted started. How is swap involved in mongo memory management anyway? |
| Comment by Testo [ 29/Apr/11 ] |
|
Sounds like you run out of swap space. How much swap did you set? How big was the database before it died? |
| Comment by Eliot Horowitz (Inactive) [ 29/Apr/11 ] |
|
Can you send more of the log (ideally the whole thing)? |