Critical - P2
Windows Server 2008 R2, 64-bit
Virtualized on VMware, 4 virtual CPUs (2 real on host), Intel Xeon E5520 2.27GHz
6GB of RAM, 8GB of page file
virtual disk with 20GB of space (lots available at time of crash)
I'm running a test, based on gridfs_test.c test_large().
I modified this test to operate with a 5GB file instead of the standard 3GB file.
I have also modified the source code of GridFS.c to allow random writes into a file (I will commit this code to my fork and open a pull request). The most significant change is that, when a block of an existing file is overwritten, it now does an Update (upsert) of the data.
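To illustrate what a random write has to work out before it can upsert a block, here is a minimal sketch of the offset-to-chunk arithmetic. It is not the code from my fork: `chunk_span`, `first_chunk_span` and `SKETCH_CHUNK_SIZE` are hypothetical names, and the 256 KiB chunk size is an assumption (the real value should be read from the file's chunkSize field).

```c
#include <assert.h>
#include <stdint.h>

/* Assumed chunk size (256 KiB). The real value belongs to the stored file. */
#define SKETCH_CHUNK_SIZE ((uint64_t)262144)

/* Hypothetical helper type: the part of a write that lands in one chunk. */
typedef struct {
    uint64_t chunk_index; /* chunk number, the "n" field of the chunk document */
    uint64_t offset;      /* byte offset inside that chunk */
    uint64_t length;      /* bytes of the write that fit in that chunk */
} chunk_span;

/*
 * Map the start of a write of `len` bytes at absolute file position `pos`
 * to the first chunk it touches. A caller would upsert that chunk, then
 * advance `pos` by `length` and repeat until the whole write is consumed.
 */
static chunk_span first_chunk_span(uint64_t pos, uint64_t len, uint64_t chunk_size)
{
    chunk_span s;
    uint64_t room;

    assert(chunk_size > 0);
    s.chunk_index = pos / chunk_size;
    s.offset = pos % chunk_size;
    room = chunk_size - s.offset;      /* space left in this chunk */
    s.length = len < room ? len : room;
    return s;
}
```

With these assumptions, a write starting 10 bytes into the fourth chunk maps to chunk index 3, and the last byte of a 5GB file falls in chunk 20479. Each span would then be sent as an upsert keyed on the file id and the chunk number, so an existing chunk is replaced and a missing one is created.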
A single run of the test with my configuration creates the file in the DB twice, with a deletion in between. MongoD crashes while the test is creating the file the second time, after the first file has been deleted. This happens right away, with no waiting.
I know this is not a functional problem in the test, because if I bring MongoD back up and run the test again against the same database, it works.
The Commit Size seems to grow without limit until it reaches about 5.4GB, and then it hovers there. I have limited the working set to 1GB for the sake of stressing the server a little more. I also tried removing this constraint and letting Mongo consume as much as it wants, with the same result, so I see no reason to let it eat all of my RAM. (I used this to limit Mongo's RAM: http://captaincodeman.com/2011/02/27/limit-mongodb-memory-use-windows/)
I have attached MongoDConsole.JPG. It shows the warning before the server crashes.
I put a breakpoint at the place where I write to the DB to let it "sit" for a while, and then let it run again.
The warning appeared again. As soon as it did, I put a breakpoint at the insertion point again and let it sit for a while.
In the meantime I ran some of my other, lightweight tests and saw journal files being cleaned up.
I then removed the breakpoint and let the code run again, and this time it finished.
See attachment MongoDConsole_3.jpg, which was taken after another run. The previous run succeeded, so I started the test again. I got the warning more or less right away, and right after that the journal processor got rid of a lot of journal files; things seem to have stabilized at that point.
Something else to mention: I remove the old "File" every time I run the tests, which means all of its chunks have to be removed, and that puts some extra load on the process.
On this run I got the warning again, and I'm letting the system run to see if I can get it to crash.
Got the warning a second time on the final run, but didn't crash (MongoDConsole_4JPG.jpg).
This test, after the interventions, finished fine.
I launched the test again, but this time decided not to intervene and to let the program run even after getting the warning. I got the warning right after the process started, and a little while later got the kiss of death: warning and crash (see MongoDConsole_crash.jpg).
This definitely seems to be a problem with the rate of I/O.
I do think MongoD should be smart enough to "throttle" a client that is pushing data, as the client has no way of knowing what is going on on the MongoD side.
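Until something like that exists on the server, the breakpoint experiment above suggests that pausing the writer gives the journal time to catch up. A minimal sketch of doing that pause automatically on the client follows. This is only a workaround idea, not a fix: `pacing_delay_ms` is a hypothetical helper, and the byte-rate cap is an assumed tuning knob that I have not measured a safe value for.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return how many milliseconds the writer should sleep so that
 * `bytes_sent` bytes over `elapsed_ms` milliseconds do not exceed
 * `max_bytes_per_sec` (an assumed, caller-chosen cap).
 */
static uint64_t pacing_delay_ms(uint64_t bytes_sent,
                                uint64_t elapsed_ms,
                                uint64_t max_bytes_per_sec)
{
    uint64_t due_ms;

    assert(max_bytes_per_sec > 0);
    /* Earliest time, in ms since the start, at which bytes_sent is within budget. */
    due_ms = (bytes_sent * 1000) / max_bytes_per_sec;
    return due_ms > elapsed_ms ? due_ms - elapsed_ms : 0;
}
```

The test loop would call this after each chunk write with the running byte count and the elapsed time, and sleep for the returned number of milliseconds before the next write. It only paces the client blindly; it does not tell the client anything about the actual state of MongoD, which is why I still think the server should do the throttling.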