[SERVER-6292] MongoD server crashes after inserted about 10GB worth of data with "Assertion failure a <= 512*1024*1024 util/alignedbuilder.cpp" Created: 03/Jul/12  Updated: 08/Mar/13  Resolved: 27/Nov/12

Status: Closed
Project: Core Server
Component/s: GridFS, Internal Code, Usability
Affects Version/s: 2.0.6
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Jose Sebastian Battig Assignee: Mathias Stearn
Resolution: Incomplete Votes: 0
Labels: SERVER_V2, Windows, crash, insert, performance, update
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Windows 2008 R2, 64-bit
Virtualized on VMware, 4 virtual CPUs (2 real on the host), Intel Xeon E5520 2.27GHz
6GB of RAM, 8GB of page file
virtual disk with 20GB of space (lots available at time of crash)


Attachments: JPEG File MongoDConsole.JPG     JPEG File MongoDConsole_2.JPG     JPEG File MongoDConsole_3.JPG     JPEG File MongoDConsole_4JPG.JPG     JPEG File MongoDConsole_Crash.JPG     File gridfs_test.c    
Operating System: Windows

 Description   

I'm running a test based on test_large() from gridfs_test.c.
I modified this test to operate on a 5GB file instead of the standard 3GB file.
I have also modified the source code of GridFS.c to allow random writes into a file (I will commit this code on my fork and open a pull request). The most significant change is that it now does an update (upsert) of the chunk data when overwriting a block of a file that already exists, as sketched below.
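
For reference, a minimal sketch of that chunk-overwrite change, assuming the legacy C driver's mongo_update() with the MONGO_UPDATE_UPSERT flag; the write_chunk() helper and the "test" database name are illustrative, not the actual patch:

#include "mongo.h"

/* Overwrite-or-insert a single GridFS chunk keyed by (files_id, n). */
static int write_chunk(mongo *conn, bson_oid_t *files_id,
                       int n, const char *data, int len) {
    bson cond, op;
    int res;

    /* match the existing chunk by (files_id, n) */
    bson_init(&cond);
    bson_append_oid(&cond, "files_id", files_id);
    bson_append_int(&cond, "n", n);
    bson_finish(&cond);

    /* full replacement chunk document */
    bson_init(&op);
    bson_append_oid(&op, "files_id", files_id);
    bson_append_int(&op, "n", n);
    bson_append_binary(&op, "data", BSON_BIN_BINARY, data, len);
    bson_finish(&op);

    /* upsert: replaces the chunk if present, inserts it otherwise */
    res = mongo_update(conn, "test.fs.chunks", &cond, &op,
                       MONGO_UPDATE_UPSERT, NULL);

    bson_destroy(&cond);
    bson_destroy(&op);
    return res;
}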

If I run the test once with my configuration, which creates the file twice in the DB with a deletion in between, it crashes while creating the file the second time, after the first file has been deleted. This happens right away, with no waiting.
I know it's not a functional problem with the test, because if I bring MongoD up again and rerun the test over the same database, it works.
The Commit Size seems to grow unstoppably until it hits about 5.4GB and then hovers there. I have limited working set consumption to 1GB for the sake of stressing the server a little more. I also tried removing this constraint and letting Mongo consume as much memory as it wants, with the same result, so I see no reason to let it eat all of my RAM. (I used this to limit Mongo's RAM: http://captaincodeman.com/2011/02/27/limit-mongodb-memory-use-windows/)

I have attached MongoDConsole.JPG. It shows the warning before the server crashes.
I put a breakpoint at the place where I write to the DB to let the server "sit" for a while, and then let it run again.
The warning appeared again; as soon as it did, I set a breakpoint at the insertion point again and let it sit for a while.
I ran some other lightweight tests I have and saw the journal files getting cleaned up.
I removed the breakpoint and let the code run again, and this time it finished.

See attachment MongoDConsole_3.jpg. That's after another run. The previous run succeeded, so I started the test again. I got the warning more or less right away, and right after that the journal processor got rid of a lot of journal files; things seem to have stabilized there.
Something else to mention is that I'm removing the old "file" every time I run the tests, which means all of its chunks have to be removed, putting additional load on the process.

On this run I got the warning again, and I'm letting the system run to see if I can get it to crash.
I got the warning a second time on the final run, but it didn't crash (MongoDConsole_4JPG.jpg).
This test, after the interventions, finished fine.

I launched the test again, but this time decided not to intervene and let the program run even after getting the warning. The warning appeared right after the process started, and a little while later came the kiss of death: warning and crash (see MongoDConsole_Crash.jpg).

This definitely seems to be a problem with the rate of I/O.

I do think MongoD should be smart enough to "throttle" the client that is pushing data, as the client has no way of knowing what is going on at the MongoD side.



 Comments   
Comment by Mathias Stearn [ 05/Jul/12 ]

FYI, GridFS intentionally doesn't support modifying files in place, so if you plan on doing that I'd advise against using the standard fs.files/fs.chunks collections, as many tools will not be expecting it. That said, this should definitely not cause the error you are seeing.
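
For example, a minimal sketch of initializing GridFS with a non-standard prefix via the legacy C driver's gridfs_init(); the "test" database and "myfs" prefix are illustrative:

gridfs gfs[1];

/* files and chunks land in test.myfs.files / test.myfs.chunks
 * instead of the standard test.fs.* collections */
gridfs_init(conn, "test", "myfs", gfs);

/* ... read and write files through gfs as usual ... */

gridfs_destroy(gfs);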

In order to narrow down the cause here, do you know which of the four test_* functions is causing the crash?

Comment by Jose Sebastian Battig [ 03/Jul/12 ]

I did a test now, launching mongod with the option --journalCommitInterval 5 and then using a write concern with j = 1, and the performance is more acceptable.
It looks like when mongod is launched with default settings, every I/O operation issued by the driver has to wait until mongo passes the data cached in memory through to the journal, making any I/O operation take at least 100ms.
With this setting mongo at least tries to flush the data more often, making every I/O with the default write concern more acceptable. An example invocation is below.
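
For reference, the invocation described would look something like this (the dbpath is an assumption; --journal is the 64-bit default in 2.0 and is shown only for explicitness):

mongod --dbpath C:\data\db --journal --journalCommitInterval 5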

Comment by Jose Sebastian Battig [ 03/Jul/12 ]

I added the following code as soon as I connect to the server:

mongo_write_concern wc;

mongo_write_concern_init(&wc);
wc.j = 1;                            /* wait for the journal commit on each write */
mongo_write_concern_finish(&wc);
mongo_set_write_concern(conn, &wc);  /* becomes the default for this connection */

And then the problem doesn't appear anymore. The problem is that general operation of the server seems really slow...
It would be nice if the Mongo server did this throttling automatically, as suggested in the issue report, without the need to use write concerns.
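
Until then, a client can throttle itself: send most chunk inserts with the default (unacknowledged) write concern, but issue a journaled write every so often as a barrier so the client cannot outrun the journal. A minimal sketch, assuming the legacy C driver's per-operation write concern parameter on mongo_insert(); SYNC_EVERY, insert_chunks(), and the "test" namespace are illustrative:

#include "mongo.h"

#define SYNC_EVERY 64  /* journaled barrier every 64 chunks; tune to taste */

static void insert_chunks(mongo *conn, bson_oid_t *files_id,
                          const char *data, int chunk_size, int nchunks) {
    mongo_write_concern jwc;
    bson chunk;
    int n;

    mongo_write_concern_init(&jwc);
    jwc.j = 1;  /* barrier writes wait for the journal commit */
    mongo_write_concern_finish(&jwc);

    for (n = 0; n < nchunks; n++) {
        bson_init(&chunk);
        bson_append_oid(&chunk, "files_id", files_id);
        bson_append_int(&chunk, "n", n);
        bson_append_binary(&chunk, "data", BSON_BIN_BINARY,
                           data + (size_t)n * chunk_size, chunk_size);
        bson_finish(&chunk);

        /* journaled write on every SYNC_EVERY-th chunk, fire-and-forget otherwise */
        mongo_insert(conn, "test.fs.chunks", &chunk,
                     (n % SYNC_EVERY == SYNC_EVERY - 1) ? &jwc : NULL);
        bson_destroy(&chunk);
    }

    mongo_write_concern_destroy(&jwc);
}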
