[SERVER-6924] In RunTime model, when disk full or disk damaged MongoD crashes Created: 04/Sep/12  Updated: 15/Jan/15  Resolved: 04/Sep/12

Status: Closed
Project: Core Server
Component/s: Storage
Affects Version/s: 2.0.6
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Xuguang zhan Assignee: Eric Milkie
Resolution: Done Votes: 0
Labels: crash
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Mac OS X Volume (Lion - 10.7.4) on a Mac OS Extended (Journaled).


Attachments: PNG File mongoD_log.PNG    
Issue Links:
Depends
Duplicate
is duplicated by SERVER-9350 Fatal assertion 13515 on disk full Closed
Related
Operating System: ALL
Participants:

 Description   

Test 1: Restart when data cannot be written to disk

Step 1. Write a hook.so that hooks the write call and export it to the system.
Step 2. Restart MongoDB.
Result: MongoDB terminates. I checked the source code path taken when MongoDB fails to write the PID file; the failing code fragment and the mongod log are in the attachment.

So the questions are: what exception code is returned to the calling application when inserting data to disk fails, and, at runtime, what happens to mongod when an insert fails?

Test 2

After starting mongod and inserting a document to allocate the space, I filled the volume with "cat /dev/zero > /Volume/test/bigfile".
I inserted another document and then it crashed.

Here is the stack trace:

Mon Sep 3 17:34:42 [journal] LogFile::synchronousAppend failed with 8192 bytes unwritten out of 8192 bytes; b=0x111000000 errno:28 No space left on device
Mon Sep 3 17:34:42 [journal] Fatal Assertion 13515
0x10037637b 0x1000aeeb5 0x1002319c6 0x100107471 0x1001076e2 0x10023eddb 0x100240090 0x1002426ed 0x1005a7823 0x7fff8e3568bf 0x7fff8e359b75
0 mongod 0x000000010037637b _ZN5mongo15printStackTraceERSo + 43
1 mongod 0x00000001000aeeb5 _ZN5mongo13fassertFailedEi + 165
2 mongod 0x00000001002319c6 _ZN5mongo7LogFile17synchronousAppendEPKvm + 326
3 mongod 0x0000000100107471 _ZN5mongo3dur7Journal7journalERKNS0_11JSectHeaderERKNS_14AlignedBuilderE + 769
4 mongod 0x00000001001076e2 _ZN5mongo3dur14WRITETOJOURNALENS0_11JSectHeaderERNS_14AlignedBuilderE + 50
5 mongod 0x000000010023eddb _ZN5mongo3dur27groupCommitWithLimitedLocksEv + 411
6 mongod 0x0000000100240090 _ZN5mongo3durL20durThreadGroupCommitEv + 208
7 mongod 0x00000001002426ed _ZN5mongo3dur9durThreadEv + 1117
8 mongod 0x00000001005a7823 thread_proxy + 163
9 libsystem_c.dylib 0x00007fff8e3568bf _pthread_start + 335
10 libsystem_c.dylib 0x00007fff8e359b75 thread_start + 13
Mon Sep 3 17:34:42 [journal]

***aborting after fassert() failure

Mon Sep 3 17:34:42 Got signal: 6 (Abort trap: 6).

Mon Sep 3 17:34:42 Backtrace:
0x10037637b 0x100001a6b 0x7fff8e3aacfa 0x17fffffff 0x7fff8e349a7a 0x1000aeef0 0x1002319c6 0x100107471 0x1001076e2 0x10023eddb 0x100240090 0x1002426ed 0x1005a7823 0x7fff8e3568bf 0x7fff8e359b75
0 mongod 0x000000010037637b _ZN5mongo15printStackTraceERSo + 43
1 mongod 0x0000000100001a6b _ZN5mongo10abruptQuitEi + 987
2 libsystem_c.dylib 0x00007fff8e3aacfa _sigtramp + 26
3 ??? 0x000000017fffffff 0x0 + 6442450943
4 libsystem_c.dylib 0x00007fff8e349a7a abort + 143
5 mongod 0x00000001000aeef0 _ZN5mongo13fassertFailedEi + 224
6 mongod 0x00000001002319c6 _ZN5mongo7LogFile17synchronousAppendEPKvm + 326
7 mongod 0x0000000100107471 _ZN5mongo3dur7Journal7journalERKNS0_11JSectHeaderERKNS_14AlignedBuilderE + 769
8 mongod 0x00000001001076e2 _ZN5mongo3dur14WRITETOJOURNALENS0_11JSectHeaderERNS_14AlignedBuilderE + 50
9 mongod 0x000000010023eddb _ZN5mongo3dur27groupCommitWithLimitedLocksEv + 411
10 mongod 0x0000000100240090 _ZN5mongo3durL20durThreadGroupCommitEv + 208
11 mongod 0x00000001002426ed _ZN5mongo3dur9durThreadEv + 1117
12 mongod 0x00000001005a7823 thread_proxy + 163
13 libsystem_c.dylib 0x00007fff8e3568bf _pthread_start + 335
14 libsystem_c.dylib 0x00007fff8e359b75 thread_start + 13
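
The first line of the trace carries the useful details: how many bytes could not be appended to the journal, and the OS error (errno 28, "No space left on device"). As a rough sketch, a log watcher could pull those fields out as shown below. The function name `parse_append_failure` and the regular expression are assumptions modeled only on the single log line quoted in this ticket; other mongod versions may word the message differently.

```python
import errno
import re

# Pattern modeled on the one log line quoted in this ticket (an assumption,
# not a documented mongod log format).
_APPEND_FAILURE = re.compile(
    r"LogFile::synchronousAppend failed with (?P<unwritten>\d+) bytes unwritten "
    r"out of (?P<total>\d+) bytes;.*?errno:(?P<errno>\d+)"
)


def parse_append_failure(line):
    """Return details of a journal append failure, or None if the line does not match."""
    match = _APPEND_FAILURE.search(line)
    if match is None:
        return None
    code = int(match.group("errno"))
    return {
        "unwritten": int(match.group("unwritten")),
        "total": int(match.group("total")),
        "errno": code,
        # Symbolic name for the numeric code, e.g. ENOSPC for a full disk.
        "errno_name": errno.errorcode.get(code, "UNKNOWN"),
        "disk_full": code == errno.ENOSPC,
    }
```

Lines that do not match, such as the "Fatal Assertion 13515" line that follows, yield `None`.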

db.getLastError(WriteConcern.FSYNC_SAFE)



 Comments   
Comment by Xuguang zhan [ 05/Sep/12 ]

Thanks, team. I got the information and see that you have updated the wiki: http://www.mongodb.org/display/DOCS/Excessive+Disk+Space#ExcessiveDiskSpace-Runningoutofdiskspace. Still, I think a crash should not be allowed under any circumstances; maybe you should reconsider the design and improve it.

Comment by Gianfranco Palumbo [ 04/Sep/12 ]

Perfect, got it now. Thanks, Eric.

Comment by Eric Milkie [ 04/Sep/12 ]

If you separate the data from the journal (see http://www.mongodb.org/display/DOCS/Journaling#Journaling-ThejournalSubdirectory ), then if you fill up the data disk, the server should stay up. You cannot fill up the journal disk, however, or the server will abort.
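
To check whether a deployment already follows this advice, one option is to compare the device IDs of the data directory and the journal directory. The sketch below is illustrative only: `journal_shares_volume` is a hypothetical helper, and the default journal location (a `journal` subdirectory of the dbpath) is an assumption that should be adjusted to the actual layout. `os.stat` follows symlinks, so a journal directory symlinked to another disk reports that disk's device ID.

```python
import os


def journal_shares_volume(dbpath, journal_dir=None):
    """Return True if the journal directory is on the same device as dbpath.

    journal_dir defaults to <dbpath>/journal, which is an assumption about
    the layout; pass the real path if it differs.
    """
    if journal_dir is None:
        journal_dir = os.path.join(dbpath, "journal")
    # st_dev identifies the device holding each path; equal values mean
    # both directories compete for the same free space.
    return os.stat(dbpath).st_dev == os.stat(journal_dir).st_dev
```

A result of `True` means that filling the data volume also fills the journal volume, which is the case in which the server aborts.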

Comment by Eric Milkie [ 04/Sep/12 ]

Yes, it's because you are using journaling (turned on by default). The journal gets written before the data files. I changed the documentation on the Wiki to be more helpful.

Comment by Gianfranco Palumbo [ 04/Sep/12 ]

Actually, I ran Test 2 on my machine yesterday.

I started mongod and inserted a document to allocate the space, then filled the volume with "cat /dev/zero > /Volume/test/bigfile".
After that I inserted another document, which made mongod crash.

Comment by Eric Milkie [ 04/Sep/12 ]

That is correct; the server will keep running if your data disk fills up. The examples you provided cover two other cases: the disk was already full when you started the server (the lock file can't be written), and your journaling disk filled up. The server cannot stay up if you fill up the disk where your journal lives. I will add something to the Wiki explaining this.
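
Because a full journal disk stops the server, it can help to watch free space on that volume from outside mongod. The following is a minimal sketch, not part of MongoDB: `journal_headroom_ok` and its threshold parameter are hypothetical, and the right threshold depends on the deployment.

```python
import shutil


def journal_headroom_ok(journal_dir, min_free_bytes):
    """Return True when the volume holding journal_dir has at least min_free_bytes free."""
    # disk_usage reports total, used, and free bytes for the volume
    # that contains the given path.
    return shutil.disk_usage(journal_dir).free >= min_free_bytes
```

A monitoring job could call this periodically, for example with a threshold of a few gigabytes, and raise an alert when it returns `False`.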

Comment by Eric Milkie [ 04/Sep/12 ]

The behavior you describe is by design. If the disk is too full to write the server lock file or the journal, the server must exit.
Depending on the stage of the write at which the disk runs out of space, a client may see a socket disconnect or it may see some other exception returned. This is the same behavior as any other issue that causes the server process to abort.

Generated at Thu Feb 08 03:13:07 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.