[SERVER-9350] Fatal assertion 13515 on disk full Created: 15/Apr/13  Updated: 15/Jan/15  Resolved: 24/Apr/13

Status: Closed
Project: Core Server
Component/s: Stability
Affects Version/s: 2.4.1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: David Gubler Assignee: Stennie Steneker (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Configuration:
Replica set with 5 members, crashed node was a secondary with delayed replication (1h lag). 3 members (incl. primary) run 2.2.3, this node and another secondary run 2.4.1.

Debian Linux, package from official source:

  # dpkg -l | grep mongodb
    ii mongodb-10gen 2.4.1 amd64 An object/document-oriented database
  # uname -a
    Linux tardis 3.2.0-3-amd64 #1 SMP Mon Jul 23 02:45:17 UTC 2012 x86_64 GNU/Linux

~# df
Filesystem     1K-blocks      Used Available Use% Mounted on
rootfs         314198800 298470360         0 100% /
udev               10240         0     10240   0% /dev
tmpfs            6585076        72   6585004   1% /run
/dev/xvda1     314198800 298470360         0 100% /
tmpfs               5120         0      5120   0% /run/lock
tmpfs           13170140         0  13170140   0% /run/shm


Issue Links:
Duplicate
duplicates SERVER-6924 In RunTime model, when disk full or d... Closed
Operating System: Linux
Steps To Reproduce:

I have not experienced the disk full situation on any other MongoDB servers, thus I cannot tell if this problem only happens in this very specific situation or in general. Since this is a production deployment I don't want to do this sort of experiment.

Participants:

 Description   

MongoDB crashed when the disk filled up. There were no suspicious log entries before the ones below, and dmesg also looks fine. I found several other bug reports relating to full disks, but as far as I can tell they don't apply to the current version.

Sun Apr 14 16:25:33.111 [journal] LogFile::synchronousAppend failed with 24576 bytes unwritten out of 24576 bytes; b=0x7fe7703ee000 errno:28 No space left on device
Sun Apr 14 16:25:33.112 [journal] Fatal Assertion 13515
0xdc7f71 0xd87cf3 0xda410f 0x91f6f5 0x91f922 0x9162eb 0x9166ec 0x916b1b 0xe10879 0x7fecc2687b50 0x7fecc1a2aa7d
/usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xdc7f71]
/usr/bin/mongod(_ZN5mongo13fassertFailedEi+0xa3) [0xd87cf3]
/usr/bin/mongod(_ZN5mongo7LogFile17synchronousAppendEPKvm+0x14f) [0xda410f]
/usr/bin/mongod(_ZN5mongo3dur7Journal7journalERKNS0_11JSectHeaderERKNS_14AlignedBuilderE+0x1e5) [0x91f6f5]
/usr/bin/mongod(_ZN5mongo3dur14WRITETOJOURNALENS0_11JSectHeaderERNS_14AlignedBuilderE+0x32) [0x91f922]
/usr/bin/mongod() [0x9162eb]
/usr/bin/mongod() [0x9166ec]
/usr/bin/mongod(_ZN5mongo3dur9durThreadEv+0x2eb) [0x916b1b]
/usr/bin/mongod() [0xe10879]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x6b50) [0x7fecc2687b50]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7fecc1a2aa7d]
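
For anyone triaging similar crashes, the following is a minimal sketch of how the journal append failure above could be picked out of a mongod log and checked against ENOSPC (errno 28, "No space left on device"). The function name `find_journal_append_failures` and the regular expression are assumptions modeled only on the single log line quoted in this report; other server versions may word the message differently.

```python
import errno
import re

# Pattern modeled on the one log line quoted in this report (an assumption,
# not a documented log format).
APPEND_FAILURE = re.compile(
    r"LogFile::synchronousAppend failed with (?P<unwritten>\d+) bytes unwritten "
    r"out of (?P<total>\d+) bytes;.*errno:(?P<errno>\d+)"
)


def find_journal_append_failures(lines):
    """Return one dict per journal append failure found in the given log lines."""
    failures = []
    for line in lines:
        match = APPEND_FAILURE.search(line)
        if match is None:
            continue
        code = int(match.group("errno"))
        failures.append({
            "unwritten": int(match.group("unwritten")),
            "total": int(match.group("total")),
            "errno": code,
            # True when the reported errno is "No space left on device".
            "disk_full": code == errno.ENOSPC,
        })
    return failures
```

Feeding it the two log lines from the description would yield a single entry with `disk_full` set, which separates this expected shutdown from other causes of a failed journal write.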



 Comments   
Comment by Stennie Steneker (Inactive) [ 24/Apr/13 ]

Hi David,

No bother at all .. we appreciate any feedback.

Output with a stacktrace is indeed somewhat scary, but this gives us more information on the specific code path that triggered an exception.

There is certainly room for improvement and making some of the messaging friendlier. In some common exceptions (such as a secondary too stale to sync or various startup warnings) we have added links to the online documentation. I think this is another case that would benefit from further explanation since the shutdown is expected behaviour. I will check if there is already a feature request for same, and raise one if not.

Regards,
Stephen

Comment by David Gubler [ 24/Apr/13 ]

Thanks, Stephen, and sorry for the bother. Unfortunately the output does not look like it is on purpose... I didn't even dare to restart it without removing all the data (and re-syncing to the replica set) because I was afraid of data corruption.

Comment by Stennie Steneker (Inactive) [ 24/Apr/13 ]

Hi David,

This behaviour is by design: if your server runs out of disk space for journal files, the server process will exit.

For more details, please see the linked duplicate issue SERVER-6924 and the FAQ entry "How do I know when the server runs out of disk space?"

Thanks,
Stephen
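
Since the comment above says the server exits by design once the journal can no longer be written, one way to avoid being surprised by it is to watch free space on the volume that holds the data directory. The sketch below is only an illustration: `free_space_report`, its parameters, and the threshold are hypothetical, and the path to check depends on the deployment's own dbpath setting.

```python
import shutil


def free_space_report(path, min_free_bytes):
    """Report free space on the filesystem containing `path`.

    `min_free_bytes` is a hypothetical alert threshold chosen by the operator.
    """
    usage = shutil.disk_usage(path)
    return {
        "total_bytes": usage.total,
        "free_bytes": usage.free,
        "below_threshold": usage.free < min_free_bytes,
    }
```

A cron job or monitoring agent could call this for the data directory and raise an alert when `below_threshold` is true, well before the journal write fails.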

Comment by David Gubler [ 15/Apr/13 ]

I forgot to mention: this machine is a Xen VM. The disk image is an LVM volume on an SSD (hence the /dev/xvda device name), and the file system is ext4 mounted with the discard option.

/dev/xvda1 / ext4 rw,noatime,errors=remount-ro,user_xattr,acl,barrier=1,data=ordered,discard 0 0
