[SERVER-9439] Fatal Assertion 13515 - errno:5 Input/output error Created: 23/Apr/13  Updated: 10/Dec/14  Resolved: 24/Apr/13

Status: Closed
Project: Core Server
Component/s: Stability
Affects Version/s: 2.4.1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: György Nagy Assignee: Andy Schwerin
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Linux ip-10-2-29-40 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_49


Operating System: Linux
Participants:

 Description   

One of our production servers crashed today. The stack trace from the log file is included below.
It's not a heavily used server (around 50-100 queries/sec), with 84% of the disk free and 1 GB of RAM.

Tue Apr 23 10:30:48.125 [journal] LogFile::synchronousAppend failed with 8192 bytes unwritten out of 8192 bytes;  b=0x7f652d3d8000 errno:5 Input/output error
Tue Apr 23 10:30:48.125 [journal]   Fatal Assertion 13515
0xdc7f71 0xd87cf3 0xda410f 0x91f6f5 0x91f922 0x914b11 0x916759 0x916b1b 0xe10879 0x7f65b86d1e9a 0x7f65b79e4cbd
 /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xdc7f71]
 /usr/bin/mongod(_ZN5mongo13fassertFailedEi+0xa3) [0xd87cf3]
 /usr/bin/mongod(_ZN5mongo7LogFile17synchronousAppendEPKvm+0x14f) [0xda410f]
 /usr/bin/mongod(_ZN5mongo3dur7Journal7journalERKNS0_11JSectHeaderERKNS_14AlignedBuilderE+0x1e5) [0x91f6f5]
 /usr/bin/mongod(_ZN5mongo3dur14WRITETOJOURNALENS0_11JSectHeaderERNS_14AlignedBuilderE+0x32) [0x91f922]
 /usr/bin/mongod(_ZN5mongo3dur27groupCommitWithLimitedLocksEv+0x141) [0x914b11]
 /usr/bin/mongod() [0x916759]
 /usr/bin/mongod(_ZN5mongo3dur9durThreadEv+0x2eb) [0x916b1b]
 /usr/bin/mongod() [0xe10879]
 /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7f65b86d1e9a]
 /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f65b79e4cbd]
Tue Apr 23 10:30:48.139 [journal]
 
***aborting after fassert() failure
 
 
Tue Apr 23 10:30:48.144 Got signal: 6 (Aborted).
 
Tue Apr 23 10:30:48.245 Backtrace:
0xdc7f71 0x6ce459 0x7f65b79274a0 0x7f65b7927425 0x7f65b792ab8b 0xd87d2e 0xda410f 0x91f6f5 0x91f922 0x914b11 0x916759 0x916b1b 0xe10879 0x7f65b86d1e9a 0x7f65b79e4cbd
 /usr/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xdc7f71]
 /usr/bin/mongod(_ZN5mongo10abruptQuitEi+0x399) [0x6ce459]
 /lib/x86_64-linux-gnu/libc.so.6(+0x364a0) [0x7f65b79274a0]
 /lib/x86_64-linux-gnu/libc.so.6(gsignal+0x35) [0x7f65b7927425]
 /lib/x86_64-linux-gnu/libc.so.6(abort+0x17b) [0x7f65b792ab8b]
 /usr/bin/mongod(_ZN5mongo13fassertFailedEi+0xde) [0xd87d2e]
 /usr/bin/mongod(_ZN5mongo7LogFile17synchronousAppendEPKvm+0x14f) [0xda410f]
 /usr/bin/mongod(_ZN5mongo3dur7Journal7journalERKNS0_11JSectHeaderERKNS_14AlignedBuilderE+0x1e5) [0x91f6f5]
 /usr/bin/mongod(_ZN5mongo3dur14WRITETOJOURNALENS0_11JSectHeaderERNS_14AlignedBuilderE+0x32) [0x91f922]
 /usr/bin/mongod(_ZN5mongo3dur27groupCommitWithLimitedLocksEv+0x141) [0x914b11]
 /usr/bin/mongod() [0x916759]
 /usr/bin/mongod(_ZN5mongo3dur9durThreadEv+0x2eb) [0x916b1b]
 /usr/bin/mongod() [0xe10879]
 /lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a) [0x7f65b86d1e9a]
 /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d) [0x7f65b79e4cbd]



 Comments   
Comment by Andy Schwerin [ 24/Apr/13 ]

The kernel logs point to the underlying disk subsystem as the culprit.

Comment by György Nagy [ 24/Apr/13 ]

It's a server in the cloud at elastichosts.com. I have no /var/log/messages, but I found some relevant messages in kern.log:

Apr 23 10:30:48 db1 kernel: [5541065.323006] sd 0:0:0:0: [sda] Unhandled error code
Apr 23 10:30:48 db1 kernel: [5541065.323006] sd 0:0:0:0: [sda]  Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Apr 23 10:30:48 db1 kernel: [5541065.323006] sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 00 48 f2 50 00 00 10 00
Apr 23 10:30:48 db1 kernel: [5541065.323006] end_request: I/O error, dev sda, sector 4780624
Apr 23 10:30:48 db1 kernel: [5541065.541756] init: mongodb main process (15018) terminated with status 14

It's a little frightening that there is a log entry for Apr 19 too:

Apr 19 11:18:36 db1 kernel: [5198333.336445] sd 0:0:0:0: [sda] Unhandled error code
Apr 19 11:18:36 db1 kernel: [5198333.336453] sd 0:0:0:0: [sda]  Result: hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Apr 19 11:18:36 db1 kernel: [5198333.336459] sd 0:0:0:0: [sda] CDB: Write(10): 2a 00 00 7c ad d8 00 00 10 00
Apr 19 11:18:36 db1 kernel: [5198333.336471] end_request: I/O error, dev sda, sector 8170968
Apr 19 11:18:36 db1 kernel: [5198333.420202] Buffer I/O error on device sda, logical block 1021371
Apr 19 11:18:36 db1 kernel: [5198333.420462] Buffer I/O error on device sda, logical block 1021372
Apr 19 11:18:36 db1 kernel: [5198333.420694] EXT4-fs warning (device sda): ext4_end_bio:251: I/O error writing to inode 220830 (offset 0 size 8192 starting block 1021373)

I promise, next time I will check the logs first.
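
The kernel messages quoted above can be cross-checked against each other. The sketch below decodes the two Write(10) command blocks from the "CDB:" lines using the standard WRITE(10) layout (opcode 0x2A, a 4-byte big-endian logical block address in bytes 2-5, a 2-byte transfer length in bytes 7-8). The function name decode_write10 and the 512-byte block size are assumptions made for illustration; the ticket does not state the device's logical block size.

```python
def decode_write10(cdb_hex: str) -> dict:
    """Decode a SCSI WRITE(10) command descriptor block given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a 10-byte WRITE(10) CDB")
    return {
        "lba": int.from_bytes(cdb[2:6], "big"),     # bytes 2-5: logical block address
        "blocks": int.from_bytes(cdb[7:9], "big"),  # bytes 7-8: transfer length in blocks
    }


# Assumption: 512-byte logical blocks (not stated in the ticket).
BLOCK_SIZE = 512

for cdb_hex in ("2a 00 00 48 f2 50 00 00 10 00",   # Apr 23 entry
                "2a 00 00 7c ad d8 00 00 10 00"):  # Apr 19 entry
    info = decode_write10(cdb_hex)
    print(info["lba"], info["blocks"] * BLOCK_SIZE)
```

Under that assumption, the decoded addresses are 4780624 and 8170968, the same sector numbers reported on the "end_request: I/O error" lines, and each command covers 16 blocks, or 8192 bytes, which is the size of the journal append that mongod reported as unwritten.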

Comment by Andy Schwerin [ 23/Apr/13 ]

This indicates a possible problem with the underlying file system or disk, which prevented MongoDB from committing a write to the journal. Errno 5 is EIO, which normally indicates bad blocks or a transient or permanent file system error. It is not typically used to indicate a lack of free space, but I wouldn't rule it out, either. If this is an EBS or other network-backed instance, it could also indicate network problems.
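
As a quick way to confirm the errno mapping mentioned above, the following minimal sketch looks up the symbolic name and message for a numeric errno using Python's standard library. The helper name describe_errno is hypothetical, and the numeric value of EIO is platform-dependent (it is 5 on Linux, which is what this server runs).

```python
import errno
import os


def describe_errno(code: int) -> tuple:
    """Return (symbolic name, human-readable message) for an errno value."""
    name = errno.errorcode.get(code, "unknown")
    return name, os.strerror(code)


# 5 is the errno value printed in the mongod log line for the failed append.
name, message = describe_errno(5)
print(name, "-", message)
```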

Do the OS system logs (dmesg, /var/log/messages) indicate disk or I/O issues?
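
One way to answer this question quickly is to scan the kernel log for block-layer I/O errors. The sketch below is an illustration only: the function name find_io_errors is hypothetical, and the regular expression matches only the "I/O error, dev ..., sector ..." wording seen in this ticket's kern.log excerpts. Other kernel versions may phrase these messages differently.

```python
import re

# Matches lines such as: "end_request: I/O error, dev sda, sector 4780624"
IO_ERROR = re.compile(r"I/O error, dev (?P<dev>\w+), sector (?P<sector>\d+)")


def find_io_errors(lines):
    """Return (device, sector) pairs for every block I/O error line found."""
    hits = []
    for line in lines:
        match = IO_ERROR.search(line)
        if match:
            hits.append((match.group("dev"), int(match.group("sector"))))
    return hits


# Sample lines copied from the kern.log excerpt in this ticket.
sample = [
    "Apr 23 10:30:48 db1 kernel: [5541065.323006] sd 0:0:0:0: [sda] Unhandled error code",
    "Apr 23 10:30:48 db1 kernel: [5541065.323006] end_request: I/O error, dev sda, sector 4780624",
]
print(find_io_errors(sample))
```

In practice the input would be the lines of the kernel log file or the output of dmesg; where the log lives depends on the distribution.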

Generated at Thu Feb 08 03:20:25 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.