The requested improvement is for mongod either not to shut down on this error or to report a clearer error message.
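As a rough illustration of the second option, the Python sketch below shows a flush wrapper that turns an `msync` failure into a message naming the file, the errno symbol and a suggested next diagnostic step. It then hands the error to the caller rather than aborting. This is not mongod's code: mongod is C++, and its flush path appears as `PosixFlushable::flush` in the backtrace further down. `DataSyncError`, `describe_sync_failure` and `flush_mapping` are hypothetical names, and the message wording is an assumption. The sketch relies on Python's `mmap.flush()`, which calls `msync` on POSIX systems and raises `OSError` on failure.

```python
import errno
import mmap
import os
import tempfile


class DataSyncError(RuntimeError):
    """Hypothetical error type carrying an actionable msync failure message."""


def describe_sync_failure(path: str, err: OSError) -> str:
    """Build a message naming the file, the errno symbol, and a next step.

    The wording and the suggested diagnostic step are assumptions.
    """
    if err.errno is None:
        return f"flush failed for {path}: {err}"
    name = errno.errorcode.get(err.errno, "unknown errno")
    return (
        f"msync failed for {path}: errno {err.errno} ({name}, "
        f"{os.strerror(err.errno)}). Dirty pages could not be written back "
        "to storage; check the kernel log (dmesg) for block-device or "
        "filesystem errors reported at the same time."
    )


def flush_mapping(mm, path: str) -> None:
    """Flush a memory-mapped file; mmap.flush() calls msync on POSIX."""
    try:
        mm.flush()
    except OSError as err:
        # Hand the decision (retry, clean shutdown, ...) to the caller
        # instead of aborting the process here.
        raise DataSyncError(describe_sync_failure(path, err)) from err


if __name__ == "__main__":
    # Happy path: map one page of a temporary file, modify it, and flush it.
    with tempfile.TemporaryFile() as f:
        f.write(b"\0" * mmap.PAGESIZE)
        f.flush()
        with mmap.mmap(f.fileno(), mmap.PAGESIZE) as mm:
            mm[:5] = b"hello"
            flush_mapping(mm, "temporary file")
            print("flushed OK")
```

Raising instead of aborting leaves the choice to the caller, which could retry, shut down cleanly or stop accepting writes. Whether it is actually safe to keep running after a failed `msync` is a separate question, since the affected pages may never have reached disk.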
Original description
The mongod service sometimes crashes with the following log:
2015-10-01T20:05:46.839+0200 I CONTROL [DataFileSync] msync errno:5 Input/output error
2015-10-01T20:05:46.839+0200 I CONTROL [DataFileSync] error syncing data to disk, probably a disk error
2015-10-01T20:05:46.839+0200 I CONTROL [DataFileSync] shutting down immediately to avoid corruption
2015-10-01T20:05:46.839+0200 I - [DataFileSync] Fatal Assertion 17346
2015-10-01T20:05:46.845+0200 I CONTROL [DataFileSync]
0xf5bfc9 0xefaea1 0xedfa91 0xefd1c9 0xf03473 0xefe472 0xefe5c7 0xd15ae5 0xee26d0 0xfa9c94 0x7f2cd5b82182 0x7f2cd464947d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"B5BFC9"},{"b":"400000","o":"AFAEA1"},{"b":"400000","o":"ADFA91"},{"b":"400000","o":"AFD1C9"},{"b":"400000","o":"B03473"},{"b":"400000","o":"AFE472"},{"b":"400000","o":"AFE5C7"},{"b":"400000","o":"915AE5"},{"b":"400000","o":"AE26D0"},{"b":"400000","o":"BA9C94"},{"b":"7F2CD5B7A000","o":"8182"},{"b":"7F2CD454F000","o":"FA47D"}],"processInfo":{ "mongodbVersion" : "3.0.6", "gitVersion" : "1ef45a23a4c5e3480ac919b28afcba3c615488f2", "uname" : { "sysname" : "Linux", "release" : "3.16.0-30-generic", "version" : "#40~14.04.1-Ubuntu SMP Thu Jan 15 17:43:14 UTC 2015", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "BF5AC37B50D416FD8D6D427E561426ED60291032" }, { "b" : "7FFFE78DB000", "elfType" : 3, "buildId" : "C8BA9F3BA421CFBAE75F7E57F357B1B5431DE838" }, { "b" : "7F2CD5B7A000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "9318E8AF0BFBE444731BB0461202EF57F7C39542" }, { "b" : "7F2CD591B000", "path" : "/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "A20EFFEC993A8441FA17F2079F923CBD04079E19" }, { "b" : "7F2CD5540000", "path" : "/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "F000D29917E9B6E94A35A8F02E5C62846E5916BC" }, { "b" : "7F2CD5338000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "92FCF41EFE012D6186E31A59AD05BDBB487769AB" }, { "b" : "7F2CD5134000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "C1AE4CB7195D337A77A3C689051DABAA3980CA0C" }, { "b" : "7F2CD4E30000", "path" : "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "elfType" : 3, "buildId" : "4BF6F7ADD8244AD86008E6BF40D90F8873892197" }, { "b" : "7F2CD4B2A000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "1D76B71E905CB867B27CEF230FCB20F01A3178F5" }, { "b" : "7F2CD4914000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "8D0AA71411580EE6C08809695C3984769F25725B" }, { "b" : "7F2CD454F000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "30C94DC66A1FE95180C3D68D2B89E576D5AE213C" }, { "b" : "7F2CD5D98000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "9F00581AB3C73E3AEA35995A0C50D24D59A01D47" } ] }}
mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf5bfc9]
mongod(_ZN5mongo10logContextEPKc+0xE1) [0xefaea1]
mongod(_ZN5mongo13fassertFailedEi+0x61) [0xedfa91]
mongod(_ZN5mongo21dataSyncFailedHandlerEv+0xA9) [0xefd1c9]
mongod(_ZN5mongo14PosixFlushable5flushEv+0x3A3) [0xf03473]
mongod(_ZN5mongo9MongoFile9_flushAllEb+0x102) [0xefe472]
mongod(_ZN5mongo9MongoFile8flushAllEb+0x27) [0xefe5c7]
mongod(_ZN5mongo12DataFileSync3runEv+0x105) [0xd15ae5]
mongod(_ZN5mongo13BackgroundJob7jobBodyEv+0x120) [0xee26d0]
mongod(+0xBA9C94) [0xfa9c94]
libpthread.so.0(+0x8182) [0x7f2cd5b82182]
libc.so.6(clone+0x6D) [0x7f2cd464947d]
----- END BACKTRACE -----
2015-10-01T20:05:46.845+0200 I - [DataFileSync] ***aborting after fassert() failure
This always happens after about 3 to 4 days of uptime.
We don't know how to reproduce this behaviour.
We had the service team check the drive, but they did not find any errors.
I searched the internet for a solution but didn't find one. I did find an older issue with a similar error message:
Core Server / SERVER-12257 ("improve mongod behavior on storage write errors during msync")