[SERVER-20726] Improve mongod behavior after failed msync() call
Created: 01/Oct/15  Updated: 06/Dec/22  Resolved: 14/Sep/18

Status: Closed
Project: Core Server
Component/s: MMAPv1, Storage
Affects Version/s: 3.0.5, 3.0.6
Fix Version/s: None

Type: Improvement Priority: Minor - P4
Reporter: Jeremy Sternad Assignee: Backlog - Storage Execution Team
Resolution: Won't Fix Votes: 0
Labels: Bug
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Ubuntu 14.04, 2 GB RAM, 50 GB Disk Space

  • Hosted in a Hyper-V VM

Assigned Teams:
Storage Execution

 Description   

The requested improvement is to either not shut down or improve the error message.

Original description

The mongod service sometimes crashes with the following log:

2015-10-01T20:05:46.839+0200 I CONTROL  [DataFileSync] msync errno:5 Input/output error
2015-10-01T20:05:46.839+0200 I CONTROL  [DataFileSync] error syncing data to disk, probably a disk error
2015-10-01T20:05:46.839+0200 I CONTROL  [DataFileSync]  shutting down immediately to avoid corruption
2015-10-01T20:05:46.839+0200 I -        [DataFileSync] Fatal Assertion 17346
2015-10-01T20:05:46.845+0200 I CONTROL  [DataFileSync] 
 0xf5bfc9 0xefaea1 0xedfa91 0xefd1c9 0xf03473 0xefe472 0xefe5c7 0xd15ae5 0xee26d0 0xfa9c94 0x7f2cd5b82182 0x7f2cd464947d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"B5BFC9"},{"b":"400000","o":"AFAEA1"},{"b":"400000","o":"ADFA91"},{"b":"400000","o":"AFD1C9"},{"b":"400000","o":"B03473"},{"b":"400000","o":"AFE472"},{"b":"400000","o":"AFE5C7"},{"b":"400000","o":"915AE5"},{"b":"400000","o":"AE26D0"},{"b":"400000","o":"BA9C94"},{"b":"7F2CD5B7A000","o":"8182"},{"b":"7F2CD454F000","o":"FA47D"}],"processInfo":{ "mongodbVersion" : "3.0.6", "gitVersion" : "1ef45a23a4c5e3480ac919b28afcba3c615488f2", "uname" : { "sysname" : "Linux", "release" : "3.16.0-30-generic", "version" : "#40~14.04.1-Ubuntu SMP Thu Jan 15 17:43:14 UTC 2015", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "BF5AC37B50D416FD8D6D427E561426ED60291032" }, { "b" : "7FFFE78DB000", "elfType" : 3, "buildId" : "C8BA9F3BA421CFBAE75F7E57F357B1B5431DE838" }, { "b" : "7F2CD5B7A000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "9318E8AF0BFBE444731BB0461202EF57F7C39542" }, { "b" : "7F2CD591B000", "path" : "/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "A20EFFEC993A8441FA17F2079F923CBD04079E19" }, { "b" : "7F2CD5540000", "path" : "/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "F000D29917E9B6E94A35A8F02E5C62846E5916BC" }, { "b" : "7F2CD5338000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "92FCF41EFE012D6186E31A59AD05BDBB487769AB" }, { "b" : "7F2CD5134000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "C1AE4CB7195D337A77A3C689051DABAA3980CA0C" }, { "b" : "7F2CD4E30000", "path" : "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "elfType" : 3, "buildId" : "4BF6F7ADD8244AD86008E6BF40D90F8873892197" }, { "b" : "7F2CD4B2A000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "1D76B71E905CB867B27CEF230FCB20F01A3178F5" }, { "b" : "7F2CD4914000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "8D0AA71411580EE6C08809695C3984769F25725B" }, { "b" : 
"7F2CD454F000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "30C94DC66A1FE95180C3D68D2B89E576D5AE213C" }, { "b" : "7F2CD5D98000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "9F00581AB3C73E3AEA35995A0C50D24D59A01D47" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf5bfc9]
 mongod(_ZN5mongo10logContextEPKc+0xE1) [0xefaea1]
 mongod(_ZN5mongo13fassertFailedEi+0x61) [0xedfa91]
 mongod(_ZN5mongo21dataSyncFailedHandlerEv+0xA9) [0xefd1c9]
 mongod(_ZN5mongo14PosixFlushable5flushEv+0x3A3) [0xf03473]
 mongod(_ZN5mongo9MongoFile9_flushAllEb+0x102) [0xefe472]
 mongod(_ZN5mongo9MongoFile8flushAllEb+0x27) [0xefe5c7]
 mongod(_ZN5mongo12DataFileSync3runEv+0x105) [0xd15ae5]
 mongod(_ZN5mongo13BackgroundJob7jobBodyEv+0x120) [0xee26d0]
 mongod(+0xBA9C94) [0xfa9c94]
 libpthread.so.0(+0x8182) [0x7f2cd5b82182]
 libc.so.6(clone+0x6D) [0x7f2cd464947d]
-----  END BACKTRACE  -----
2015-10-01T20:05:46.845+0200 I -        [DataFileSync] 
 
***aborting after fassert() failure

This always happens after about 3 to 4 days of uptime.
We don't really know how to reproduce this behavior.
We had the service team check the drive, but they did not find any errors.

I searched the internet for a solution but didn't find one. I did find an older issue with a similar error message
(Core Server / SERVER-12257, "improve mongod behavior on storage write errors during msync").



 Comments   
Comment by Ramon Fernandez Marina [ 18/Apr/16 ]

Repurposing this as an improvement request as per Mark's last comment.

Comment by Mark Callaghan [ 16/Apr/16 ]

Please reopen this as a feature request. The requested improvement is to either not shut down or improve the error message.

In MySQL, we encounter disk-full conditions for the binlog and InnoDB. Both go into a loop (sleep, retry, print a message to the error log, repeat), which gives a DBA time to correct the problem. But you might not be investing in mmap at this point.
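The sleep/retry/log loop described above can be sketched for msync(2) as follows. This is a minimal illustration, not MongoDB or MySQL code; the helper name `msync_with_retry`, the attempt count, and the delay are hypothetical.

```c
#define _DEFAULT_SOURCE 1
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: flush a mapped region with msync(2). On failure,
 * log the error, sleep, and retry instead of shutting down, giving an
 * operator a chance to fix the problem (e.g. free disk space).
 * Returns 0 on success, or the last errno after max_attempts failures. */
int msync_with_retry(void *addr, size_t len, int max_attempts,
                     unsigned int delay_seconds) {
    int last_err = 0;
    if (max_attempts < 1) {
        max_attempts = 1; /* always try at least once */
    }
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (msync(addr, len, MS_SYNC) == 0) {
            return 0;
        }
        last_err = errno;
        fprintf(stderr, "msync attempt %d/%d failed: errno:%d %s\n",
                attempt, max_attempts, last_err, strerror(last_err));
        if (attempt < max_attempts) {
            sleep(delay_seconds); /* wait before the next attempt */
        }
    }
    return last_err;
}
```

One caveat with this design: on Linux a failed writeback may leave the affected pages marked clean, so a later msync() that succeeds does not necessarily prove the earlier writes reached the disk. That is one reason a storage engine may prefer to shut down, as mongod does here, rather than retry.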

The easier solution is to print a message indicating that the disk is full; that is hard to notice from the error message below.
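The simpler suggestion, a more specific message, can be sketched by checking free space on the affected filesystem when msync(2) fails. The helper name `describe_sync_failure` and the message wording are hypothetical. Free space is only a hint: errno 5 (EIO) in these logs does not by itself mean the disk is full, and a device or hypervisor fault can produce the same error.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/statvfs.h>

/* Hypothetical helper: build a diagnostic for a failed msync(2) on the
 * file behind `fd`, adding a free-space hint so a full disk is obvious. */
void describe_sync_failure(int err, int fd, char *buf, size_t buflen) {
    struct statvfs st;
    int n = snprintf(buf, buflen, "msync errno:%d %s", err, strerror(err));
    if (n < 0 || (size_t)n >= buflen) {
        return; /* buffer too small for anything more */
    }
    if (fstatvfs(fd, &st) != 0) {
        snprintf(buf + n, buflen - (size_t)n, "; could not query free space");
        return;
    }
    /* Bytes available to unprivileged users on this filesystem. */
    unsigned long long avail =
        (unsigned long long)st.f_bavail * (unsigned long long)st.f_frsize;
    if (err == ENOSPC || avail == 0) {
        snprintf(buf + n, buflen - (size_t)n,
                 "; the filesystem appears to be full (%llu bytes available)",
                 avail);
    } else {
        snprintf(buf + n, buflen - (size_t)n,
                 "; %llu bytes available, check system logs for device errors",
                 avail);
    }
}
```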

LevelDB had similar fun with mmap:

2016-04-15T20:05:49.638-0700 I CONTROL  [DataFileSync] msync errno:5 Input/output error
2016-04-15T20:05:49.642-0700 I CONTROL  [DataFileSync] error syncing data to disk, probably a disk error
2016-04-15T20:05:49.642-0700 I CONTROL  [DataFileSync]  shutting down immediately to avoid corruption
2016-04-15T20:05:49.648-0700 I -        [DataFileSync] Fatal Assertion 17346
2016-04-15T20:05:49.652-0700 I -        [DataFileSync] 
 
***aborting after fassert() failure
 
2016-04-15T20:05:49.925-0700 F -        [DataFileSync] Got signal: 6 (Aborted).
 
 0x1389392 0x13884d9 0x1388cf2 0x7f02e9411340 0x7f02e9072cc9 0x7f02e90760d8 0x1322dc2 0x10ca24a 0x10d12cd 0x10cc0d2 0x109cabc 0x1327720 0x7f02e9beca60 0x7f02e9409182 0x7f02e913647d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"F89392","s":"_ZN5mongo15printStackTraceERSo"},{"b":"400000","o":"F884D9"},{"b":"400000","o":"F88CF2"},{"b":"7F02E9401000","o":"10340"},{"b":"7F02E903C000","o":"36CC9","s":"gsignal"},{"b":"7F02E903C000","o":"3A0D8","s":"abort"},{"b":"400000","o":"F22DC2","s":"_ZN5mongo13fassertFailedEi"},{"b":"400000","o":"CCA24A","s":"_ZN5mongo21dataSyncFailedHandlerEv"},{"b":"400000","o":"CD12CD","s":"_ZN5mongo14PosixFlushable5flushEv"},{"b":"400000","o":"CCC0D2","s":"_ZN5mongo9MongoFile9_flushAllEb"},{"b":"400000","o":"C9CABC","s":"_ZN5mongo12DataFileSync3runEv"},{"b":"400000","o":"F27720","s":"_ZN5mongo13BackgroundJob7jobBodyEv"},{"b":"7F02E9B3B000","o":"B1A60"},{"b":"7F02E9401000","o":"8182"},{"b":"7F02E903C000","o":"FA47D","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.2.4", "gitVersion" : "e2ee9ffcf9f5a94fad76802e28cc978718bb7a30", "compiledModules" : [ "rocks" ], "uname" : { "sysname" : "Linux", "release" : "3.16.0-67-generic", "version" : "#87~14.04.1-Ubuntu SMP Fri Mar 11 00:26:02 UTC 2016", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "F33878CD3EEF9C4D7924DF8E3DCC07E069A9830F" }, { "b" : "7FFF4E754000", "elfType" : 3, "buildId" : "A26B644BA2DB522109E62CDBBE861E14C55D634B" }, { "b" : "7F02EA454000", "path" : "/lib/x86_64-linux-gnu/libbz2.so.1.0", "elfType" : 3, "buildId" : "E1031DDBFFE20367E874B7093EEC0C8D9F3B43F6" }, { "b" : "7F02EA24B000", "path" : "/usr/lib/x86_64-linux-gnu/liblz4.so.1", "elfType" : 3, "buildId" : "A4CA96C71C83286E315E29063DCA275230246522" }, { "b" : "7F02EA043000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "B376100CAB1EAC4E5DE066EACFC282BF7C0B54F3" }, { "b" : "7F02E9E3F000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "67699FFDA9FD2A552032E0652A242E82D65AA10D" }, { "b" : "7F02E9B3B000", "path" : "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "elfType" : 3, "buildId" : "D0E735DBECD63462DA114BD3F76E6EC7BB1FACCC" }, { "b" : 
"7F02E9835000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "EF3F6DFFA1FBE48436EC6F45CD3AABA157064BB4" }, { "b" : "7F02E961F000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "36311B4457710AE5578C4BF00791DED7359DBB92" }, { "b" : "7F02E9401000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "AF06068681750736E0524DF17D5A86CB2C3F765C" }, { "b" : "7F02E903C000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "5382058B69031CAA9B9996C11061CD164C9398FF" }, { "b" : "7F02EA664000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "2A816C3EBBA4E12813FBD34B06FBD25BC892A67F" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x1389392]
 mongod(+0xF884D9) [0x13884d9]
 mongod(+0xF88CF2) [0x1388cf2]
 libpthread.so.0(+0x10340) [0x7f02e9411340]
 libc.so.6(gsignal+0x39) [0x7f02e9072cc9]
 libc.so.6(abort+0x148) [0x7f02e90760d8]
 mongod(_ZN5mongo13fassertFailedEi+0x82) [0x1322dc2]
 mongod(_ZN5mongo21dataSyncFailedHandlerEv+0xEA) [0x10ca24a]
 mongod(_ZN5mongo14PosixFlushable5flushEv+0x2ED) [0x10d12cd]
 mongod(_ZN5mongo9MongoFile9_flushAllEb+0x252) [0x10cc0d2]
 mongod(_ZN5mongo12DataFileSync3runEv+0x36C) [0x109cabc]
 mongod(_ZN5mongo13BackgroundJob7jobBodyEv+0x160) [0x1327720]
 libstdc++.so.6(+0xB1A60) [0x7f02e9beca60]

Comment by Ramon Fernandez Marina [ 01/Oct/15 ]

Jayrizon, as per these messages in the log:

2015-10-01T20:05:46.839+0200 I CONTROL [DataFileSync] msync errno:5 Input/output error
2015-10-01T20:05:46.839+0200 I CONTROL [DataFileSync] error syncing data to disk, probably a disk error
2015-10-01T20:05:46.839+0200 I CONTROL [DataFileSync] shutting down immediately to avoid corruption

mongod could not complete an msync(2) call successfully and shut itself down to safeguard your data. This error is triggered by a problem in the storage layer, so please search your system logs for more information on what is causing it. It could also be a problem in the virtualization layer.

This behavior is not a bug in the server, so I'm going to close this ticket. For MongoDB-related support discussion, please post on the mongodb-user group or on Stack Overflow with the mongodb tag, where your question will reach a larger audience.

Regards,
Ramón.

Generated at Thu Feb 08 03:55:05 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.