[SERVER-17388] Invariant failure in MMAPv1 when disk full Created: 26/Feb/15  Updated: 19/Sep/15  Resolved: 27/Feb/15

Status: Closed
Project: Core Server
Component/s: MMAPv1
Affects Version/s: 3.0.0-rc11
Fix Version/s: 3.0.1, 3.1.0

Type: Bug Priority: Major - P3
Reporter: Kevin Pulo Assignee: Kaloian Manassiev
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File mongod-2.8.0-rc5-v-diskfull.log     File mongod-2.8.0-rc5-vv-diskfull.log    
Issue Links:
Depends
Related
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Completed:
Steps To Reproduce:

$ mongo --port 11011
MongoDB shell version: 2.6.8
connecting to: 127.0.0.1:11011/test
> genstr
function (len, chr) {
        assert.eq(typeof(len), "number");
        assert.gte(len, 0);
        len = Math.floor(len);
        if ( ! (typeof(chr) == "string" && chr.length == 1) ) {
                chr = "x";
        }
        return (new Array(len+1)).join(chr);
}
> for(i=0;i<500;i++)db.foo.insert({a:genstr(15*1048576)})
WriteResult({
        "nInserted" : 0,
        "writeError" : {
                "code" : 12520,
                "errmsg" : "new file allocation failure"
        }
})
> for(i=0;i<500;i++)db.foo.insert({a:genstr(1*1048576)})
2015-02-26T16:43:15.849+1100 DBClientCursor::init call() failed
2015-02-26T16:43:15.850+1100 Error: error doing query: failed at src/mongo/shell/collection.js:258
2015-02-26T16:43:15.852+1100 trying reconnect to 127.0.0.1:11011 (127.0.0.1) failed
2015-02-26T16:43:15.852+1100 warning: Failed to connect to 127.0.0.1:11011, reason: errno:111 Connection refused
2015-02-26T16:43:15.852+1100 reconnect 127.0.0.1:11011 (127.0.0.1) failed failed couldn't connect to server 127.0.0.1:11011 (127.0.0.1), connection attempt failed
2015-02-26T16:43:15.855+1100 trying reconnect to 127.0.0.1:11011 (127.0.0.1) failed
2015-02-26T16:43:15.855+1100 warning: Failed to connect to 127.0.0.1:11011, reason: errno:111 Connection refused
2015-02-26T16:43:15.855+1100 reconnect 127.0.0.1:11011 (127.0.0.1) failed failed couldn't connect to server 127.0.0.1:11011 (127.0.0.1), connection attempt failed
2015-02-26T16:43:15.859+1100 trying reconnect to 127.0.0.1:11011 (127.0.0.1) failed
2015-02-26T16:43:15.859+1100 warning: Failed to connect to 127.0.0.1:11011, reason: errno:111 Connection refused
2015-02-26T16:43:15.859+1100 reconnect 127.0.0.1:11011 (127.0.0.1) failed failed couldn't connect to server 127.0.0.1:11011 (127.0.0.1), connection attempt failed
>
bye

Participants:

 Description   
Issue Status as of Mar 10, 2015

ISSUE SUMMARY
MongoDB checks when files are closed or deleted to ensure that no journal entries reference these files so that the journal is always consistent with the data files. In addition, MongoDB writes data file creations to the journal, so they can be replayed if the server crashes. When data files fail to create because of an OS level error (e.g. volume out of space), MongoDB may incorrectly handle these file creation journal entries and treat them as referencing missing files.

USER IMPACT
When running with the MMAPv1 storage engine and journaling, mongod aborts if cannot create a new data file.

WORKAROUNDS
Ensure that the volume where mongod's dbpath exists, always has sufficient space to accommodate data size growth.

AFFECTED VERSIONS: 3.0.0

FIX VERSION
The fix is included in the 3.0.1 production release.

RESOLUTION DETAILS
Change behavior when closing files to ensure that references in the journal entries are handled properly.

Original description

2015-02-26T16:43:15.756+1100 F JOURNAL  [conn1] journal warning files are closing outside locks with writes pending
2015-02-26T16:43:15.756+1100 I -        [conn1] Invariant failure false src/mongo/db/storage/mmap_v1/dur.cpp 738
2015-02-26T16:43:15.772+1100 I CONTROL  [conn1] 
 0xf25749 0xecf571 0xeb4612 0xcebafc 0xcfe9c6 0xd19a72 0xd185cc 0xd18740 0xd189ba 0xd1cfae 0xd291bc 0xd1d7a9 0xd1d959 0x8f073e 0x8f0f0c 0x98dd24 0x98f11a 0x9908b4 0x990fb5 0x99306d 0x9b31c4 0x9b4103 0x9b4bbb 0xb80ea5 0xa941fa 0x7e5320 0xee37d1 0x7f2618776182 0x7f2617876efd
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"B25749"},{"b":"400000","o":"ACF571"},{"b":"400000","o":"AB4612"},{"b":"400000","o":"8EBAFC"},{"b":"400000","o":"8FE9C6"},{"b":"400000","o":"919A72"},{"b":"400000","o":"9185CC"},{"b":"400000","o":"918740"},{"b":"400000","o":"9189BA"},{"b":"400000","o":"91CFAE"},{"b":"400000","o":"9291BC"},{"b":"400000","o":"91D7A9"},{"b":"400000","o":"91D959"},{"b":"400000","o":"4F073E"},{"b":"400000","o":"4F0F0C"},{"b":"400000","o":"58DD24"},{"b":"400000","o":"58F11A"},{"b":"400000","o":"5908B4"},{"b":"400000","o":"590FB5"},{"b":"400000","o":"59306D"},{"b":"400000","o":"5B31C4"},{"b":"400000","o":"5B4103"},{"b":"400000","o":"5B4BBB"},{"b":"400000","o":"780EA5"},{"b":"400000","o":"6941FA"},{"b":"400000","o":"3E5320"},{"b":"400000","o":"AE37D1"},{"b":"7F261876E000","o":"8182"},{"b":"7F261777C000","o":"FAEFD"}],"processInfo":{ "mongodbVersion" : "2.8.0-rc5", "gitVersion" : "74b351de21c84438b12a83b28e155f5e69e3c1eb", "uname" : { "sysname" : "Linux", "release" : "3.13.0-39-generic", "version" : "#66-Ubuntu SMP Tue Oct 28 13:30:27 UTC 2014", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000" }, { "b" : "7FFFD9483000", "elfType" : 3 }, { "b" : "7F261876E000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3 }, { "b" : "7F2618566000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3 }, { "b" : "7F2618362000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3 }, { "b" : "7F261805E000", "path" : "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "elfType" : 3 }, { "b" : "7F2617D58000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3 }, { "b" : "7F2617B42000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3 }, { "b" : "7F261777C000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3 }, { "b" : "7F261898C000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3 } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf25749]
 mongod(_ZN5mongo10logContextEPKc+0xE1) [0xecf571]
 mongod(_ZN5mongo15invariantFailedEPKcS1_j+0xB2) [0xeb4612]
 mongod(_ZN5mongo3dur23closingFileNotificationEv+0x5C) [0xcebafc]
 mongod(_ZN5mongo17DurableMappedFileD1Ev+0xF6) [0xcfe9c6]
 mongod(_ZNSt8auto_ptrIN5mongo8DataFileEED1Ev+0x12) [0xd19a72]
 mongod(_ZN5mongo19MmapV1ExtentManager9_addAFileEPNS_16OperationContextEib+0x3CC) [0xd185cc]
 mongod(_ZN5mongo19MmapV1ExtentManager13_createExtentEPNS_16OperationContextEib+0xE0) [0xd18740]
 mongod(_ZN5mongo19MmapV1ExtentManager14allocateExtentEPNS_16OperationContextEbib+0x1DA) [0xd189ba]
 mongod(_ZN5mongo17RecordStoreV1Base19increaseStorageSizeEPNS_16OperationContextEib+0x3E) [0xd1cfae]
 mongod(_ZN5mongo19SimpleRecordStoreV111allocRecordEPNS_16OperationContextEib+0x1DC) [0xd291bc]
 mongod(_ZN5mongo17RecordStoreV1Base13_insertRecordEPNS_16OperationContextEPKcib+0x59) [0xd1d7a9]
 mongod(_ZN5mongo17RecordStoreV1Base12insertRecordEPNS_16OperationContextEPKcib+0x79) [0xd1d959]
 mongod(_ZN5mongo10Collection15_insertDocumentEPNS_16OperationContextERKNS_7BSONObjEb+0x4E) [0x8f073e]
 mongod(_ZN5mongo10Collection14insertDocumentEPNS_16OperationContextERKNS_7BSONObjEb+0x8C) [0x8f0f0c]
 mongod(_ZN5mongo18WriteBatchExecutor13execOneInsertEPNS0_16ExecInsertsStateEPPNS_16WriteErrorDetailE+0x214) [0x98dd24]
 mongod(_ZN5mongo18WriteBatchExecutor11execInsertsERKNS_21BatchedCommandRequestEPSt6vectorIPNS_16WriteErrorDetailESaIS6_EE+0x25A) [0x98f11a]
 mongod(_ZN5mongo18WriteBatchExecutor11bulkExecuteERKNS_21BatchedCommandRequestEPSt6vectorIPNS_19BatchedUpsertDetailESaIS6_EEPS4_IPNS_16WriteErrorDetailESaISB_EE+0x34) [0x9908b4]
 mongod(_ZN5mongo18WriteBatchExecutor12executeBatchERKNS_21BatchedCommandRequestEPNS_22BatchedCommandResponseE+0x395) [0x990fb5]
 mongod(_ZN5mongo8WriteCmd3runEPNS_16OperationContextERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x15D) [0x99306d]
 mongod(_ZN5mongo12_execCommandEPNS_16OperationContextEPNS_7CommandERKSsRNS_7BSONObjEiRSsRNS_14BSONObjBuilderEb+0x34) [0x9b31c4]
 mongod(_ZN5mongo7Command11execCommandEPNS_16OperationContextEPS0_iPKcRNS_7BSONObjERNS_14BSONObjBuilderEb+0xC13) [0x9b4103]
 mongod(_ZN5mongo12_runCommandsEPNS_16OperationContextEPKcRNS_7BSONObjERNS_11_BufBuilderINS_16TrivialAllocatorEEERNS_14BSONObjBuilderEbi+0x28B) [0x9b4bbb]
 mongod(_ZN5mongo8runQueryEPNS_16OperationContextERNS_7MessageERNS_12QueryMessageERKNS_15NamespaceStringERNS_5CurOpES3_b+0x755) [0xb80ea5]
 mongod(_ZN5mongo16assembleResponseEPNS_16OperationContextERNS_7MessageERNS_10DbResponseERKNS_11HostAndPortEb+0xB0A) [0xa941fa]
 mongod(_ZN5mongo16MyMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0xE0) [0x7e5320]
 mongod(_ZN5mongo17PortMessageServer17handleIncomingMsgEPv+0x321) [0xee37d1]
 libpthread.so.0(+0x8182) [0x7f2618776182]
 libc.so.6(clone+0x6D) [0x7f2617876efd]
-----  END BACKTRACE  -----
2015-02-26T16:43:15.772+1100 I -        [conn1] 
 
***aborting after invariant() failure

Before starting, the volume had about 7.6GB free:

Filesystem               1K-blocks     Used Available Use% Mounted on
/dev/cryptvg/root         72117576 60693448   7926492  89% /

and after the crash, about 764MB free:

Filesystem               1K-blocks     Used Available Use% Mounted on
/dev/cryptvg/root         72117576 67838328    781612  99% /

Inserted 15MB docs until out of space, then started inserting 1MB docs.

mongod started normally with no special options. Repros consistently with logLevel 0, but when I tried -vv, it didn't fail. I then tried -v, and the invariant occurred again.



 Comments   
Comment by Githook User [ 04/Mar/15 ]

Author:

{u'username': u'kaloianm', u'name': u'Kaloian Manassiev', u'email': u'kaloian.manassiev@mongodb.com'}

Message: SERVER-17388/SERVER-16776 If closing unopened file, do not check for journal writes

(cherry picked from commit a9f240048fe27d925c0ad1798c1bf29ce59f0acc)
Branch: v3.0
https://github.com/mongodb/mongo/commit/e1adb14bb593e22203d82018f4ba78c77aa289b3

Comment by Githook User [ 27/Feb/15 ]

Author:

{u'username': u'kaloianm', u'name': u'Kaloian Manassiev', u'email': u'kaloian.manassiev@mongodb.com'}

Message: SERVER-17388/SERVER-16776 If closing unopened file, do not check for journal writes
Branch: master
https://github.com/mongodb/mongo/commit/a9f240048fe27d925c0ad1798c1bf29ce59f0acc

Generated at Thu Feb 08 03:44:15 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.