[SERVER-21922] MMAPv1: Invariant failure after mongod restart Created: 16/Dec/15  Updated: 23/Dec/15  Resolved: 23/Dec/15

Status: Closed
Project: Core Server
Component/s: MMAPv1, Storage
Affects Version/s: 3.2.0
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Šimun Mikecin Assignee: Kelsey Schubert
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File mongod.log    
Operating System: ALL
Steps To Reproduce:

You need MMAPv1 database storage with some data created by MongoDB 3.0 on CentOS 6 64-bit.
Upgrade it to MongoDB 3.2 and do some writes.
Try to restart.

Participants:

 Description   

I have a MMAPv1 database storage that was created with MongoDB 3.0 on CentOS 6 64-bit and later upgraded to MongoDB 3.2.
I tried to restart mongod service (MongoDB 3.2) and got this error:

2015-12-16T13:45:37.368+0100 I CONTROL  [main] ***** SERVER RESTARTED *****
2015-12-16T13:45:37.397+0100 I CONTROL  [initandlisten] MongoDB starting : pid=15850 port=27017 dbpath=/var/lib/mongo 64-bit host=XXX
2015-12-16T13:45:37.397+0100 I CONTROL  [initandlisten] db version v3.2.0
2015-12-16T13:45:37.397+0100 I CONTROL  [initandlisten] git version: 45d947729a0315accb6d4f15a6b06be6d9c19fe7
2015-12-16T13:45:37.397+0100 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.1e-fips 11 Feb 2013
2015-12-16T13:45:37.397+0100 I CONTROL  [initandlisten] allocator: tcmalloc
2015-12-16T13:45:37.397+0100 I CONTROL  [initandlisten] modules: none
2015-12-16T13:45:37.397+0100 I CONTROL  [initandlisten] build environment:
2015-12-16T13:45:37.397+0100 I CONTROL  [initandlisten]     distmod: rhel62
2015-12-16T13:45:37.397+0100 I CONTROL  [initandlisten]     distarch: x86_64
2015-12-16T13:45:37.397+0100 I CONTROL  [initandlisten]     target_arch: x86_64
2015-12-16T13:45:37.397+0100 I CONTROL  [initandlisten] options: { config: "/etc/mongod.conf", processManagement: { fork: true, pidFilePath: "/var/run/mongodb/mongod.pid" }, replication: { replSetName: "rslive0" }, security: { keyFile: "/var/lib/mongo/key.txt" }, storage: { dbPath: "/var/lib/mongo" }, systemLog: { destination: "file", logAppend: true, path: "/var/log/mongodb/mongod.log" } }
2015-12-16T13:45:37.422+0100 I -        [initandlisten] Detected data files in /var/lib/mongo created by the 'mmapv1' storage engine, so setting the active storage engine to 'mmapv1'.
2015-12-16T13:45:37.422+0100 W -        [initandlisten] Detected unclean shutdown - /var/lib/mongo/mongod.lock is not empty.
2015-12-16T13:45:37.434+0100 I JOURNAL  [initandlisten] journal dir=/var/lib/mongo/journal
2015-12-16T13:45:37.434+0100 I JOURNAL  [initandlisten] recover begin
2015-12-16T13:45:37.434+0100 I JOURNAL  [initandlisten] info no lsn file in journal/ directory
2015-12-16T13:45:37.434+0100 I JOURNAL  [initandlisten] recover lsn: 0
2015-12-16T13:45:37.434+0100 I JOURNAL  [initandlisten] recover /var/lib/mongo/journal/j._0
2015-12-16T13:45:37.442+0100 I JOURNAL  [initandlisten] recover cleaning up
2015-12-16T13:45:37.442+0100 I JOURNAL  [initandlisten] removeJournalFiles
2015-12-16T13:45:37.443+0100 I JOURNAL  [initandlisten] recover done
2015-12-16T13:45:37.556+0100 I JOURNAL  [durability] Durability thread started
2015-12-16T13:45:37.557+0100 I JOURNAL  [journal writer] Journal writer thread started
2015-12-16T13:45:37.583+0100 I -        [initandlisten] Invariant failure details src/mongo/db/storage/mmap_v1/mmap_v1_database_catalog_entry.cpp 667
2015-12-16T13:45:37.583+0100 I -        [initandlisten]
 
***aborting after invariant() failure
 
 
2015-12-16T13:45:37.591+0100 F -        [initandlisten] Got signal: 6 (Aborted).
 
 0x1304652 0x1303599 0x1303da2 0x309ea0f710 0x309e232925 0x309e234105 0x128c3fb 0x102e809 0x102f389 0x102f786 0x1032b7e 0xac6cc3 0x973e5c 0x976e59 0x309e21ed1d 0x970e69
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"F04652"},{"b":"400000","o":"F03599"},{"b":"400000","o":"F03DA2"},{"b":"309EA00000","o":"F710"},{"b":"309E200000","o":"32925"},{"b":"309E200000","o":"34105"},{"b":"400000","o":"E8C3FB"},{"b":"400000","o":"C2E809"},{"b":"400000","o":"C2F389"},{"b":"400000","o":"C2F786"},{"b":"400000","o":"C32B7E"},{"b":"400000","o":"6C6CC3"},{"b":"400000","o":"573E5C"},{"b":"400000","o":"576E59"},{"b":"309E200000","o":"1ED1D"},{"b":"400000","o":"570E69"}],"processInfo":{ "mongodbVersion" : "3.2.0", "gitVersion" : "45d947729a0315accb6d4f15a6b06be6d9c19fe7", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "2.6.32-431.el6.x86_64", "version" : "#1 SMP Fri Nov 22 03:15:09 UTC 2013", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "A79D2083C8EFE1AC7983704B6007FA7C521E8DD9" }, { "b" : "7FFFC2A87000", "elfType" : 3, "buildId" : "81A81BE2E44C93640ADEDB62ADC93A47F4A09DD1" }, { "path" : "/usr/lib64/libssl.so.10", "elfType" : 3, "buildId" : "318EAB33420B000D542F09B91B716BACAB1AD546" }, { "path" : "/usr/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "3A8D65B9A373C0AFAF106F3A979835B16DBEFF1A" }, { "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "B26528BF6C0636AC1CAE5AC50BDBC07E60851DF4" }, { "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "AFC7448F2F2F6ED4E5BC82B1BD8A7320B84A9D48" }, { "path" : "/usr/lib64/libstdc++.so.6", "elfType" : 3, "buildId" : "ED99110E629209C5CA6C0ED704F2C5CE3171513A" }, { "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "98B028A725D6E93253F25DF00B794DFAA66A3145" }, { "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "A44499D29B114A5366CD72DD4883958495AC1C1D" }, { "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "1BB4E10307D6B94223749CFDF2AD14C365972C60" }, { "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "F1A1C0575F0EC141A157E5DFA4525E70BD27B62E" }, { "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "57BF668F99B7F5917B8D55FBB645173C9A644575" }, { "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "EFF68B7DE77D081BC4A0CB38FE9DCBC60541BF92" }, { "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "95EBB74C2C0A1E1714344036145A0239FFA4892D" }, { "path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "6A22EDFF4D4F04A57573E3D1536B6B4963159CD5" }, { "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "D02E7D3149950118009A81997434E28B7D9EC9B2" }, { "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "5FA8E5038EC04A774AF72A9BB62DC86E1049C4D6" }, { "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "5AFCBEA0D62EE0335714CCBAB7BA808E2A16028C" }, { "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "8A8734DC37305D8CC2EF8F8C3E5EA03171DB07EC" }, { "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "56843351EFB2CE304A7E4BD0754991613E9EC8BD" }, { "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "BAD5C71361DADF259B6E306A49E6F47F24AEA3DC" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x1304652]
 mongod(+0xF03599) [0x1303599]
 mongod(+0xF03DA2) [0x1303da2]
 libpthread.so.0(+0xF710) [0x309ea0f710]
 libc.so.6(gsignal+0x35) [0x309e232925]
 libc.so.6(abort+0x175) [0x309e234105]
 mongod(_ZN5mongo15invariantFailedEPKcS1_j+0xCB) [0x128c3fb]
 mongod(_ZN5mongo26MMAPV1DatabaseCatalogEntry14_insertInCacheEPNS_16OperationContextENS_10StringDataENS_8RecordIdEPNS0_5EntryE+0x399) [0x102e809]
 mongod(_ZN5mongo26MMAPV1DatabaseCatalogEntry5_initEPNS_16OperationContextE+0xAD9) [0x102f389]
 mongod(_ZN5mongo26MMAPV1DatabaseCatalogEntryC1EPNS_16OperationContextENS_10StringDataES3_bb+0x1A6) [0x102f786]
 mongod(_ZN5mongo12MMAPV1Engine23getDatabaseCatalogEntryEPNS_16OperationContextENS_10StringDataE+0x14E) [0x1032b7e]
 mongod(_ZN5mongo14DatabaseHolder6openDbEPNS_16OperationContextENS_10StringDataEPb+0x133) [0xac6cc3]
 mongod(_ZN5mongo13initAndListenEi+0xE3C) [0x973e5c]
 mongod(main+0x149) [0x976e59]
 libc.so.6(__libc_start_main+0xFD) [0x309e21ed1d]
 mongod(+0x570E69) [0x970e69]
-----  END BACKTRACE  -----



 Comments   
Comment by Kelsey Schubert [ 23/Dec/15 ]

Hi sime,

Thank you for the information. Unfortunately, without a reproduction or the affected data files, we are unable to identify the cause of this issue. If you experience this invariant failure again please keep the data files, and we will reopen this ticket.

You may be interested in following a similar issue, SERVER-21550, that we are investigating.

Regards,
Thomas

Comment by Šimun Mikecin [ 17/Dec/15 ]

As far as I can see there were no I/O or block device errors reported in the logs.

Comment by Šimun Mikecin [ 16/Dec/15 ]

1. There was no problem with MongoDB. I restarted because I was debugging reconnection feature of another application that connects to MongoDB database. There were no system restarts and there were no operating system or MongoDB configuration changes. All what was done is that there were many people/applications using (read, write, mapReduce, creating short lived collections, etc.) that database at the time I restarted it (it is used as a development database).

2. Database storage is a local XFS filesystem (with access time updates disabled) dedicated just for a MongoDB on a CentOS 6 64-bit Virtual Machine. Journal is enabled and is not on a separate partition. There were no operations (copy, move, rename, backup or anything alike) on the database files before this issue.

3. I will check this tomorrow and report as soon as I have any info.

I don't know if this info is of any help, but few days ago partial and text indexes were added.

Comment by Kelsey Schubert [ 16/Dec/15 ]

Hi sime,

Thank you for the report. To help us better understand your system and your issue, could you please provide answers to the following questions about your deployment and environment:

  1. Was there a problem that prompted you to first restarted the server? Did your system restart? Were there any configuration changes to your system?
  2. Can you provide more details about your storage layer?
    1. What filesystem are you using?
    2. Is it local or remote?
    3. Are your journal file on the same block device as your data files?
    4. Before getting this issue the first time did you manipulate your data files outside of mongoldb (copies, moves, renames, backup operations?) Are you performing any sort of maintenance on your filesystem?
  3. Can you search your system logs on the affected node for any I/O errors or block device errors?

Thanks for your help!

Regards,
Thomas

Comment by Šimun Mikecin [ 16/Dec/15 ]

Log above shows unclean shutdown because I pasted the log that was after several attempts to start mongod service. Every attempt kept mongod.lock file.
I have attached full log that shows initial service stop request with a start request afterwards (without unclean shutdown warning). This was done with "service mongod restart" from the shell.
After that there were many failed start requests (with unclean shutdown warning).
I solved it by deleting all database files and restoring from another Replica Set node so I don't have problematic data any more.

Maybe these lines have something to do with this problem:
2015-12-16T13:44:02.019+0100 I STORAGE [conn121] got request after shutdown()
2015-12-16T13:44:02.106+0100 I STORAGE [conn32511] got request after shutdown()

Comment by Ramon Fernandez Marina [ 16/Dec/15 ]

The log snippet above seems to indicate there was an unclean shutdown before this restart, which may have caused data inconsistencies that now prevent mongod from starting. Can you provide further details on how was mongod shut down before this restart? Can you provide full logs for this node? Also, have you tried running {{validate(full} on your data?

Generated at Thu Feb 08 03:58:49 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.