[SERVER-32153] Mongo wont start after clean shutdown Created: 03/Dec/17  Updated: 27/Oct/23  Resolved: 04/Dec/17

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 3.4.10
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: volkan Assignee: Mark Agarunov
Resolution: Gone away Votes: 0
Labels: envh, rpo, rpu, trcf, wtc
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: WiredTiger, WiredTiger.turtle, WiredTiger.wt, WiredTigerLAS.wt, _mdb_catalog.wt, repair-SERVER-32153.tar.gz
Operating System: Linux
Steps To Reproduce:

# mongod --dbpath /var/lib/mongo --repair
2017-12-03T14:21:44.294+0300 I CONTROL  [initandlisten] MongoDB starting : pid=10742 port=27017 dbpath=/var/lib/mongo 64-bit host=tr-5.ciftlikbank.com
2017-12-03T14:21:44.294+0300 I CONTROL  [initandlisten] db version v3.4.10
2017-12-03T14:21:44.294+0300 I CONTROL  [initandlisten] git version: 078f28920cb24de0dd479b5ea6c66c644f6326e9
2017-12-03T14:21:44.294+0300 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.1e-fips 11 Feb 2013
2017-12-03T14:21:44.294+0300 I CONTROL  [initandlisten] allocator: tcmalloc
2017-12-03T14:21:44.294+0300 I CONTROL  [initandlisten] modules: none
2017-12-03T14:21:44.294+0300 I CONTROL  [initandlisten] build environment:
2017-12-03T14:21:44.294+0300 I CONTROL  [initandlisten]     distmod: rhel62
2017-12-03T14:21:44.294+0300 I CONTROL  [initandlisten]     distarch: x86_64
2017-12-03T14:21:44.294+0300 I CONTROL  [initandlisten]     target_arch: x86_64
2017-12-03T14:21:44.294+0300 I CONTROL  [initandlisten] options: { repair: true, storage: { dbPath: "/var/lib/mongo" } }
2017-12-03T14:21:44.332+0300 I -        [initandlisten] Detected data files in /var/lib/mongo created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2017-12-03T14:21:44.332+0300 I STORAGE  [initandlisten] 
2017-12-03T14:21:44.332+0300 I STORAGE  [initandlisten] ** WARNING: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine
2017-12-03T14:21:44.332+0300 I STORAGE  [initandlisten] **          See http://dochub.mongodb.org/core/prodnotes-filesystem
2017-12-03T14:21:44.332+0300 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=257465M,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),,log=(enabled=false),
2017-12-03T14:21:44.675+0300 I STORAGE  [initandlisten] Repairing size cache
2017-12-03T14:21:44.676+0300 I STORAGE  [initandlisten] Verify succeeded on uri table:sizeStorer. Not salvaging.
2017-12-03T14:21:44.677+0300 I STORAGE  [initandlisten] Repairing catalog metadata
2017-12-03T14:21:44.677+0300 E STORAGE  [initandlisten] WiredTiger error (-31802) [1512300104:677577][10742:0x7fbe40a88dc0], file:_mdb_catalog.wt, WT_SESSION.verify: _mdb_catalog.wt does not appear to be a WiredTiger file: WT_ERROR: non-specific WiredTiger error
2017-12-03T14:21:44.677+0300 I STORAGE  [initandlisten] Verify failed on uri table:_mdb_catalog. Running a salvage operation.
2017-12-03T14:21:44.677+0300 E STORAGE  [initandlisten] WiredTiger error (-31802) [1512300104:677920][10742:0x7fbe40a88dc0], file:_mdb_catalog.wt, WT_SESSION.salvage: _mdb_catalog.wt does not appear to be a WiredTiger file: WT_ERROR: non-specific WiredTiger error
2017-12-03T14:21:44.679+0300 E STORAGE  [initandlisten] WiredTiger error (-31802) [1512300104:679066][10742:0x7fbe40a88dc0], file:_mdb_catalog.wt, WT_SESSION.open_cursor: _mdb_catalog.wt does not appear to be a WiredTiger file: WT_ERROR: non-specific WiredTiger error
2017-12-03T14:21:44.679+0300 I -        [initandlisten] Invariant failure: ret resulted in status UnknownError: -31802: WT_ERROR: non-specific WiredTiger error at src/mongo/db/storage/wiredtiger/wiredtiger_session_cache.cpp 95
2017-12-03T14:21:44.679+0300 I -        [initandlisten] 
 
***aborting after invariant() failure
 
 
2017-12-03T14:21:44.697+0300 F -        [initandlisten] Got signal: 6 (Aborted).
 
 0x559409d11621 0x559409d10839 0x559409d10d1d 0x7fbe3f7727e0 0x7fbe3f401495 0x7fbe3f402c75 0x559408fb5083 0x559409a1b1f6 0x559409a18fcc 0x559409a15604 0x559409a13e2c 0x559409a0157b 0x559409946fe1 0x5594099ff24e 0x5594098f16c7 0x559408fa12dc 0x559408fc0cfb 0x7fbe3f3edd1d 0x55940901fed1
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"559408799000","o":"1578621","s":"_ZN5mongo15printStackTraceERSo"},{"b":"559408799000","o":"1577839"},{"b":"559408799000","o":"1577D1D"},{"b":"7FBE3F763000","o":"F7E0"},{"b":"7FBE3F3CF000","o":"32495","s":"gsignal"},{"b":"7FBE3F3CF000","o":"33C75","s":"abort"},{"b":"559408799000","o":"81C083","s":"_ZN5mongo25fassertFailedWithLocationEiPKcj"},{"b":"559408799000","o":"12821F6","s":"_ZN5mongo17WiredTigerSession9getCursorERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEmb"},{"b":"559408799000","o":"127FFCC","s":"_ZN5mongo16WiredTigerCursorC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEmbPNS_16OperationContextE"},{"b":"559408799000","o":"127C604","s":"_ZN5mongo21WiredTigerRecordStore6CursorC1EPNS_16OperationContextERKS0_b"},{"b":"559408799000","o":"127AE2C","s":"_ZN5mongo21WiredTigerRecordStoreC1EPNS_16OperationContextENS_10StringDataES3_NSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEbbllPNS_14CappedCallbackEPNS_20WiredTigerSizeStorerE"},{"b":"559408799000","o":"126857B","s":"_ZN5mongo18WiredTigerKVEngine14getRecordStoreEPNS_16OperationContextENS_10StringDataES3_RKNS_17CollectionOptionsE"},{"b":"559408799000","o":"11ADFE1","s":"_ZN5mongo15KVStorageEngineC1EPNS_8KVEngineERKNS_22KVStorageEngineOptionsE"},{"b":"559408799000","o":"126624E"},{"b":"559408799000","o":"11586C7","s":"_ZN5mongo20ServiceContextMongoD29initializeGlobalStorageEngineEv"},{"b":"559408799000","o":"8082DC"},{"b":"559408799000","o":"827CFB","s":"main"},{"b":"7FBE3F3CF000","o":"1ED1D","s":"__libc_start_main"},{"b":"559408799000","o":"886ED1"}],"processInfo":{ "mongodbVersion" : "3.4.10", "gitVersion" : "078f28920cb24de0dd479b5ea6c66c644f6326e9", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "4.9.58-xxxx-std-ipv6-64", "version" : "#1 SMP Mon Oct 23 11:35:59 CEST 2017", "machine" : "x86_64" }, "somap" : [ { "b" : "559408799000", "elfType" : 3, "buildId" : "A8D942491CCD5A3D1198EEFB3186869077045D14" }, { "b" : "7FFC69F65000", "elfType" : 3, 
"buildId" : "90F5D5C2EC7FEB425058514A6D94816B97AC7006" }, { "b" : "7FBE4060B000", "path" : "/usr/lib64/libssl.so.10", "elfType" : 3, "buildId" : "BECFB85A8BC084042D5BF2BA9E66325CE798B659" }, { "b" : "7FBE40226000", "path" : "/usr/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "CBDA444A7109874C5350AE9CEEF3F82F749B347F" }, { "b" : "7FBE4001E000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "FDF3A36FFFE08375456D59DA959EAB2FC30B6186" }, { "b" : "7FBE3FE1A000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "1F7E85410384392BC51FA7324961719A10125F31" }, { "b" : "7FBE3FB96000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "8A852AC42F0B64F0F30C760EBBCFA3FE4A228F12" }, { "b" : "7FBE3F980000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "BC7550A8A7C2D706FE4E489058BADC963465DBB7" }, { "b" : "7FBE3F763000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "85104ECFE42C606B31C2D0D0D2E5DACD3286A341" }, { "b" : "7FBE3F3CF000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "814F2290D172521A3FD8581389E3E78A4A182379" }, { "b" : "7FBE40877000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "1CC2165E019D43F71FDE0A47AF9F4C8EB5E51963" }, { "b" : "7FBE3F18B000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "9A737F8BF10FC99C37CC404D3FC188F6E11FEDD9" }, { "b" : "7FBE3EEA4000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "8D3D6E28DF6EB3752642A7031AAC17D39EA4265D" }, { "b" : "7FBE3ECA0000", "path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "57F77704A7F1F4E3689D028D3F9ADD4E77486EC9" }, { "b" : "7FBE3EA74000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "CC89B4C8CDCCD32BA610BC72784DC3B7E9BD9E19" }, { "b" : "7FBE3E85E000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "5FA8E5038EC04A774AF72A9BB62DC86E1049C4D6" }, { "b" : "7FBE3E653000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : 
"E0C522C589F775C324330BE09CE67DC83950A213" }, { "b" : "7FBE3E450000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "AF374BAFB7F5B139A0B431D3F06D82014AFF3251" }, { "b" : "7FBE3E236000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "F0BE1166EDCFFB2422B940D601A1BBD89352D80F" }, { "b" : "7FBE3E017000", "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "B4576BE308DDCF7BC31F7304E4734C3D846D0236" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x559409d11621]
 mongod(+0x1577839) [0x559409d10839]
 mongod(+0x1577D1D) [0x559409d10d1d]
 libpthread.so.0(+0xF7E0) [0x7fbe3f7727e0]
 libc.so.6(gsignal+0x35) [0x7fbe3f401495]
 libc.so.6(abort+0x175) [0x7fbe3f402c75]
 mongod(_ZN5mongo25fassertFailedWithLocationEiPKcj+0x0) [0x559408fb5083]
 mongod(_ZN5mongo17WiredTigerSession9getCursorERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEmb+0x106) [0x559409a1b1f6]
 mongod(_ZN5mongo16WiredTigerCursorC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEmbPNS_16OperationContextE+0x4C) [0x559409a18fcc]
 mongod(_ZN5mongo21WiredTigerRecordStore6CursorC1EPNS_16OperationContextERKS0_b+0x64) [0x559409a15604]
 mongod(_ZN5mongo21WiredTigerRecordStoreC1EPNS_16OperationContextENS_10StringDataES3_NSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEbbllPNS_14CappedCallbackEPNS_20WiredTigerSizeStorerE+0x47C) [0x559409a13e2c]
 mongod(_ZN5mongo18WiredTigerKVEngine14getRecordStoreEPNS_16OperationContextENS_10StringDataES3_RKNS_17CollectionOptionsE+0x23B) [0x559409a0157b]
 mongod(_ZN5mongo15KVStorageEngineC1EPNS_8KVEngineERKNS_22KVStorageEngineOptionsE+0x681) [0x559409946fe1]
 mongod(+0x126624E) [0x5594099ff24e]
 mongod(_ZN5mongo20ServiceContextMongoD29initializeGlobalStorageEngineEv+0x697) [0x5594098f16c7]
 mongod(+0x8082DC) [0x559408fa12dc]
 mongod(main+0x96B) [0x559408fc0cfb]
 libc.so.6(__libc_start_main+0xFD) [0x7fbe3f3edd1d]
 mongod(+0x886ED1) [0x55940901fed1]
-----  END BACKTRACE  -----
}}
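
To pull the WiredTiger failures out of a log like the one above, a minimal Python sketch follows. It assumes the plain-text mongod 3.4 log layout shown in this ticket; the names WT_ERROR_RE, wiredtiger_errors, summarize and SAMPLE_LOG are illustrative and not part of MongoDB or WiredTiger.

```python
import re

# Matches mongod 3.4-style error lines of the form:
#   <timestamp> E STORAGE  [thread] WiredTiger error (<code>) [..][..], <uri>, <operation>: <message>
WT_ERROR_RE = re.compile(
    r"^(?P<ts>\S+)\s+E\s+STORAGE\s+\[[^\]]+\]\s+"
    r"WiredTiger error \((?P<code>-?\d+)\)\s+"
    r"\[[^\]]*\]\[[^\]]*\],\s+"
    r"(?P<uri>[^,]+),\s+"
    r"(?P<op>[\w.]+):\s+"
    r"(?P<msg>.*)$"
)

# Lines copied from the repair attempt in this ticket.
SAMPLE_LOG = """\
2017-12-03T14:21:44.677+0300 E STORAGE  [initandlisten] WiredTiger error (-31802) [1512300104:677577][10742:0x7fbe40a88dc0], file:_mdb_catalog.wt, WT_SESSION.verify: _mdb_catalog.wt does not appear to be a WiredTiger file: WT_ERROR: non-specific WiredTiger error
2017-12-03T14:21:44.677+0300 I STORAGE  [initandlisten] Verify failed on uri table:_mdb_catalog. Running a salvage operation.
2017-12-03T14:21:44.677+0300 E STORAGE  [initandlisten] WiredTiger error (-31802) [1512300104:677920][10742:0x7fbe40a88dc0], file:_mdb_catalog.wt, WT_SESSION.salvage: _mdb_catalog.wt does not appear to be a WiredTiger file: WT_ERROR: non-specific WiredTiger error
2017-12-03T14:21:44.679+0300 E STORAGE  [initandlisten] WiredTiger error (-31802) [1512300104:679066][10742:0x7fbe40a88dc0], file:_mdb_catalog.wt, WT_SESSION.open_cursor: _mdb_catalog.wt does not appear to be a WiredTiger file: WT_ERROR: non-specific WiredTiger error
"""


def wiredtiger_errors(log_text):
    """Return one dict (ts, code, uri, op, msg) per WiredTiger error line."""
    errors = []
    for line in log_text.splitlines():
        match = WT_ERROR_RE.match(line.strip())
        if match:
            errors.append(match.groupdict())
    return errors


def summarize(errors):
    """Group the failed WiredTiger operations by the file they touched."""
    by_uri = {}
    for err in errors:
        by_uri.setdefault(err["uri"], []).append(err["op"])
    return by_uri


if __name__ == "__main__":
    for uri, ops in summarize(wiredtiger_errors(SAMPLE_LOG)).items():
        print(uri, "failed in:", ", ".join(ops))
```

On the lines above, every error points at the same file, _mdb_catalog.wt, across verify, salvage and open_cursor, which is what narrows the problem to the catalog file rather than to an individual collection.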

Participants:

 Description   

Hi,

Today I cleanly shut down my mongo instance for maintenance, and the service won't come back up.

Could you please help with recovery?

Best regards,

{{mongod --version
db version v3.4.10
git version: 078f28920cb24de0dd479b5ea6c66c644f6326e9
OpenSSL version: OpenSSL 1.0.1e-fips 11 Feb 2013
allocator: tcmalloc
modules: none
build environment:
distmod: rhel62
distarch: x86_64
target_arch: x86_64}}



 Comments   
Comment by Mark Agarunov [ 04/Dec/17 ]

Hello memis,

Thank you for the information. I'm glad you found a solution to this issue; if the repair didn't work, that would indicate additional corruption on the disk. As this looks to be solved for you, I've closed this issue. If any additional information comes to light, please let me know.

Thanks,
Mark

Comment by volkan [ 04/Dec/17 ]

Hello Mark,

Thank you for the repair files and your effort, but I'm afraid the repair files did not help (same error as above).

Since we extracted the data last night with the wt tool (with snappy), we don't need to recover our db (luckily we have exactly the same tables and collections on another mongo instance).
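
The exact commands used for that extraction are not recorded in this ticket. As a rough sketch only, a salvage-then-dump pass with the WiredTiger wt utility could be assembled as below. The wt binary location, the snappy extension path, the working copy of the dbpath and the collection ident are all hypothetical placeholders, and the commands should only ever be pointed at a copy of the data files.

```python
import shlex


def wt_command(wt_bin, home, snappy_ext, *subcommand):
    """Build an argv list for the WiredTiger `wt` utility.

    -h sets the database home, -C passes a wiredtiger_open config string
    (used here to load the snappy compressor extension), -R runs recovery
    on open, and -v asks for verbose output.
    """
    config = "extensions=[{}]".format(snappy_ext)
    return [wt_bin, "-v", "-h", home, "-C", config, "-R", *subcommand]


def salvage_then_dump(wt_bin, home, snappy_ext, ident, dump_path):
    """Return the two commands: salvage the file, then dump the table."""
    salvage = wt_command(wt_bin, home, snappy_ext,
                         "salvage", "file:{}.wt".format(ident))
    dump = wt_command(wt_bin, home, snappy_ext,
                      "dump", "-f", dump_path, "table:{}".format(ident))
    return [salvage, dump]


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    commands = salvage_then_dump(
        wt_bin="wt",
        home="/tmp/dbpath-copy",
        snappy_ext="/path/to/libwiredtiger_snappy.so",
        ident="collection-0-123",
        dump_path="/tmp/collection-0-123.dump",
    )
    for argv in commands:
        # Printed rather than executed, so they can be reviewed first.
        print(shlex.join(argv))
```

The sketch only prints the commands, so the paths can be checked before anything is run against the copied files.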

For your information:

1. We are using software RAID (MD) with 2x4TB SATA drives
2. Checked with the MD tool, which says healthy. Also checked the SATA disks with smartctl, which shows no problems
3. Yes, the database has been running mongo 3.4.10 since it was started
4. I copied the db folder after the server was rebooted, while mongo was stopped (silly me, why didn't I mongodump before the restart)
5. See number 4. I don't have a working backup, or it is too old to be useful
6. Just copied from dbpath
7. 25 days ago
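
The MD health check from item 2 can be scripted. The sketch below parses the text of /proc/mdstat and reports whether each array has all of its members active. The sample inputs are made up for illustration and are not taken from the reporter's machine, and the function name md_array_health is hypothetical. A healthy result here does not rule out corruption inside an individual file.

```python
import re

# An array header line looks like: "md0 : active raid1 sdb1[1] sda1[0]"
ARRAY_RE = re.compile(r"^(md\S+)\s*:")
# The status line carries "[expected/active] [UU]"; "_" marks a missing member.
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def md_array_health(mdstat_text):
    """Map each md array name to True (all members up) or False (degraded)."""
    health = {}
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS_RE.search(line)
        if status and current:
            expected = int(status.group(1))
            active = int(status.group(2))
            flags = status.group(3)
            health[current] = active == expected and "_" not in flags
            current = None
    return health


# Made-up examples of /proc/mdstat content for a two-disk RAID1 array.
HEALTHY_SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      3906886464 blocks super 1.2 [2/2] [UU]

unused devices: <none>
"""

DEGRADED_SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sda1[0]
      3906886464 blocks super 1.2 [2/1] [U_]

unused devices: <none>
"""

if __name__ == "__main__":
    try:
        with open("/proc/mdstat") as fh:
            print(md_array_health(fh.read()))
    except FileNotFoundError:
        # Not a Linux host or no md driver loaded; fall back to the sample.
        print(md_array_health(HEALTHY_SAMPLE))
```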

Regards

Volkan

Comment by Mark Agarunov [ 04/Dec/17 ]

Hello memis,

Thank you for the report. I've attached a repair attempt of the files you've provided. Would you please extract these files and replace them in your $dbpath and let us know if it resolves the issue? If you are still seeing errors after replacing these files, please provide the complete logs from mongod so that we can further investigate. Additionally, if this issue persists, please provide the following information:

  1. What kind of underlying storage mechanism are you using? Are the storage devices attached locally or over the network? Are the disks SSDs or HDDs? What kind of RAID and/or volume management system are you using?
  2. Would you please check the integrity of your disks?
  3. Has the database always been running this version of MongoDB? If not please describe the upgrade/downgrade cycles the database has been through.
  4. Have you manipulated (copied or moved) the underlying database files? If so, was mongod running?
  5. Have you ever restored this instance from backups?
  6. What method do you use to create backups?
  7. When was the underlying filesystem last checked and is it currently marked clean?

Thanks,
Mark

Comment by volkan [ 03/Dec/17 ]

It seems my _mdb_catalog.wt file is corrupted due to a bad sector or file system corruption.

Any help would be really appreciated.

Best regards,

Generated at Thu Feb 08 04:29:21 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.