[SERVER-42245] file:WiredTiger.wt, connection: the process must exit and restart: WT_PANIC: WiredTiger library panic Created: 16/Jul/19  Updated: 24/Jul/19  Resolved: 22/Jul/19

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Chengcheng Ma Assignee: Kelsey Schubert
Resolution: Done Votes: 0
Labels: wtc
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File WiredTiger.turtle     File WiredTiger.wt     Text File mongo_repair.log     File mongod.log.2019-07-11T10-22-54     File repair_attempt.tar.gz    
Operating System: ALL
Steps To Reproduce:

WiredTiger.turtle

Participants:

 Description   

The problem was caused by a power failure: all 3 nodes in the replica set crashed simultaneously. We are running MongoDB 3.4.2 on CentOS release 6.5 (Final).

 

I have found several earlier issues in which the mongo data files could not be repaired because of a corrupted WiredTiger.wt or WiredTiger.turtle. In each case the reporter uploaded the two files, and you then repaired them and uploaded the repaired copies.

 

I am encountering exactly the same issue. Here are the original WT files and the repair output. Would you please help me fix this issue?

 

The repair output is below:

2019-07-16T16:41:45.178+0800 I CONTROL [initandlisten] MongoDB starting : pid=2255 port=30000 dbpath=/iflytek/data/mongodb/new_data/data27017_bak/ 64-bit host=i-A3566A06
2019-07-16T16:41:45.178+0800 I CONTROL [initandlisten] db version v3.4.2
2019-07-16T16:41:45.178+0800 I CONTROL [initandlisten] git version: 3f76e40c105fc223b3e5aac3e20dcd026b83b38b
2019-07-16T16:41:45.178+0800 I CONTROL [initandlisten] OpenSSL version: OpenSSL 1.0.1e-fips 11 Feb 2013
2019-07-16T16:41:45.178+0800 I CONTROL [initandlisten] allocator: tcmalloc
2019-07-16T16:41:45.178+0800 I CONTROL [initandlisten] modules: none
2019-07-16T16:41:45.178+0800 I CONTROL [initandlisten] build environment:
2019-07-16T16:41:45.178+0800 I CONTROL [initandlisten] distmod: rhel62
2019-07-16T16:41:45.178+0800 I CONTROL [initandlisten] distarch: x86_64
2019-07-16T16:41:45.178+0800 I CONTROL [initandlisten] target_arch: x86_64
2019-07-16T16:41:45.178+0800 I CONTROL [initandlisten] options: { net: { port: 30000 }, repair: true, storage: { dbPath: "/iflytek/data/mongodb/new_data/data27017_bak/" } }
2019-07-16T16:41:45.226+0800 I - [initandlisten] Detected data files in /iflytek/data/mongodb/new_data/data27017_bak/ created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2019-07-16T16:41:45.226+0800 I STORAGE [initandlisten]
2019-07-16T16:41:45.226+0800 I STORAGE [initandlisten] ** WARNING: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine
2019-07-16T16:41:45.226+0800 I STORAGE [initandlisten] ** See http://dochub.mongodb.org/core/prodnotes-filesystem
2019-07-16T16:41:45.226+0800 I STORAGE [initandlisten] Detected WT journal files. Running recovery from last checkpoint.
2019-07-16T16:41:45.226+0800 I STORAGE [initandlisten] journal to nojournal transition config: create,cache_size=7463M,session_max=20000,eviction=(threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2019-07-16T16:41:45.392+0800 E STORAGE [initandlisten] WiredTiger error (0) [1563266505:392024][2255:0x7fdc4d817d40], file:WiredTiger.wt, connection: read checksum error for 4096B block at offset 36864: block header checksum of 3570790737 doesn't match expected checksum of 1564876380
2019-07-16T16:41:45.392+0800 E STORAGE [initandlisten] WiredTiger error (0) [1563266505:392085][2255:0x7fdc4d817d40], file:WiredTiger.wt, connection: WiredTiger.wt: encountered an illegal file format or internal value
2019-07-16T16:41:45.392+0800 E STORAGE [initandlisten] WiredTiger error (-31804) [1563266505:392097][2255:0x7fdc4d817d40], file:WiredTiger.wt, connection: the process must exit and restart: WT_PANIC: WiredTiger library panic
2019-07-16T16:41:45.392+0800 I - [initandlisten] Fatal Assertion 28558 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 361
2019-07-16T16:41:45.392+0800 I - [initandlisten]

***aborting after fassert() failure

2019-07-16T16:41:45.413+0800 F - [initandlisten] Got signal: 6 (Aborted).

0x7fdc4edb8d21 0x7fdc4edb7e19 0x7fdc4edb82fd 0x7fdc4c508710 0x7fdc4c197925 0x7fdc4c199105 0x7fdc4e040c9d 0x7fdc4eac73e6 0x7fdc4e04af80 0x7fdc4e04b074 0x7fdc4e04b2cc 0x7fdc4f6bb1df 0x7fdc4f6bb72b 0x7fdc4f6b7e3d 0x7fdc4f6bc907 0x7fdc4f6da056 0x7fdc4f7106db 0x7fdc4f79d4fd 0x7fdc4f79dbf8 0x7fdc4f79e10c 0x7fdc4f7204c1 0x7fdc4f792f30 0x7fdc4f75d38e 0x7fdc4f75d47b 0x7fdc4f70cb2f 0x7fdc4eaaacec 0x7fdc4eaa3865 0x7fdc4e991fa7 0x7fdc4e02c559 0x7fdc4e04c6c4 0x7fdc4c183d1d 0x7fdc4e0aa371
----- BEGIN BACKTRACE -----

{"backtrace":[{"b":"7FDC4D82F000","o":"1589D21","s":"_ZN5mongo15printStackTraceERSo"}

,{"b":"7FDC4D82F000","o":"1588E19"},{"b":"7FDC4D82F000","o":"15892FD"},{"b":"7FDC4C4F9000","o":"F710"},{"b":"7FDC4C165000","o":"32925","s":"gsignal"},{"b":"7FDC4C165000","o":"34105","s":"abort"},{"b":"7FDC4D82F000","o":"811C9D","s":"ZN5mongo32fassertFailedNoTraceWithLocationEiPKcj"},{"b":"7FDC4D82F000","o":"12983E6"},{"b":"7FDC4D82F000","o":"81BF80"},{"b":"7FDC4D82F000","o":"81C074","s":"wt_err"},{"b":"7FDC4D82F000","o":"81C2CC","s":"wt_panic"},{"b":"7FDC4D82F000","o":"1E8C1DF"},{"b":"7FDC4D82F000","o":"1E8C72B"},{"b":"7FDC4D82F000","o":"1E88E3D"},{"b":"7FDC4D82F000","o":"1E8D907"},{"b":"7FDC4D82F000","o":"1EAB056"},{"b":"7FDC4D82F000","o":"1EE16DB"},{"b":"7FDC4D82F000","o":"1F6E4FD"},{"b":"7FDC4D82F000","o":"1F6EBF8"},{"b":"7FDC4D82F000","o":"1F6F10C"},{"b":"7FDC4D82F000","o":"1EF14C1"},{"b":"7FDC4D82F000","o":"1F63F30"},{"b":"7FDC4D82F000","o":"1F2E38E"},{"b":"7FDC4D82F000","o":"1F2E47B"},{"b":"7FDC4D82F000","o":"1EDDB2F","s":"wiredtiger_open"},{"b":"7FDC4D82F000","o":"127BCEC","s":"_ZN5mongo18WiredTigerKVEngineC2ERKNSt7cxx1112basic_stringIcSt11char_traitsIcESaIcEEES8_PNS_11ClockSourceES8_mbbbb"},{"b":"7FDC4D82F000","o":"1274865"},{"b":"7FDC4D82F000","o":"1162FA7","s":"_ZN5mongo20ServiceContextMongoD29initializeGlobalStorageEngineEv"},{"b":"7FDC4D82F000","o":"7FD559"},{"b":"7FDC4D82F000","o":"81D6C4","s":"main"},{"b":"7FDC4C165000","o":"1ED1D","s":"_libc_start_main"},{"b":"7FDC4D82F000","o":"87B371"}],"processInfo":{ "mongodbVersion" : "3.4.2", "gitVersion" : "3f76e40c105fc223b3e5aac3e20dcd026b83b38b", "compiledModules" : [], "uname" :

{ "sysname" : "Linux", "release" : "2.6.32-431.el6.x86_64", "version" : "#1 SMP Fri Nov 22 03:15:09 UTC 2013", "machine" : "x86_64" }

, "somap" : [ { "b" : "7FDC4D82F000", "elfType" : 3, "buildId" : "0409C529A50A34D3E255B4350462A560B78F8892" }, { "b" : "7FFF2F9F0000", "elfType" : 3, "buildId" : "81A81BE2E44C93640ADEDB62ADC93A47F4A09DD1" }, { "b" : "7FDC4D3A1000", "path" : "/usr/lib64/libssl.so.10", "elfType" : 3, "buildId" : "BECFB85A8BC084042D5BF2BA9E66325CE798B659" }, { "b" : "7FDC4CFBC000", "path" : "/usr/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "CBDA444A7109874C5350AE9CEEF3F82F749B347F" }, { "b" : "7FAAB95B4000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "B26528BF6C0636AC1CAE5AC50BDBC07E60851DF4" }, { "b" : "7FAAB9FB0000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "AFC7448F2F2F6ED4E5BC82B1BD8A7320B84A9D48" }, { "b" : "7FAAB892C000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "98B028A725D6E93253F25DF00B794DFAA66A3145" }, { "b" : "7FA47BF16000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "EDC925E58FE28DCA536993EB13179C739F1E6566" }, { "b" : "7FAAB90F9000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "1BB4E10307D6B94223749CFDF2AD14C365972C60" }, { "b" : "7FAAB9165000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "F1A1C0575F0EC141A157E5DFA4525E70BD27B62E" }, { "b" : "7FAABAE0D000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "57BF668F99B7F5917B8D55FBB645173C9A644575" }, { "b" : "7FA478B21000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "9A737F8BF10FC99C37CC404D3FC188F6E11FEDD9" }, { "b" : "7FA47983A000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "8D3D6E28DF6EB3752642A7031AAC17D39EA4265D" }, { "b" : "7FA47A636000", "path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "7EC54D6E88BB7D2C1284117C2A483496A01EAAF4" }, { "b" : "7FA47900A000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "CC89B4C8CDCCD32BA610BC72784DC3B7E9BD9E19" }, { "b" : "7FDC4B5F2000", "path" : "/usr/local/lib/libz.so.1", "elfType" : 3, "buildId" : 
"F7DFD2C44B176B74A351A07FAEA54721D114FD95" }, { "b" : "7FA479BE7000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "E0C522C589F775C324330BE09CE67DC83950A213" }, { "b" : "7FA4795E4000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "AF374BAFB7F5B139A0B431D3F06D82014AFF3251" }, { "b" : "7FAAB63CA000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "56843351EFB2CE304A7E4BD0754991613E9EC8BD" }, { "b" : "7FA47A9AB000", "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "B4576BE308DDCF7BC31F7304E4734C3D846D0236" } ] }}
mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x7fdc4edb8d21]
mongod(+0x1588E19) [0x7fdc4edb7e19]
mongod(+0x15892FD) [0x7fdc4edb82fd]
libpthread.so.0(+0xF710) [0x7fdc4c508710]
libc.so.6(gsignal+0x35) [0x7fdc4c197925]
libc.so.6(abort+0x175) [0x7fdc4c199105]
mongod(_ZN5mongo32fassertFailedNoTraceWithLocationEiPKcj+0x0) [0x7fdc4e040c9d]
mongod(+0x12983E6) [0x7fdc4eac73e6]
mongod(+0x81BF80) [0x7fdc4e04af80]
mongod(__wt_err+0x9D) [0x7fdc4e04b074]
mongod(__wt_panic+0x24) [0x7fdc4e04b2cc]
mongod(+0x1E8C1DF) [0x7fdc4f6bb1df]
mongod(+0x1E8C72B) [0x7fdc4f6bb72b]
mongod(+0x1E88E3D) [0x7fdc4f6b7e3d]
mongod(+0x1E8D907) [0x7fdc4f6bc907]
mongod(+0x1EAB056) [0x7fdc4f6da056]
mongod(+0x1EE16DB) [0x7fdc4f7106db]
mongod(+0x1F6E4FD) [0x7fdc4f79d4fd]
mongod(+0x1F6EBF8) [0x7fdc4f79dbf8]
mongod(+0x1F6F10C) [0x7fdc4f79e10c]
mongod(+0x1EF14C1) [0x7fdc4f7204c1]
mongod(+0x1F63F30) [0x7fdc4f792f30]
mongod(+0x1F2E38E) [0x7fdc4f75d38e]
mongod(+0x1F2E47B) [0x7fdc4f75d47b]
mongod(wiredtiger_open+0x171F) [0x7fdc4f70cb2f]
mongod(_ZN5mongo18WiredTigerKVEngineC2ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES8_PNS_11ClockSourceES8_mbbbb+0xD7C) [0x7fdc4eaaacec]
mongod(+0x1274865) [0x7fdc4eaa3865]
mongod(_ZN5mongo20ServiceContextMongoD29initializeGlobalStorageEngineEv+0x6B7) [0x7fdc4e991fa7]
mongod(+0x7FD559) [0x7fdc4e02c559]
mongod(main+0x964) [0x7fdc4e04c6c4]
libc.so.6(__libc_start_main+0xFD) [0x7fdc4c183d1d]
mongod(+0x87B371) [0x7fdc4e0aa371]
----- END BACKTRACE -----
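
The checksum error in the log above names the damaged block directly. As a minimal sketch (not part of the original repair run), the following Python extracts the block size, offset, and both checksums from that error line and reads the raw block from a copy of the file for inspection. The function names are hypothetical, and the script only reads data; it does not repair anything.

```python
import re

# Matches the WiredTiger "read checksum error" message seen in the log above.
CHECKSUM_ERROR = re.compile(
    r"read checksum error for (?P<size>\d+)B block at offset (?P<offset>\d+): "
    r"block header checksum of (?P<found>\d+) doesn't match "
    r"expected checksum of (?P<expected>\d+)"
)


def parse_checksum_error(log_line):
    """Return size, offset, and both checksums as integers, or None if no match."""
    match = CHECKSUM_ERROR.search(log_line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


def read_block(path, offset, size):
    """Read the raw bytes of one block from a copy of the data file."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        return handle.read(size)


# The message text is copied from the repair log above.
line = (
    "file:WiredTiger.wt, connection: read checksum error for 4096B block "
    "at offset 36864: block header checksum of 3570790737 doesn't match "
    "expected checksum of 1564876380"
)
info = parse_checksum_error(line)
```

For this log, the offset 36864 divided by the block size 4096 gives 9, so the unreadable block starts exactly nine blocks into WiredTiger.wt. Calling `read_block` on a copy of the file with `info["offset"]` and `info["size"]` returns the bytes that failed verification.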

 

 

PS: Since this kind of problem has occurred so many times, have you considered explaining how to correct the corrupted files ourselves, rather than just fixing the files for us?

 



 Comments   
Comment by Chengcheng Ma [ 24/Jul/19 ]

Many thanks, Kelsey Schubert.

Using the files you attached, we dumped most of the data directly from the .wt data files with the wt utility, and we have now reloaded the dumped data into a mongod instance. Thanks again.
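
The exact commands used are not recorded in this ticket. As a rough sketch of what dumping one table with the wt utility might look like, the Python below builds a `wt dump` command line. The paths and the table name are hypothetical, and the optional `config` string (for example, to load the snappy compressor extension that the wt build may need in order to read MongoDB data files) is an assumption about the local setup.

```python
import subprocess


def wt_dump_command(home, table_name, dump_path, config=None):
    """Build a `wt dump` command for one table in the database directory `home`."""
    cmd = ["wt", "-h", home]  # -h selects the database home directory
    if config:
        cmd += ["-C", config]  # -C passes a wiredtiger_open configuration string
    # -f writes the dump to a file instead of standard output.
    return cmd + ["dump", "-f", dump_path, table_name]


def run_dump(cmd):
    """Run the command and raise if wt exits with a non-zero status."""
    subprocess.run(cmd, check=True)


# Hypothetical names: work on a copy of the dbpath, never the only copy.
command = wt_dump_command(
    "/data/db_copy", "collection-0-1234", "/tmp/collection-0-1234.dump"
)
```

The command is built separately from `run_dump` so that it can be printed and reviewed before anything touches the data files.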

Comment by Kelsey Schubert [ 22/Jul/19 ]

Hi cora_ma,

This error indicates additional corruption on disk affecting other files, and I'd strongly recommend verifying the integrity of your disks. Unfortunately, we do not have any scripts that would repair these files.

Kind regards,
Kelsey

Comment by Chengcheng Ma [ 19/Jul/19 ]

@Kelsey Schubert

Would you please help?

Comment by Chengcheng Ma [ 18/Jul/19 ]

Sorry for not uploading the log files mentioned in my first reply.

The file named mongod.log.2019-07-11T10-22-54 is the log from the attempt to start mongod after the crash.

mongod.log.2019-07-11T10-22-54

mongo_repair.log is the log of the repair attempt made after replacing our files with the ones you uploaded.

 mongo_repair.log

Comment by Chengcheng Ma [ 17/Jul/19 ]

@Kelsey Schubert, unfortunately we were not able to recover the dead mongod. Here is the information you asked for:

  1. The attached file, named orig.log, is the log that records the crash and the failed start-up.
  2. The underlying storage:
    1. FS Type: ext4
    2. Locally attached;
    3. No RAID for the data path mount;
    4. Disk Type: HDD
  3. Never backed up.
  4. A recent disk check was OK.
  5. Deployment history:
    1. No version change;
    2. No hardware upgrade/downgrade, no configuration change;
    3. No activities of backup and restoration;
    4. No file movement.

 

All 3 nodes in the replica set are in the same situation, so we do not have any unaffected node to resync from.

 

Thanks a lot in advance.

Comment by Kelsey Schubert [ 17/Jul/19 ]

cora_ma, this error message leads us to suspect some form of physical corruption.
Our ability to determine the source of this corruption depends greatly on your ability to provide:

  1. The logs for the affected node, including before, leading up to, and after the first sign of corruption.
  2. A description of the underlying storage mechanism in use, including details like:
    1. What file system and/or volume management system is in use?
    2. Is data storage locally attached or network-attached?
    3. Are disks RAIDed and if so how?
    4. Are disks SSDs or HDDs?
  3. A description of your backup method, if any.
  4. Whether your disks have recently been checked for integrity.
  5. A history of the deployment, including:
    1. a timeline of version changes
    2. a timeline of hardware upgrade/downgrade cycles or configuration changes
    3. a timeline of disaster recovery or backup restoration activities
    4. a timeline of any manipulations of the underlying database files, including copies or moves, and information about whether mongod was running during each manipulation.

The ideal resolution is to perform a clean resync from an unaffected node. If that is not possible, I've attached a repair attempt of the files you provided as repair_attempt.tar.gz. Please extract these files, replace them in your $dbpath, and let us know if it resolves the issue.
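
As an illustration only, replacing the files could be scripted along the following lines. This sketch assumes mongod is stopped, that the archive holds regular files (such as WiredTiger.wt and WiredTiger.turtle) whose base names match the files in the dbpath, and that a backup directory is available; the function name and the directory layout are hypothetical.

```python
import shutil
import tarfile
from pathlib import Path


def replace_with_repaired(archive, dbpath, backup_dir):
    """Copy each file in `archive` into `dbpath`, backing up anything it overwrites."""
    dbpath, backup_dir = Path(dbpath), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    replaced = []
    with tarfile.open(archive) as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories, links, and other special entries
            # Use only the base name so nothing is written outside dbpath.
            name = Path(member.name).name
            target = dbpath / name
            if target.exists():
                shutil.copy2(target, backup_dir / name)  # keep the original
            with tar.extractfile(member) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            replaced.append(name)
    return replaced
```

A call such as `replace_with_repaired("repair_attempt.tar.gz", dbpath, backup_dir)` returns the names of the files it wrote, and the previous versions remain in the backup directory in case the repair attempt needs to be rolled back.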

Kind regards,
Kelsey

Comment by Chengcheng Ma [ 17/Jul/19 ]

Can anyone help?

Generated at Thu Feb 08 04:59:58 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.