[SERVER-37747] the process must exit and restart: WT_PANIC: WiredTiger library panic Created: 25/Oct/18  Updated: 06/Mar/19  Resolved: 30/Oct/18

Status: Closed
Project: Core Server
Component/s: Shell, WiredTiger
Affects Version/s: 3.0 Required
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: hanyudong Assignee: Kelsey Schubert
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: 20181030.tar.gz, WiredTiger.turtle, WiredTiger.turtle, WiredTiger.wt, WiredTiger.wt, WiredTiger.wt, mongodb.out, repair_attempt.tar.gz
Issue Links:
Duplicate
is duplicated by SERVER-37723 exception in initAndListen: 29 Data d... Closed
Operating System: ALL
Steps To Reproduce:

The description contains my most complete startup log.

For more details, please see the attachments.

Participants:

 Description   

2018-10-25T10:17:15.030+0800 W - [initandlisten] Detected unclean shutdown - /data/mongo_data/mongodb/mongod.lock is not empty.
2018-10-25T10:17:15.030+0800 W STORAGE [initandlisten] Recovering data from the last clean checkpoint.
2018-10-25T10:17:15.030+0800 I STORAGE [initandlisten] wiredtiger_open config: create,cache_size=11G,session_max=20000,eviction=(threads_max=4),statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2018-10-25T10:17:15.034+0800 E STORAGE [initandlisten] WiredTiger (0) [1540433835:34734][16529:0x7f1cc3334cc0], file:WiredTiger.wt, connection: read checksum error for 4096B block at offset 94208: block header checksum of 3534655478 doesn't match expected checksum of 351635639
2018-10-25T10:17:15.034+0800 E STORAGE [initandlisten] WiredTiger (0) [1540433835:34762][16529:0x7f1cc3334cc0], file:WiredTiger.wt, connection: WiredTiger.wt: encountered an illegal file format or internal value
2018-10-25T10:17:15.034+0800 E STORAGE [initandlisten] WiredTiger (-31804) [1540433835:34770][16529:0x7f1cc3334cc0], file:WiredTiger.wt, connection: the process must exit and restart: WT_PANIC: WiredTiger library panic
2018-10-25T10:17:15.034+0800 I - [initandlisten] Fatal Assertion 28558
2018-10-25T10:17:15.043+0800 I CONTROL [initandlisten]
0xf75549 0xf12271 0xef60f1 0xd93f8a 0x13ab369 0x13ab525 0x13ab9c4 0x12fe3d2 0x131811c 0x1315ef8 0x1316d52 0x1340b4b 0x13aa50b 0x13780ab 0x133e207 0xd7e61c 0xd7c3a8 0xa7dc5d 0x7f57a2 0x7fa739 0x7f1cc186f445 0x7f3549
----- BEGIN BACKTRACE -----

{"backtrace":[{"b":"400000","o":"B75549"}

,{"b":"400000","o":"B12271"},{"b":"400000","o":"AF60F1"},{"b":"400000","o":"993F8A"},{"b":"400000","o":"FAB369"},{"b":"400000","o":"FAB525"},{"b":"400000","o":"FAB9C4"},{"b":"400000","o":"EFE3D2"},{"b":"400000","o":"F1811C"},{"b":"400000","o":"F15EF8"},{"b":"400000","o":"F16D52"},{"b":"400000","o":"F40B4B"},{"b":"400000","o":"FAA50B"},{"b":"400000","o":"F780AB"},{"b":"400000","o":"F3E207"},{"b":"400000","o":"97E61C"},{"b":"400000","o":"97C3A8"},{"b":"400000","o":"67DC5D"},{"b":"400000","o":"3F57A2"},{"b":"400000","o":"3FA739"},{"b":"7F1CC184D000","o":"22445"},{"b":"400000","o":"3F3549"}],"processInfo":{ "mongodbVersion" : "3.0.6", "gitVersion" : "1ef45a23a4c5e3480ac919b28afcba3c615488f2", "uname" :

{ "sysname" : "Linux", "release" : "3.10.0-693.el7.x86_64", "version" : "#1 SMP Tue Aug 22 21:09:27 UTC 2017", "machine" : "x86_64" }

, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "7F7AA372EDE22BA34234ADA10A8AF2E665681140" }, { "b" : "7FFFF8CC1000", "elfType" : 3, "buildId" : "7FB8E16CEA1B913E2703A6E4159FB468CD1E3507" }, { "b" : "7F1CC2F18000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "F4C04BCE85D2D269D0A2AF4972FC69805B50345B" }, { "b" : "7F1CC2CA6000", "path" : "/lib64/libssl.so.10", "elfType" : 3, "buildId" : "ED0AC7DEB91A242C194B3DEF27A215F41CE43116" }, { "b" : "7F1CC2845000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "BC0AE9CA0705BEC1F0C0375AAD839843BB219CB1" }, { "b" : "7F1CC263D000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "D33989EC31EFE745EB0D3B68A92D19E77D7DDFDA" }, { "b" : "7F1CC2439000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "5CDB5A56336E7E2BD14FFA189411E44A834AFCD8" }, { "b" : "7F1CC2132000", "path" : "/lib64/libstdc++.so.6", "elfType" : 3, "buildId" : "9589AE0FDA6AEB1183EBA1C62A328F933E7817FD" }, { "b" : "7F1CC1E30000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "F4CAE74047F9AA2D5A71FDEC67C4285D75753EBA" }, { "b" : "7F1CC1C1A000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "531AA1391EA4E1489D5EF11AA5DC2FFD9E2BDFEE" }, { "b" : "7F1CC184D000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "CB4B7554D1ADBEF2F001142DD6F0A5139FC9AA69" }, { "b" : "7F1CC3134000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "D266B1F6650927E18108323BCCA8F7B68E68EB92" }, { "b" : "7F1CC1600000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "DA322D74F55A0C4293085371A8D0E94B5962F5E7" }, { "b" : "7F1CC1318000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "B69E63024D408E400401EEA6815317BDA38FB7C2" }, { "b" : "7F1CC1114000", "path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "A3832734347DCA522438308C9F08F45524C65C9B" }, { "b" : "7F1CC0EE1000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : 
"A48639BF901DB554479BFAD114CB354CF63D7D6E" }, { "b" : "7F1CC0CCB000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "EA8E45DC8E395CC5E26890470112D97A1F1E0B65" }, { "b" : "7F1CC0ABD000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "6FDF5B013FD2739D304CFB9D723DCBC149EE03C9" }, { "b" : "7F1CC08B9000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "2E01D5AC08C1280D013AAB96B292AC58BC30A263" }, { "b" : "7F1CC06A0000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "2BDC2B6FF0B2C204CCE34D139A9EADA0272EB070" }, { "b" : "7F1CC0479000", "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "A88379F56A51950A33198890D37F5F8AEE71F8B4" }, { "b" : "7F1CC0217000", "path" : "/lib64/libpcre.so.1", "elfType" : 3, "buildId" : "9CA3D11F018BEEB719CDB34BE800BF1641350D0A" } ] }}
mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf75549]
mongod(_ZN5mongo10logContextEPKc+0xE1) [0xf12271]
mongod(_ZN5mongo13fassertFailedEi+0x61) [0xef60f1]
mongod(+0x993F8A) [0xd93f8a]
mongod(__wt_eventv+0x489) [0x13ab369]
mongod(__wt_err+0x95) [0x13ab525]
mongod(__wt_panic+0x24) [0x13ab9c4]
mongod(__wt_bm_read+0x72) [0x12fe3d2]
mongod(__wt_bt_read+0x1AC) [0x131811c]
mongod(__wt_btree_tree_open+0x58) [0x1315ef8]
mongod(__wt_btree_open+0xD02) [0x1316d52]
mongod(__wt_conn_btree_get+0x19B) [0x1340b4b]
mongod(__wt_session_get_btree+0x41B) [0x13aa50b]
mongod(__wt_metadata_open+0x2B) [0x13780ab]
mongod(wiredtiger_open+0xCD7) [0x133e207]
mongod(_ZN5mongo18WiredTigerKVEngineC1ERKSsS2_bb+0x30C) [0xd7e61c]
mongod(+0x97C3A8) [0xd7c3a8]
mongod(_ZN5mongo23GlobalEnvironmentMongoD22setGlobalStorageEngineERKSs+0x30D) [0xa7dc5d]
mongod(_ZN5mongo13initAndListenEi+0x422) [0x7f57a2]
mongod(main+0x139) [0x7fa739]
libc.so.6(__libc_start_main+0xF5) [0x7f1cc186f445]
mongod(+0x3F3549) [0x7f3549]
----- END BACKTRACE -----
2018-10-25T10:17:15.043+0800 I - [initandlisten]

***aborting after fassert() failure
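The key line in the log above is the "read checksum error" message, which names the damaged file, the block size, and the byte offset. As a rough aid for triaging similar logs, here is a minimal Python sketch that extracts those fields. The function name `parse_checksum_error` and the `block_index` calculation are illustrative assumptions, not part of MongoDB or WiredTiger tooling, and the pattern is written only against the message format shown in this ticket.

```python
import re

# Pattern matching the checksum-error message exactly as it appears in this ticket's log.
CHECKSUM_RE = re.compile(
    r"file:(?P<file>[^,\s]+), connection: read checksum error for "
    r"(?P<size>\d+)B block at offset (?P<offset>\d+): "
    r"block header checksum of (?P<found>\d+) doesn't match "
    r"expected checksum of (?P<expected>\d+)"
)


def parse_checksum_error(line):
    """Return the fields of a checksum-error log line as a dict, or None if no match."""
    match = CHECKSUM_RE.search(line)
    if match is None:
        return None
    info = {"file": match.group("file")}
    for key in ("size", "offset", "found", "expected"):
        info[key] = int(match.group(key))
    # Illustrative only: position of the block if the file is viewed as
    # consecutive chunks of `size` bytes.
    info["block_index"] = info["offset"] // info["size"]
    return info


sample = (
    "2018-10-25T10:17:15.034+0800 E STORAGE [initandlisten] WiredTiger (0) "
    "[1540433835:34734][16529:0x7f1cc3334cc0], file:WiredTiger.wt, connection: "
    "read checksum error for 4096B block at offset 94208: block header checksum "
    "of 3534655478 doesn't match expected checksum of 351635639"
)
print(parse_checksum_error(sample))
```

For the line in this report, the sketch reports file `WiredTiger.wt`, a 4096-byte block, and offset 94208.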



 Comments   
Comment by hexiaojie [ 06/Mar/19 ]

I have the same problem. Can you help me repair the files, or share some methods for repairing them? Files: WiredTiger.turtle, WiredTiger.wt

Comment by hanyudong [ 01/Nov/18 ]

Hi Kelsey,

Thank you for your answer. I'll follow your guidelines to prevent this kind of incident from happening again. Thanks again.

Kind regards,
hanyudong

Comment by Kelsey Schubert [ 01/Nov/18 ]

Hi hanyd,

Yes, it appears that other files (in addition to the one originally provided) are corrupted, likely as the result of disk failure. Unfortunately, in cases like this, we cannot attempt to repair the damaged files.

Kind regards,
Kelsey

Comment by hanyudong [ 31/Oct/18 ]

Hi Kelsey,

My MongoDB runs in single-node mode. This is the file I backed up earlier, and it turned out to be corrupted when I restored the server. This corrupted backup is the only copy I have now. Is recovery completely impossible?

Thank you,
hanyudong

Comment by Kelsey Schubert [ 30/Oct/18 ]

Hi hanyd,

Unfortunately, this error indicates that there was corruption on the disk, most often caused by a faulty storage layer beneath mongod. In this situation, our best recommendation would be to resync the affected node or restore from a backup if possible.
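Restoring from a backup only helps if the backup itself is intact. As a minimal sketch of one way to check that, the Python below records a SHA-256 manifest when a backup is taken and verifies it before a restore. The function names and the JSON manifest format are hypothetical and not part of any MongoDB tool, and the manifest is assumed to be stored outside the backup directory.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large data files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(backup_dir, manifest_path):
    """Record a hash for every file under backup_dir (hypothetical manifest format)."""
    backup_dir = Path(backup_dir)
    manifest = {
        path.relative_to(backup_dir).as_posix(): sha256_of(path)
        for path in sorted(backup_dir.rglob("*"))
        if path.is_file()
    }
    Path(manifest_path).write_text(json.dumps(manifest, indent=2))
    return manifest


def verify_manifest(backup_dir, manifest_path):
    """Return the relative paths that are missing or whose contents changed."""
    backup_dir = Path(backup_dir)
    manifest = json.loads(Path(manifest_path).read_text())
    damaged = []
    for relative, expected in manifest.items():
        path = backup_dir / relative
        if not path.is_file() or sha256_of(path) != expected:
            damaged.append(relative)
    return damaged
```

A check like this detects a backup that changed after it was taken. It cannot detect corruption that was already present in the files when the backup was made.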

To prevent this type of problem in the future, please take note of the following guidelines to help mitigate any issues related to unreliable storage layers or server failures.

Thank you,
Kelsey

Comment by hanyudong [ 30/Oct/18 ]

Hi Kelsey,

If you need any more log files, please tell me their names or paths, and I will find and upload them.

Thank you,

hanyudong

Comment by hanyudong [ 30/Oct/18 ]

Hi Kelsey,

I replaced the files with the ones you provided and ran the startup command, but it still reported an error. The relevant screenshots and logs are in 20181030.tar.gz.

Thank you,

hanyudong

Comment by Kelsey Schubert [ 26/Oct/18 ]

Hi hanyd,

Thank you for your report. I've attached a repair attempt, repair_attempt.tar.gz, of the files you provided. Please extract these files and replace them in your $dbpath and let us know if it resolves the issue. If you are still seeing errors after replacing these files, please provide the complete logs from the affected node so that we can further investigate.
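For anyone following the same replacement step, here is a minimal Python sketch of one cautious way to do it: copy each existing file aside before overwriting it with the version from the archive. It assumes mongod is stopped, that the archive contains the replacement files (any directory prefix inside the archive is ignored), and that `backup_dir` is a location outside the dbpath. The function name `replace_from_archive` is hypothetical, and the actual layout of repair_attempt.tar.gz is not described in this ticket.

```python
import shutil
import tarfile
from pathlib import Path


def replace_from_archive(archive_path, dbpath, backup_dir):
    """Overwrite files in dbpath with those from the archive, saving the originals first.

    Returns the list of file names written into dbpath.
    """
    dbpath = Path(dbpath)
    backup_dir = Path(backup_dir)
    replaced = []
    with tarfile.open(archive_path) as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue
            # Assumption: files belong directly in dbpath, so drop any archive subdirectory.
            name = Path(member.name).name
            target = dbpath / name
            if target.exists():
                backup_dir.mkdir(parents=True, exist_ok=True)
                shutil.copy2(target, backup_dir / name)
            with archive.extractfile(member) as source, open(target, "wb") as dest:
                shutil.copyfileobj(source, dest)
            replaced.append(name)
    return replaced
```

Keeping the originals means the node can be put back into its previous state if the replacement files do not help.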

Thank you,
Kelsey

Comment by hanyudong [ 25/Oct/18 ]

The above description is my most complete startup log.

For more details, please see the attachment.
