[SERVER-24410] Cannot start mongod or --repair because of WiredTiger.wt checksum error (due to unclean shutdown) Created: 06/Jun/16  Updated: 14/Jul/16  Resolved: 06/Jun/16

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.0.2, 3.2.6
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Andres Marin Assignee: Unassigned
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: WiredTiger, WiredTiger.basecfg, WiredTiger.turtle, WiredTiger.wt, repair_attempt.tgz
Operating System: ALL
Steps To Reproduce:

Apparently a failed (unclean) shutdown.

Participants:

 Description   

Hello!

We have a production server that crashed unexpectedly, and we can't get it to start because of a checksum error in WiredTiger.wt.

The server has no replication so we need to fix this file.

Running:

sudo mongod --storageEngine wiredTiger --dbpath /data/wt2

It outputs:

2016-06-06T11:12:35.945+0000 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=7G,session_max=20000,eviction=(threads_max=4),statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2016-06-06T11:12:35.963+0000 E STORAGE  [initandlisten] WiredTiger (0) [1465211555:963206][5105:0x7f5767fd8c80], file:WiredTiger.wt, connection: read checksum error [4096B @ 65536, 712321661 != 761358435]
2016-06-06T11:12:35.963+0000 E STORAGE  [initandlisten] WiredTiger (0) [1465211555:963252][5105:0x7f5767fd8c80], file:WiredTiger.wt, connection: WiredTiger.wt: encountered an illegal file format or internal value
2016-06-06T11:12:35.963+0000 E STORAGE  [initandlisten] WiredTiger (-31804) [1465211555:963274][5105:0x7f5767fd8c80], file:WiredTiger.wt, connection: the process must exit and restart: WT_PANIC: WiredTiger library panic
2016-06-06T11:12:35.963+0000 I -        [initandlisten] Fatal Assertion 28558
2016-06-06T11:12:35.973+0000 I CONTROL  [initandlisten] 
 0xf6b4a9 0xf0bca1 0xef0821 0xd9584a 0x1399979 0x1399b35 0x1399fd4 0x12ef27e 0x12ef718 0x12ecad3 0x12f0446 0x13086c1 0x133088b 0x1398ebd 0x136712b 0x132e077 0xd8024b 0xd7e248 0xa9ccbd 0x823b22 0x7f0784 0x7f5766592af5 0x8218f9
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"B6B4A9"},{"b":"400000","o":"B0BCA1"},{"b":"400000","o":"AF0821"},{"b":"400000","o":"99584A"},{"b":"400000","o":"F99979"},{"b":"400000","o":"F99B35"},{"b":"400000","o":"F99FD4"},{"b":"400000","o":"EEF27E"},{"b":"400000","o":"EEF718"},{"b":"400000","o":"EECAD3"},{"b":"400000","o":"EF0446"},{"b":"400000","o":"F086C1"},{"b":"400000","o":"F3088B"},{"b":"400000","o":"F98EBD"},{"b":"400000","o":"F6712B"},{"b":"400000","o":"F2E077"},{"b":"400000","o":"98024B"},{"b":"400000","o":"97E248"},{"b":"400000","o":"69CCBD"},{"b":"400000","o":"423B22"},{"b":"400000","o":"3F0784"},{"b":"7F5766571000","o":"21AF5"},{"b":"400000","o":"4218F9"}],"processInfo":{ "mongodbVersion" : "3.0.2", "gitVersion" : "6201872043ecbbc0a4cc169b5482dcf385fc464f", "uname" : { "sysname" : "Linux", "release" : "3.14.20-20.44.amzn1.x86_64", "version" : "#1 SMP Mon Oct 6 22:52:46 UTC 2014", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "810BAA4B0BDE1EBB076BF6A544BF2724A13FAE84" }, { "b" : "7FFF950FE000", "elfType" : 3, "buildId" : "539344C1ABDFD227F4B5BF86B780625CEAB090FA" }, { "b" : "7F5767BAC000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "D48D3E6672A77B603B402F661BABF75E90AD570B" }, { "b" : "7F576793F000", "path" : "/usr/lib64/libssl.so.10", "elfType" : 3, "buildId" : "F711D67FF0C1FE2222FB003A30AB74DA26A5EF41" }, { "b" : "7F576755A000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "777069F5EECC26CD66C5C8390FA2BF4E444979D1" }, { "b" : "7F5767352000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "E81013CBFA409053D58A65A0653271AB665A4619" }, { "b" : "7F576714E000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "62A8842157C62F95C3069CBF779AFCC26577A99A" }, { "b" : "7F5766E4A000", "path" : "/usr/lib64/libstdc++.so.6", "elfType" : 3, "buildId" : "DD6383EEAC49E9BAA9E3D1080AE932F42CF8A385" }, { "b" : "7F5766B48000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : 
"5F97F8F8E5024E29717CF35998681F84D4A22D45" }, { "b" : "7F5766932000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "C52958E393BDF8E8D090F36DE0F4E620D8736FBF" }, { "b" : "7F5766571000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "A14FC690F08FB799BA8CC82D49DE9AA9D4580464" }, { "b" : "7F5767DC8000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "6F90843B9087FE91955FEB0355EB0858EF9E97B2" }, { "b" : "7F576632E000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "9DF61878D8918F25CC74AD01F417FDB051DFE3DA" }, { "b" : "7F5766049000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "6F1DB0F811D1B210520443442D4437BC43BF9A80" }, { "b" : "7F5765E46000", "path" : "/usr/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "7D435F06640E70A09EA7C1A62D25CE3507435263" }, { "b" : "7F5765C1B000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "F7DF34078FD7BFD684FE46D5F677EEDA1D9B9DC9" }, { "b" : "7F5765A05000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "87B4EBF2183C8EA4AB657212203EFFE6340E2F4F" }, { "b" : "7F57657FA000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "381960ACAB9C39461D58BDE7B272C4F61BB3582F" }, { "b" : "7F57655F7000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "37A58210FA50C91E09387765408A92909468D25B" }, { "b" : "7F57653DD000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "6A7DA1CED90F65F27CB7B5BACDBB1C386C05F592" }, { "b" : "7F57651BC000", "path" : "/usr/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "803D7EF21A989677D056E52BAEB9AB5B154FB9D9" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf6b4a9]
 mongod(_ZN5mongo10logContextEPKc+0xE1) [0xf0bca1]
 mongod(_ZN5mongo13fassertFailedEi+0x61) [0xef0821]
 mongod(+0x99584A) [0xd9584a]
 mongod(__wt_eventv+0x489) [0x1399979]
 mongod(__wt_err+0x95) [0x1399b35]
 mongod(__wt_panic+0x24) [0x1399fd4]
 mongod(__wt_block_extlist_read+0x6E) [0x12ef27e]
 mongod(__wt_block_extlist_read_avail+0x28) [0x12ef718]
 mongod(__wt_block_checkpoint_load+0x193) [0x12ecad3]
 mongod(+0xEF0446) [0x12f0446]
 mongod(__wt_btree_open+0xAB1) [0x13086c1]
 mongod(__wt_conn_btree_get+0x19B) [0x133088b]
 mongod(__wt_session_get_btree+0x31D) [0x1398ebd]
 mongod(__wt_metadata_open+0x2B) [0x136712b]
 mongod(wiredtiger_open+0xCD7) [0x132e077]
 mongod(_ZN5mongo18WiredTigerKVEngineC1ERKSsS2_bb+0x2EB) [0xd8024b]
 mongod(+0x97E248) [0xd7e248]
 mongod(_ZN5mongo23GlobalEnvironmentMongoD22setGlobalStorageEngineERKSs+0x30D) [0xa9ccbd]
 mongod(_ZN5mongo13initAndListenEi+0x422) [0x823b22]
 mongod(main+0x134) [0x7f0784]
 libc.so.6(__libc_start_main+0xF5) [0x7f5766592af5]
 mongod(+0x4218F9) [0x8218f9]
-----  END BACKTRACE  -----
2016-06-06T11:12:35.973+0000 I -        [initandlisten] 
 
***aborting after fassert() failure

When we attempt to start it with --repair:

sudo mongod --storageEngine wiredTiger --dbpath /data/wt2 --repair

It returns:

2016-06-06T11:13:18.819+0000 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=7G,session_max=20000,eviction=(threads_max=4),statistics=(fast),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2016-06-06T11:13:18.836+0000 E STORAGE  [initandlisten] WiredTiger (0) [1465211598:836780][5108:0x7f3b39200c80], file:WiredTiger.wt, connection: read checksum error [4096B @ 65536, 712321661 != 761358435]
2016-06-06T11:13:18.836+0000 E STORAGE  [initandlisten] WiredTiger (0) [1465211598:836825][5108:0x7f3b39200c80], file:WiredTiger.wt, connection: WiredTiger.wt: encountered an illegal file format or internal value
2016-06-06T11:13:18.836+0000 E STORAGE  [initandlisten] WiredTiger (-31804) [1465211598:836847][5108:0x7f3b39200c80], file:WiredTiger.wt, connection: the process must exit and restart: WT_PANIC: WiredTiger library panic
2016-06-06T11:13:18.836+0000 I -        [initandlisten] Fatal Assertion 28558
2016-06-06T11:13:18.846+0000 I CONTROL  [initandlisten] 
 0xf6b4a9 0xf0bca1 0xef0821 0xd9584a 0x1399979 0x1399b35 0x1399fd4 0x12ef27e 0x12ef718 0x12ecad3 0x12f0446 0x13086c1 0x133088b 0x1398ebd 0x136712b 0x132e077 0xd8024b 0xd7e248 0xa9ccbd 0x823b22 0x7f0784 0x7f3b377baaf5 0x8218f9
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"B6B4A9"},{"b":"400000","o":"B0BCA1"},{"b":"400000","o":"AF0821"},{"b":"400000","o":"99584A"},{"b":"400000","o":"F99979"},{"b":"400000","o":"F99B35"},{"b":"400000","o":"F99FD4"},{"b":"400000","o":"EEF27E"},{"b":"400000","o":"EEF718"},{"b":"400000","o":"EECAD3"},{"b":"400000","o":"EF0446"},{"b":"400000","o":"F086C1"},{"b":"400000","o":"F3088B"},{"b":"400000","o":"F98EBD"},{"b":"400000","o":"F6712B"},{"b":"400000","o":"F2E077"},{"b":"400000","o":"98024B"},{"b":"400000","o":"97E248"},{"b":"400000","o":"69CCBD"},{"b":"400000","o":"423B22"},{"b":"400000","o":"3F0784"},{"b":"7F3B37799000","o":"21AF5"},{"b":"400000","o":"4218F9"}],"processInfo":{ "mongodbVersion" : "3.0.2", "gitVersion" : "6201872043ecbbc0a4cc169b5482dcf385fc464f", "uname" : { "sysname" : "Linux", "release" : "3.14.20-20.44.amzn1.x86_64", "version" : "#1 SMP Mon Oct 6 22:52:46 UTC 2014", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "810BAA4B0BDE1EBB076BF6A544BF2724A13FAE84" }, { "b" : "7FFF8C6FE000", "elfType" : 3, "buildId" : "539344C1ABDFD227F4B5BF86B780625CEAB090FA" }, { "b" : "7F3B38DD4000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "D48D3E6672A77B603B402F661BABF75E90AD570B" }, { "b" : "7F3B38B67000", "path" : "/usr/lib64/libssl.so.10", "elfType" : 3, "buildId" : "F711D67FF0C1FE2222FB003A30AB74DA26A5EF41" }, { "b" : "7F3B38782000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "777069F5EECC26CD66C5C8390FA2BF4E444979D1" }, { "b" : "7F3B3857A000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "E81013CBFA409053D58A65A0653271AB665A4619" }, { "b" : "7F3B38376000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "62A8842157C62F95C3069CBF779AFCC26577A99A" }, { "b" : "7F3B38072000", "path" : "/usr/lib64/libstdc++.so.6", "elfType" : 3, "buildId" : "DD6383EEAC49E9BAA9E3D1080AE932F42CF8A385" }, { "b" : "7F3B37D70000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : 
"5F97F8F8E5024E29717CF35998681F84D4A22D45" }, { "b" : "7F3B37B5A000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "C52958E393BDF8E8D090F36DE0F4E620D8736FBF" }, { "b" : "7F3B37799000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "A14FC690F08FB799BA8CC82D49DE9AA9D4580464" }, { "b" : "7F3B38FF0000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "6F90843B9087FE91955FEB0355EB0858EF9E97B2" }, { "b" : "7F3B37556000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "9DF61878D8918F25CC74AD01F417FDB051DFE3DA" }, { "b" : "7F3B37271000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "6F1DB0F811D1B210520443442D4437BC43BF9A80" }, { "b" : "7F3B3706E000", "path" : "/usr/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "7D435F06640E70A09EA7C1A62D25CE3507435263" }, { "b" : "7F3B36E43000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "F7DF34078FD7BFD684FE46D5F677EEDA1D9B9DC9" }, { "b" : "7F3B36C2D000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "87B4EBF2183C8EA4AB657212203EFFE6340E2F4F" }, { "b" : "7F3B36A22000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "381960ACAB9C39461D58BDE7B272C4F61BB3582F" }, { "b" : "7F3B3681F000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "37A58210FA50C91E09387765408A92909468D25B" }, { "b" : "7F3B36605000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "6A7DA1CED90F65F27CB7B5BACDBB1C386C05F592" }, { "b" : "7F3B363E4000", "path" : "/usr/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "803D7EF21A989677D056E52BAEB9AB5B154FB9D9" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf6b4a9]
 mongod(_ZN5mongo10logContextEPKc+0xE1) [0xf0bca1]
 mongod(_ZN5mongo13fassertFailedEi+0x61) [0xef0821]
 mongod(+0x99584A) [0xd9584a]
 mongod(__wt_eventv+0x489) [0x1399979]
 mongod(__wt_err+0x95) [0x1399b35]
 mongod(__wt_panic+0x24) [0x1399fd4]
 mongod(__wt_block_extlist_read+0x6E) [0x12ef27e]
 mongod(__wt_block_extlist_read_avail+0x28) [0x12ef718]
 mongod(__wt_block_checkpoint_load+0x193) [0x12ecad3]
 mongod(+0xEF0446) [0x12f0446]
 mongod(__wt_btree_open+0xAB1) [0x13086c1]
 mongod(__wt_conn_btree_get+0x19B) [0x133088b]
 mongod(__wt_session_get_btree+0x31D) [0x1398ebd]
 mongod(__wt_metadata_open+0x2B) [0x136712b]
 mongod(wiredtiger_open+0xCD7) [0x132e077]
 mongod(_ZN5mongo18WiredTigerKVEngineC1ERKSsS2_bb+0x2EB) [0xd8024b]
 mongod(+0x97E248) [0xd7e248]
 mongod(_ZN5mongo23GlobalEnvironmentMongoD22setGlobalStorageEngineERKSs+0x30D) [0xa9ccbd]
 mongod(_ZN5mongo13initAndListenEi+0x422) [0x823b22]
 mongod(main+0x134) [0x7f0784]
 libc.so.6(__libc_start_main+0xF5) [0x7f3b377baaf5]
 mongod(+0x4218F9) [0x8218f9]
-----  END BACKTRACE  -----
2016-06-06T11:13:18.846+0000 I -        [initandlisten] 
 
***aborting after fassert() failure
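
Both runs fail on the same WiredTiger message, which reports a 4096-byte block at offset 65536 of WiredTiger.wt whose two checksum values differ. As a minimal sketch for pulling those fields out of a mongod log line, assuming the message layout shown above: the function name and dictionary keys are hypothetical, and because the log does not say which checksum is the stored one and which is the computed one, they are returned as an unlabeled pair.

```python
import re

# Pattern for the WiredTiger message seen in the logs above, e.g.
# "file:WiredTiger.wt, connection: read checksum error [4096B @ 65536, 712321661 != 761358435]".
# The layout is taken from this ticket's log output only (an assumption for other versions).
CHECKSUM_RE = re.compile(
    r"file:(?P<file>[^,]+), .*?read checksum error "
    r"\[(?P<size>\d+)B @ (?P<offset>\d+), (?P<left>\d+) != (?P<right>\d+)\]"
)


def parse_checksum_error(line):
    """Return the fields of a checksum-error log line, or None if it does not match."""
    match = CHECKSUM_RE.search(line)
    if match is None:
        return None
    return {
        "file": match.group("file"),
        "size": int(match.group("size")),      # block size in bytes
        "offset": int(match.group("offset")),  # byte offset within the file
        # The two mismatching checksum values, in the order they are logged.
        "checksums": (int(match.group("left")), int(match.group("right"))),
    }
```

Running this over both logs above would show that the normal start and the --repair start stop at the same file, block size, and offset.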

We also tried upgrading to 3.2 on a server created from a snapshot, but the results are the same.

I've attached my WiredTiger files.

I've seen several other cases in Jira, but I understand that each case requires its own issue.

Thanks!
Andres



 Comments   
Comment by Ramon Fernandez Marina [ 06/Jun/16 ]

Glad to hear our repair attempt worked, colorrin. I'm going to close this ticket then.

Unfortunately, the current repair process is manual and not ready for prime time. That's what SERVER-19815 is for, and we expect to complete it for the upcoming stable version, MongoDB 3.4.

In addition to replication and backups, please make sure your storage layer has proper durability guarantees (as in fsync() works as advertised). If you run into this issue again feel free to open a new SERVER ticket.
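
To illustrate what "fsync() works as advertised" means for an application, here is a minimal, generic sketch of a durable file write on a POSIX system. It is not how MongoDB or WiredTiger writes its files, and the function name durable_write is hypothetical. The guarantee only holds if the operating system and the storage device actually honour fsync().

```python
import os
import tempfile


def durable_write(path, data):
    """Atomically replace `path` with `data` (bytes) and flush it to stable storage."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file in the same directory so the final rename is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()             # push Python's buffer to the kernel
            os.fsync(tmp.fileno())  # ask the kernel to push the data to the device
        os.replace(tmp_path, path)  # atomic rename over the destination
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # fsync the containing directory so the rename itself survives a crash (POSIX only).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

If a storage layer acknowledges fsync() before the data is really persistent (for example, a volatile write cache without battery backup), even this pattern can lose or corrupt data on power failure.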

Thanks,
Ramón.

Comment by Andres Marin [ 06/Jun/16 ]

It worked perfectly!
Thanks a lot Ramon! We are taking steps to make our environment more robust with proper replication.

Can you tell me what you did to fix it, so I can do it myself in case I face this problem again?

Comment by Ramon Fernandez Marina [ 06/Jun/16 ]

colorrin, I've uploaded a file with the results of a repair attempt. Please extract this file into your dbpath and try again. Please note that:

  • there's no guarantee this repair attempt will succeed
  • it's not recommended to run mongod as root – the steps you took above may have changed the permissions on some files in your dbpath, which will cause problems later if you attempt to run MongoDB with the "mongodb" user
  • I'd recommend you consider running a replica set
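
The permissions point above can be checked with a small script. This is a minimal sketch: the helper name files_not_owned_by is hypothetical, and the "mongodb" user name and /data/wt2 path in the usage comment are taken from this ticket and may differ on other systems.

```python
import os


def files_not_owned_by(dbpath, uid):
    """Return every path under dbpath (including dbpath itself) not owned by uid."""
    mismatched = []
    for root, _dirs, files in os.walk(dbpath):
        # os.walk visits each subdirectory as `root`, so checking `root` plus
        # its files covers the whole tree.
        for path in [root] + [os.path.join(root, name) for name in files]:
            if os.lstat(path).st_uid != uid:
                mismatched.append(path)
    return mismatched


# Example usage (POSIX; assumes a "mongodb" system user exists):
#   import pwd
#   uid = pwd.getpwnam("mongodb").pw_uid
#   for path in files_not_owned_by("/data/wt2", uid):
#       print(path)
```

Any paths it lists were most likely created or rewritten while mongod ran under sudo, and their ownership would need to be restored before starting mongod as the "mongodb" user.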

SERVER-19815 is open to make the recovery/repair of WiredTiger instances more robust, feel free to watch it for updates and vote for it.

Regards,
Ramón

Generated at Thu Feb 08 04:06:17 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.