[SERVER-20599] mongod start failed after unclean shutdown Created: 24/Sep/15  Updated: 09/Dec/15  Resolved: 09/Dec/15

Status: Closed
Project: Core Server
Component/s: Admin, WiredTiger
Affects Version/s: 3.0.6
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: naveen Assignee: Susan LoVerso
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: WiredTiger, WiredTiger.basecfg, WiredTiger.turtle, WiredTiger.wt, _mdb_catalog.wt, newfiles.tgz, sizeStorer.wt, storage.bson
Operating System: ALL
Steps To Reproduce:

Tried running repair, but it did not work.

 Description   

mongod failed to start.

2015-09-24T11:25:24.047+0530 I CONTROL  ***** SERVER RESTARTED *****
2015-09-24T11:25:24.072+0530 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=3G,session_max=20000,eviction=(threads_max=4),statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2015-09-24T11:25:25.528+0530 E STORAGE  [initandlisten] WiredTiger (0) [1443074125:528380][4951:0x7fdb932a8c80], file:sizeStorer.wt, session.open_cursor: read checksum error for 4096B block at offset 12288: block header checksum of 1702521171 doesn't match expected checksum of 1011620257
2015-09-24T11:25:25.531+0530 E STORAGE  [initandlisten] WiredTiger (0) [1443074125:528469][4951:0x7fdb932a8c80], file:sizeStorer.wt, session.open_cursor: sizeStorer.wt: encountered an illegal file format or internal value
2015-09-24T11:25:25.531+0530 E STORAGE  [initandlisten] WiredTiger (-31804) [1443074125:531883][4951:0x7fdb932a8c80], file:sizeStorer.wt, session.open_cursor: the process must exit and restart: WT_PANIC: WiredTiger library panic
2015-09-24T11:25:25.531+0530 I -        [initandlisten] Fatal Assertion 28558
2015-09-24T11:25:26.309+0530 I CONTROL  [initandlisten] 
 0xf5c569 0xefb431 0xedffb1 0xd8620a 0x138fe69 0x1390025 0x13904c4 0x12e081e 0x12e0d83 0x12ddb13 0x12e1aa6 0x12fb601 0x132564b 0x138f00b 0x138f49a 0x13329a1 0x138d0fa 0x13440e3 0x138cf41 0x138d355 0xd82099 0xd715a6 0xd6f258 0xa7dd5d 0x808872 0x7d6bc9 0x7fdb9186caf5 0x806689
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"B5C569"},{"b":"400000","o":"AFB431"},{"b":"400000","o":"ADFFB1"},{"b":"400000","o":"98620A"},{"b":"400000","o":"F8FE69"},{"b":"400000","o":"F90025"},{"b":"400000","o":"F904C4"},{"b":"400000","o":"EE081E"},{"b":"400000","o":"EE0D83"},{"b":"400000","o":"EDDB13"},{"b":"400000","o":"EE1AA6"},{"b":"400000","o":"EFB601"},{"b":"400000","o":"F2564B"},{"b":"400000","o":"F8F00B"},{"b":"400000","o":"F8F49A"},{"b":"400000","o":"F329A1"},{"b":"400000","o":"F8D0FA"},{"b":"400000","o":"F440E3"},{"b":"400000","o":"F8CF41"},{"b":"400000","o":"F8D355"},{"b":"400000","o":"982099"},{"b":"400000","o":"9715A6"},{"b":"400000","o":"96F258"},{"b":"400000","o":"67DD5D"},{"b":"400000","o":"408872"},{"b":"400000","o":"3D6BC9"},{"b":"7FDB9184B000","o":"21AF5"},{"b":"400000","o":"406689"}],"processInfo":{ "mongodbVersion" : "3.0.6", "gitVersion" : "1ef45a23a4c5e3480ac919b28afcba3c615488f2", "uname" : { "sysname" : "Linux", "release" : "3.10.0-229.11.1.el7.x86_64", "version" : "#1 SMP Thu Aug 6 01:06:18 UTC 2015", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "2C8EE1A0F536BE67BDBA4481FE55CEEE8AA46950" }, { "b" : "7FFC63EFE000", "elfType" : 3, "buildId" : "3D2E7F6E0FC5432E542D442752748EF6F5A3BA17" }, { "b" : "7FDB92E8B000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "12F30315D4F4A2FE58B1977405C8B5515861E66B" }, { "b" : "7FDB92C1E000", "path" : "/lib64/libssl.so.10", "elfType" : 3, "buildId" : "BB96EE99138B19FECDAB55E80A1728B648ECAD50" }, { "b" : "7FDB92837000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "B154203FB7C05AEE29D5D6F6C000305191209FE4" }, { "b" : "7FDB9262F000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "7376A07360DC57189A8F92B20AA4AA1CAEA80551" }, { "b" : "7FDB9242B000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "4DFEE4EA9AE8FDD4C71BD4CCC0727222F19DF810" }, { "b" : "7FDB92124000", "path" : "/lib64/libstdc++.so.6", "elfType" : 3, "buildId" : 
"405EACD649720B8668FFBBA197CBF030A7EF6296" }, { "b" : "7FDB91E22000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "A1AA62B29765BE03A36BF927B047EEEF8696EEC6" }, { "b" : "7FDB91C0C000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "5D3D7256AE68BCFF41E312A24825ED80ECA88A73" }, { "b" : "7FDB9184B000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "C31FFE7942BFD77B2FCA8F9BD5709D387A86D3BC" }, { "b" : "7FDB930A7000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "9866E1D2BA61EBB4CE4F009FACDAACC24EF3B804" }, { "b" : "7FDB915FF000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "34672D541C8C9C5C1C25CB4F3F332CC9D3E604AD" }, { "b" : "7FDB9131C000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "45CB7F6CD322F5B55FF8B635F7EC1578631CCAEA" }, { "b" : "7FDB91118000", "path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "3A1166709F88740C49E060731832E3FAD2DFB66B" }, { "b" : "7FDB90EE6000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "23A2D854538903E2B84EF0882046DD95522C8B59" }, { "b" : "7FDB90CD0000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "E45643F27F3B3E960F3691AFC6EC27A98EF7B46B" }, { "b" : "7FDB90AC1000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "F4A3D5E7E23F871751CA8F250421F8CF83447AD2" }, { "b" : "7FDB908BD000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "2E01D5AC08C1280D013AAB96B292AC58BC30A263" }, { "b" : "7FDB906A3000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "AC596E865AF0D14B10F7B707F47D2031AD6D68DC" }, { "b" : "7FDB9047E000", "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "82FF6B18E1E42825CC2D060F969479AD4AF2F62C" }, { "b" : "7FDB9021D000", "path" : "/lib64/libpcre.so.1", "elfType" : 3, "buildId" : "298B19C64B19995F2AA4DA7B852E90BA5302F630" }, { "b" : "7FDB8FFF8000", "path" : "/lib64/liblzma.so.5", "elfType" : 3, "buildId" : 
"218D03D1F6CF1A099A4D467B5E8ECF4F2BF45750" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf5c569]
 mongod(_ZN5mongo10logContextEPKc+0xE1) [0xefb431]
 mongod(_ZN5mongo13fassertFailedEi+0x61) [0xedffb1]
 mongod(+0x98620A) [0xd8620a]
 mongod(__wt_eventv+0x489) [0x138fe69]
 mongod(__wt_err+0x95) [0x1390025]
 mongod(__wt_panic+0x24) [0x13904c4]
 mongod(__wt_block_extlist_read+0x6E) [0x12e081e]
 mongod(__wt_block_extlist_read_avail+0x33) [0x12e0d83]
 mongod(__wt_block_checkpoint_load+0x193) [0x12ddb13]
 mongod(+0xEE1AA6) [0x12e1aa6]
 mongod(__wt_btree_open+0xAB1) [0x12fb601]
 mongod(__wt_conn_btree_get+0x19B) [0x132564b]
 mongod(__wt_session_get_btree+0x41B) [0x138f00b]
 mongod(__wt_session_get_btree_ckpt+0xBA) [0x138f49a]
 mongod(__wt_curfile_open+0xE1) [0x13329a1]
 mongod(__wt_open_cursor+0x26A) [0x138d0fa]
 mongod(__wt_curtable_open+0x2E3) [0x13440e3]
 mongod(__wt_open_cursor+0xB1) [0x138cf41]
 mongod(+0xF8D355) [0x138d355]
 mongod(_ZN5mongo20WiredTigerSizeStorerC1EP15__wt_connectionRKSs+0xA9) [0xd82099]
 mongod(_ZN5mongo18WiredTigerKVEngineC1ERKSsS2_bb+0x546) [0xd715a6]
 mongod(+0x96F258) [0xd6f258]
 mongod(_ZN5mongo23GlobalEnvironmentMongoD22setGlobalStorageEngineERKSs+0x30D) [0xa7dd5d]
 mongod(_ZN5mongo13initAndListenEi+0x422) [0x808872]
 mongod(main+0x139) [0x7d6bc9]
 libc.so.6(__libc_start_main+0xF5) [0x7fdb9186caf5]
 mongod(+0x406689) [0x806689]
-----  END BACKTRACE  -----
2015-09-24T11:25:26.309+0530 I -        [initandlisten] 
 
***aborting after fassert() failure

I am having a very bad experience with MongoDB and WiredTiger. The WiredTiger files get corrupted very frequently when the system shuts down unexpectedly, the repair process does not work, and you are not telling us what the process for repairing WiredTiger files is. This has been a really bad experience with a storage engine.

I think I should switch to MMAPv1, which is comparatively better than this.



 Comments   
Comment by Ramon Fernandez Marina [ 20/Nov/15 ]

naven@123, have you tried using the new files I uploaded last week? Any improvements?

Thanks,
Ramón.

Comment by Ramon Fernandez Marina [ 20/Nov/15 ]

toni44, can you please open a separate ticket so we can investigate your case? Even if the symptoms are the same, the cause and fix may be different. Please upload the WiredTiger.wt and WiredTiger.turtle files to that ticket.

Thanks,
Ramón.

Comment by toni [ 14/Nov/15 ]

I have the same problem. Unfortunately, all three config servers in a sharded environment are affected; a power failure was the cause.
Is there any way to recover or repair the files? I start mongod with --nojournal, by the way. I could also restore a backup, but it is several days old, so I would lose some data. Repair would be my preference.

Comment by Ramon Fernandez Marina [ 13/Nov/15 ]

I'm attaching a tarball with the result of a repair attempt on the WiredTiger.wt and WiredTiger.turtle files, in case it helps you recover from this situation. There seem to be other issues with this particular corruption that we're still investigating; apologies that it is taking some time.

That being said, I'd strongly recommend implementing protection against unclean shutdowns to avoid further issues. As with any other system, data loss or corruption can occur after an unclean shutdown if the filesystem is left in an inconsistent state.

Comment by naveen [ 24/Sep/15 ]

By "unusual shutdown" I mean a power failure. There may be an issue with WiredTiger.turtle, WiredTiger.wt, or sizeStorer.wt.
If you recover these files, I will be able to recover the database. The same problem occurred last time: you gave me .turtle and .wt files, and then I was able to repair the database successfully.

Comment by Ramon Fernandez Marina [ 24/Sep/15 ]

naven@123, can you elaborate on what "unusual shutdown" means? The reason I asked for the logs from before the restart is to understand what led to the shutdown, which is the first step in understanding what may have caused the current situation and in deciding the next step forward.

Thanks,
Ramón.

Comment by naveen [ 24/Sep/15 ]

This all happened due to an unusual shutdown.

Comment by naveen [ 24/Sep/15 ]

This is the repair log:

[root@nsg-static-172 ~]# mongod --repair --dbpath /home/wiredTiger/traidata --storageEngine wiredTiger
2015-09-24T19:57:17.463+0530 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=3G,session_max=20000,eviction=(threads_max=4),statistics=(fast),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2015-09-24T19:57:17.597+0530 E STORAGE  [initandlisten] WiredTiger (0) [1443104837:597285][10365:0x7f923665ec80], file:WiredTiger.wt, cursor.search_near: read checksum error for 24576B block at offset 102400: block header checksum of 609464209 doesn't match expected checksum of 3275007603
2015-09-24T19:57:17.597+0530 E STORAGE  [initandlisten] WiredTiger (0) [1443104837:597379][10365:0x7f923665ec80], file:WiredTiger.wt, cursor.search_near: WiredTiger.wt: encountered an illegal file format or internal value
2015-09-24T19:57:17.597+0530 E STORAGE  [initandlisten] WiredTiger (-31804) [1443104837:597413][10365:0x7f923665ec80], file:WiredTiger.wt, cursor.search_near: the process must exit and restart: WT_PANIC: WiredTiger library panic
2015-09-24T19:57:17.597+0530 I -        [initandlisten] Fatal Assertion 28558
2015-09-24T19:57:17.614+0530 I CONTROL  [initandlisten] 
 0xf5c569 0xefb431 0xedffb1 0xd8620a 0x138fe69 0x1390025 0x13904c4 0x12e2ed2 0x12fcc1c 0x1301799 0x12fe7f3 0x131a3b0 0x12f10cb 0x13316ba 0x139bd86 0x139c3c8 0x1328d21 0x13231e3 0xd7136c 0xd6f258 0xa7dd5d 0x808872 0x7d6bc9 0x7f9234c22af5 0x806689
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"B5C569"},{"b":"400000","o":"AFB431"},{"b":"400000","o":"ADFFB1"},{"b":"400000","o":"98620A"},{"b":"400000","o":"F8FE69"},{"b":"400000","o":"F90025"},{"b":"400000","o":"F904C4"},{"b":"400000","o":"EE2ED2"},{"b":"400000","o":"EFCC1C"},{"b":"400000","o":"F01799"},{"b":"400000","o":"EFE7F3"},{"b":"400000","o":"F1A3B0"},{"b":"400000","o":"EF10CB"},{"b":"400000","o":"F316BA"},{"b":"400000","o":"F9BD86"},{"b":"400000","o":"F9C3C8"},{"b":"400000","o":"F28D21"},{"b":"400000","o":"F231E3"},{"b":"400000","o":"97136C"},{"b":"400000","o":"96F258"},{"b":"400000","o":"67DD5D"},{"b":"400000","o":"408872"},{"b":"400000","o":"3D6BC9"},{"b":"7F9234C01000","o":"21AF5"},{"b":"400000","o":"406689"}],"processInfo":{ "mongodbVersion" : "3.0.6", "gitVersion" : "1ef45a23a4c5e3480ac919b28afcba3c615488f2", "uname" : { "sysname" : "Linux", "release" : "3.10.0-229.11.1.el7.x86_64", "version" : "#1 SMP Thu Aug 6 01:06:18 UTC 2015", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "2C8EE1A0F536BE67BDBA4481FE55CEEE8AA46950" }, { "b" : "7FFF03EFE000", "elfType" : 3, "buildId" : "3D2E7F6E0FC5432E542D442752748EF6F5A3BA17" }, { "b" : "7F9236241000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "12F30315D4F4A2FE58B1977405C8B5515861E66B" }, { "b" : "7F9235FD4000", "path" : "/lib64/libssl.so.10", "elfType" : 3, "buildId" : "BB96EE99138B19FECDAB55E80A1728B648ECAD50" }, { "b" : "7F9235BED000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "B154203FB7C05AEE29D5D6F6C000305191209FE4" }, { "b" : "7F92359E5000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "7376A07360DC57189A8F92B20AA4AA1CAEA80551" }, { "b" : "7F92357E1000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "4DFEE4EA9AE8FDD4C71BD4CCC0727222F19DF810" }, { "b" : "7F92354DA000", "path" : "/lib64/libstdc++.so.6", "elfType" : 3, "buildId" : "405EACD649720B8668FFBBA197CBF030A7EF6296" }, { "b" : "7F92351D8000", "path" : 
"/lib64/libm.so.6", "elfType" : 3, "buildId" : "A1AA62B29765BE03A36BF927B047EEEF8696EEC6" }, { "b" : "7F9234FC2000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "5D3D7256AE68BCFF41E312A24825ED80ECA88A73" }, { "b" : "7F9234C01000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "C31FFE7942BFD77B2FCA8F9BD5709D387A86D3BC" }, { "b" : "7F923645D000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "9866E1D2BA61EBB4CE4F009FACDAACC24EF3B804" }, { "b" : "7F92349B5000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "34672D541C8C9C5C1C25CB4F3F332CC9D3E604AD" }, { "b" : "7F92346D2000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "45CB7F6CD322F5B55FF8B635F7EC1578631CCAEA" }, { "b" : "7F92344CE000", "path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "3A1166709F88740C49E060731832E3FAD2DFB66B" }, { "b" : "7F923429C000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "23A2D854538903E2B84EF0882046DD95522C8B59" }, { "b" : "7F9234086000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "E45643F27F3B3E960F3691AFC6EC27A98EF7B46B" }, { "b" : "7F9233E77000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "F4A3D5E7E23F871751CA8F250421F8CF83447AD2" }, { "b" : "7F9233C73000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "2E01D5AC08C1280D013AAB96B292AC58BC30A263" }, { "b" : "7F9233A59000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "AC596E865AF0D14B10F7B707F47D2031AD6D68DC" }, { "b" : "7F9233834000", "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "82FF6B18E1E42825CC2D060F969479AD4AF2F62C" }, { "b" : "7F92335D3000", "path" : "/lib64/libpcre.so.1", "elfType" : 3, "buildId" : "298B19C64B19995F2AA4DA7B852E90BA5302F630" }, { "b" : "7F92333AE000", "path" : "/lib64/liblzma.so.5", "elfType" : 3, "buildId" : "218D03D1F6CF1A099A4D467B5E8ECF4F2BF45750" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf5c569]
 mongod(_ZN5mongo10logContextEPKc+0xE1) [0xefb431]
 mongod(_ZN5mongo13fassertFailedEi+0x61) [0xedffb1]
 mongod(+0x98620A) [0xd8620a]
 mongod(__wt_eventv+0x489) [0x138fe69]
 mongod(__wt_err+0x95) [0x1390025]
 mongod(__wt_panic+0x24) [0x13904c4]
 mongod(__wt_bm_read+0x72) [0x12e2ed2]
 mongod(__wt_bt_read+0x1AC) [0x12fcc1c]
 mongod(__wt_cache_read+0x99) [0x1301799]
 mongod(__wt_page_in_func+0x3F3) [0x12fe7f3]
 mongod(__wt_row_search+0xA50) [0x131a3b0]
 mongod(__wt_btcur_search_near+0x8BB) [0x12f10cb]
 mongod(+0xF316BA) [0x13316ba]
 mongod(+0xF9BD86) [0x139bd86]
 mongod(__wt_txn_recover+0x3E8) [0x139c3c8]
 mongod(__wt_connection_workers+0x61) [0x1328d21]
 mongod(wiredtiger_open+0x11B3) [0x13231e3]
 mongod(_ZN5mongo18WiredTigerKVEngineC1ERKSsS2_bb+0x30C) [0xd7136c]
 mongod(+0x96F258) [0xd6f258]
 mongod(_ZN5mongo23GlobalEnvironmentMongoD22setGlobalStorageEngineERKSs+0x30D) [0xa7dd5d]
 mongod(_ZN5mongo13initAndListenEi+0x422) [0x808872]
 mongod(main+0x139) [0x7d6bc9]
 libc.so.6(__libc_start_main+0xF5) [0x7f9234c22af5]
 mongod(+0x406689) [0x806689]
-----  END BACKTRACE  -----
2015-09-24T19:57:17.615+0530 I -        [initandlisten] 
 
***aborting after fassert() failure
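Both failed start-ups above report a WiredTiger read checksum error, but against different files: sizeStorer.wt on the normal start and WiredTiger.wt during --repair. To pull those details out of a long log, a short script can help. This is a minimal Python sketch, assuming the message format printed by mongod 3.0.6 in the logs in this ticket; `parse_checksum_errors` and `ChecksumError` are hypothetical names, not part of MongoDB or WiredTiger.

```python
import re
from typing import Iterable, List, NamedTuple

# Assumed message format, copied from the mongod 3.0.6 logs in this ticket:
# "file:<name>, <operation>: read checksum error for <N>B block at offset <O>:
#  block header checksum of <X> doesn't match expected checksum of <Y>"
CHECKSUM_RE = re.compile(
    r"file:(?P<file>[^,]+), (?P<op>[\w.]+): read checksum error for "
    r"(?P<size>\d+)B block at offset (?P<offset>\d+): block header checksum "
    r"of (?P<header>\d+) doesn't match expected checksum of (?P<expected>\d+)"
)


class ChecksumError(NamedTuple):
    """One failed block read, as reported in a single log line (hypothetical type)."""
    file: str        # WiredTiger file whose block failed verification
    operation: str   # WiredTiger call that triggered the read
    block_size: int  # size of the failing block, in bytes
    offset: int      # byte offset of the block within the file
    header: int      # checksum recorded in the block header
    expected: int    # checksum WiredTiger expected for the block


def parse_checksum_errors(lines: Iterable[str]) -> List[ChecksumError]:
    """Return one ChecksumError per log line that matches CHECKSUM_RE."""
    errors = []
    for line in lines:
        m = CHECKSUM_RE.search(line)
        if m:  # lines without a checksum error of this form are skipped
            errors.append(ChecksumError(
                file=m["file"],
                operation=m["op"],
                block_size=int(m["size"]),
                offset=int(m["offset"]),
                header=int(m["header"]),
                expected=int(m["expected"]),
            ))
    return errors
```

For example, `parse_checksum_errors(open("mongod.log"))` (the path is hypothetical) returns one entry per failing block read that was logged; an empty list means no checksum error in this format appeared. Other corruption messages, such as "encountered an illegal file format or internal value", are deliberately not matched.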

Comment by Ramon Fernandez Marina [ 24/Sep/15 ]

Sorry you've run into this issue, naven@123. Can you please send us the full logs from this server? I hope that by seeing what's in the logs before the restart you copied above, we'll have a better understanding of the sequence of events that led to this situation.

Also, have you tried starting mongod with the --repair option? If you have (or if you do), can you please upload the logs for that as well?

Thanks,
Ramón.

Generated at Thu Feb 08 03:54:42 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.