[SERVER-31196] WiredTiger.wt: encountered an illegal file format or internal value Created: 21/Sep/17  Updated: 27/Jul/18  Resolved: 04/Jan/18

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.2.14
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Nelson Mamani Assignee: Keith Bostic (Inactive)
Resolution: Cannot Reproduce Votes: 0
Labels: envh, rpo, rps, trcf, wtc
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File repair_attempt.tar.gz     Zip Archive wiredTigerFiles.zip    
Backwards Compatibility: Fully Compatible
Operating System: Linux
Sprint: Storage 2017-10-23, Storage 2017-11-13, Storage 2017-12-04, Storage 2017-12-18
Participants:

 Description   

Hello, due to an outage my mongodb server detected an unclean shutdown and then tried to recover the data using wiredtiger storage engine but it raised the error described below.

Im pretty sure I have the same issue as SERVER-26103, so I attached the wiredtiger files.

2017-09-21T02:01:59.025+0000 I CONTROL  [initandlisten] MongoDB starting : pid=1 port=27017 dbpath=/data/db 64-bit host=mongo2
2017-09-21T02:01:59.026+0000 I CONTROL  [initandlisten] db version v3.2.14
2017-09-21T02:01:59.026+0000 I CONTROL  [initandlisten] git version: 92f6668a768ebf294bd4f494c50f48459198e6a3
2017-09-21T02:01:59.026+0000 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.1t  3 May 2016
2017-09-21T02:01:59.026+0000 I CONTROL  [initandlisten] allocator: tcmalloc
2017-09-21T02:01:59.026+0000 I CONTROL  [initandlisten] modules: none
2017-09-21T02:01:59.026+0000 I CONTROL  [initandlisten] build environment:
2017-09-21T02:01:59.026+0000 I CONTROL  [initandlisten]     distmod: debian81
2017-09-21T02:01:59.026+0000 I CONTROL  [initandlisten]     distarch: x86_64
2017-09-21T02:01:59.026+0000 I CONTROL  [initandlisten]     target_arch: x86_64
2017-09-21T02:01:59.026+0000 I CONTROL  [initandlisten] options: {}
2017-09-21T02:01:59.092+0000 I -        [initandlisten] Detected data files in /data/db created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2017-09-21T02:01:59.093+0000 W -        [initandlisten] Detected unclean shutdown - /data/db/mongod.lock is not empty.
2017-09-21T02:01:59.093+0000 W STORAGE  [initandlisten] Recovering data from the last clean checkpoint.
2017-09-21T02:01:59.093+0000 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=1G,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2017-09-21T02:01:59.099+0000 E STORAGE  [initandlisten] WiredTiger (0) [1505959319:99774][1:0x7f5b86ae7cc0], file:WiredTiger.wt, connection: read checksum error for 4096B block at offset 69632: block header checksum of 2855018717 doesn't match expected checksum of 1809832460
2017-09-21T02:01:59.106+0000 E STORAGE  [initandlisten] WiredTiger (0) [1505959319:106232][1:0x7f5b86ae7cc0], file:WiredTiger.wt, connection: WiredTiger.wt: encountered an illegal file format or internal value
2017-09-21T02:01:59.106+0000 E STORAGE  [initandlisten] WiredTiger (-31804) [1505959319:106471][1:0x7f5b86ae7cc0], file:WiredTiger.wt, connection: the process must exit and restart: WT_PANIC: WiredTiger library panic
2017-09-21T02:01:59.106+0000 I -        [initandlisten] Fatal Assertion 28558
2017-09-21T02:01:59.106+0000 I -        [initandlisten]
 
***aborting after fassert() failure
 
 
2017-09-21T02:01:59.347+0000 F -        [initandlisten] Got signal: 6 (Aborted).
 
 0x134d5b2 0x134c6e9 0x134cef2 0x7f5b85743890 0x7f5b853be067 0x7f5b853bf448 0x12cdf12 0x10c4213 0x975dd9 0x975fd6 0x97619a 0x19e233c 0x19e28a3 0x19df139 0x19e3b39 0x19fcd5c 0x1a37053 0x1abcf7d 0x1abd499 0x1abd6bb 0x1a44936 0x1ab9ead 0x1a7e4ef 0x1a7e5ee 0x1a338aa 0x10ab3af 0x10a7603 0xfcf73e 0x9c0490 0x976aad 0x7f5b853aab45 0x9bbe67
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"F4D5B2","s":"_ZN5mongo15printStackTraceERSo"},{"b":"400000","o":"F4C6E9"},{"b":"400000","o":"F4CEF2"},{"b":"7F5B85734000","o":"F890"},{"b":"7F5B85389000","o":"35067","s":"gsignal"},{"b":"7F5B85389000","o":"36448","s":"abort"},{"b":"400000","o":"ECDF12","s":"_ZN5mongo13fassertFailedEi"},{"b":"400000","o":"CC4213"},{"b":"400000","o":"575DD9","s":"__wt_eventv"},{"b":"400000","o":"575FD6","s":"__wt_err"},{"b":"400000","o":"57619A","s":"__wt_panic"},{"b":"400000","o":"15E233C","s":"__wt_block_extlist_read"},{"b":"400000","o":"15E28A3","s":"__wt_block_extlist_read_avail"},{"b":"400000","o":"15DF139","s":"__wt_block_checkpoint_load"},{"b":"400000","o":"15E3B39"},{"b":"400000","o":"15FCD5C","s":"__wt_btree_open"},{"b":"400000","o":"1637053","s":"__wt_conn_btree_open"},{"b":"400000","o":"16BCF7D","s":"__wt_session_get_btree"},{"b":"400000","o":"16BD499","s":"__wt_session_get_btree"},{"b":"400000","o":"16BD6BB","s":"__wt_session_get_btree_ckpt"},{"b":"400000","o":"1644936","s":"__wt_curfile_open"},{"b":"400000","o":"16B9EAD"},{"b":"400000","o":"167E4EF","s":"__wt_metadata_cursor_open"},{"b":"400000","o":"167E5EE","s":"__wt_metadata_cursor"},{"b":"400000","o":"16338AA","s":"wiredtiger_open"},{"b":"400000","o":"CAB3AF","s":"_ZN5mongo18WiredTigerKVEngineC2ERKSsS2_S2_mbbb"},{"b":"400000","o":"CA7603"},{"b":"400000","o":"BCF73E","s":"_ZN5mongo20ServiceContextMongoD29initializeGlobalStorageEngineEv"},{"b":"400000","o":"5C0490"},{"b":"400000","o":"576AAD","s":"main"},{"b":"7F5B85389000","o":"21B45","s":"__libc_start_main"},{"b":"400000","o":"5BBE67"}],"processInfo":{ "mongodbVersion" : "3.2.14", "gitVersion" : "92f6668a768ebf294bd4f494c50f48459198e6a3", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.13.0-119-generic", "version" : "#166-Ubuntu SMP Wed May 3 12:18:55 UTC 2017", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "056AF96295E28A047B8792D4BA955B26F5D681B2" }, { "b" : "7FFCB980F000", "path" : "linux-vdso.so.1", "elfType" : 3, "buildId" : "4C17ABAB67692D0724DAC4F24BF42AC7A8A36399" }, { "b" : "7F5B86670000", "path" : "/usr/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "21115992A1F885E1ACE88AADA60F126AD9759D03" }, { "b" : "7F5B86274000", "path" : "/usr/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "32E9A5B9EED626E93DEEB00A49033F78652DB9A3" }, { "b" : "7F5B8606C000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "A63C95FB33CCA970E141D2E13774B997C1CF0565" }, { "b" : "7F5B85E68000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "D70B531D672A34D71DB42EB32B68E63F2DCC5B6A" }, { "b" : "7F5B85B67000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "152C93BA3E8590F7ED0BCDDF868600D55EC4DD6F" }, { "b" : "7F5B85951000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "D5FB04F64B3DAEA6D6B68B5E8B9D4D2BC1A6E1FC" }, { "b" : "7F5B85734000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "9DA9387A60FFC196AEDB9526275552AFEF499C44" }, { "b" : "7F5B85389000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "48C48BC6ABB794461B8A558DD76B29876A0551F0" }, { "b" : "7F5B868D1000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "1D98D41FBB1EABA7EC05D0FD7624B85D6F51C03C" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x134d5b2]
 mongod(+0xF4C6E9) [0x134c6e9]
 mongod(+0xF4CEF2) [0x134cef2]
 libpthread.so.0(+0xF890) [0x7f5b85743890]
 libc.so.6(gsignal+0x37) [0x7f5b853be067]
 libc.so.6(abort+0x148) [0x7f5b853bf448]
 mongod(_ZN5mongo13fassertFailedEi+0x82) [0x12cdf12]
 mongod(+0xCC4213) [0x10c4213]
 mongod(__wt_eventv+0x440) [0x975dd9]
 mongod(__wt_err+0x8D) [0x975fd6]
 mongod(__wt_panic+0x24) [0x97619a]
 mongod(__wt_block_extlist_read+0x6C) [0x19e233c]
 mongod(__wt_block_extlist_read_avail+0x33) [0x19e28a3]
 mongod(__wt_block_checkpoint_load+0x359) [0x19df139]
 mongod(+0x15E3B39) [0x19e3b39]
 mongod(__wt_btree_open+0xBDC) [0x19fcd5c]
 mongod(__wt_conn_btree_open+0x153) [0x1a37053]
 mongod(__wt_session_get_btree+0xDD) [0x1abcf7d]
 mongod(__wt_session_get_btree+0x5F9) [0x1abd499]
 mongod(__wt_session_get_btree_ckpt+0xAB) [0x1abd6bb]
 mongod(__wt_curfile_open+0x86) [0x1a44936]
 mongod(+0x16B9EAD) [0x1ab9ead]
 mongod(__wt_metadata_cursor_open+0x5F) [0x1a7e4ef]
 mongod(__wt_metadata_cursor+0x7E) [0x1a7e5ee]
 mongod(wiredtiger_open+0x191A) [0x1a338aa]
 mongod(_ZN5mongo18WiredTigerKVEngineC2ERKSsS2_S2_mbbb+0x77F) [0x10ab3af]
 mongod(+0xCA7603) [0x10a7603]
 mongod(_ZN5mongo20ServiceContextMongoD29initializeGlobalStorageEngineEv+0x59E) [0xfcf73e]
 mongod(+0x5C0490) [0x9c0490]
 mongod(main+0x15D) [0x976aad]
 libc.so.6(__libc_start_main+0xF5) [0x7f5b853aab45]
 mongod(+0x5BBE67) [0x9bbe67]
-----  END BACKTRACE  -----



 Comments   
Comment by Keith Bostic (Inactive) [ 04/Jan/18 ]

alexander.gorrod, kelsey.schubert, I'm going to close this one as "not reproducible".

Kelsey got the customer up and running, and there have been no further failures.

In summary, I believe there's a bug here, but I can't find anything in the data files pointing to a root cause, I can't reproduce it, and I can't think of any way to track it down.

Comment by Nelson Mamani [ 22/Sep/17 ]

Hi Kelsey,

I tried the files you sent me, and I was able to recover the data, so the repair was successfully completed.
Therse are the answers to your questions:
1. Yes it was a power outage
2. Storage devices are attached locally, and the disk is HDD
3. Last check Wed Sept 20 13:40:51 2017, and yes it's marked clean

I hope this information is helpful to you, thank you so much for the repaired files.

Comment by Kelsey Schubert [ 21/Sep/17 ]

Hi nelmapi,

Thank you for opening this ticket. I've attached a repair attempt of the files you've provided. Would you please replace these files in your dbpath and let us know if mongod successfully starts?

Additionally, I have a few questions so we can better understand what happened:

  1. Would you please confirm that the outage was caused by a power failure?
  2. What kind of underlying storage mechanism are you using? Are the storage devices attached locally or over the network? Are the disks SSDs or HDDs? What kind of RAID and/or volume management system are you using?
  3. When was the underlying filesystem last checked and is it currently marked clean?

Kind regards,
Kelsey

Generated at Thu Feb 08 04:26:17 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.