[SERVER-40207] sizeStorer.wt: encountered an illegal file format or internal value
Created: 19/Mar/19  Updated: 25/Apr/19  Resolved: 25/Apr/19

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.6.3
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Jasmeet Singh
Assignee: Danny Hatcher (Inactive)
Resolution: Done
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: WiredTiger.turtle, WiredTiger.wt, WiredTigerLAS.wt, _mdb_catalog.wt, binaries_repair.log, mongod (copy).log, repair-attempt.log, repair-attempt.tar, sizeStorer.wt, storage.bson
Operating System: ALL
Participants:

 Description   

My db instance crashed and, after multiple restart attempts, it still crashes as soon as I start it.

This is the error that I get as soon as I start the mongod service. Running mongod --repair gives the same error.

Starting mongod with a different, empty dbpath works fine.

Below are the log messages:

 

2019-03-19T15:53:23.693+0530 I CONTROL [initandlisten] MongoDB starting : pid=12500 port=27017 dbpath=/var/lib/mongodb 64-bit host=Precision
2019-03-19T15:53:23.693+0530 I CONTROL [initandlisten] db version v3.6.3
2019-03-19T15:53:23.693+0530 I CONTROL [initandlisten] git version: 9586e557d54ef70f9ca4b43c26892cd55257e1a5
2019-03-19T15:53:23.693+0530 I CONTROL [initandlisten] OpenSSL version: OpenSSL 1.0.1 14 Mar 2012
2019-03-19T15:53:23.693+0530 I CONTROL [initandlisten] allocator: tcmalloc
2019-03-19T15:53:23.693+0530 I CONTROL [initandlisten] modules: none
2019-03-19T15:53:23.693+0530 I CONTROL [initandlisten] build environment:
2019-03-19T15:53:23.693+0530 I CONTROL [initandlisten] distmod: ubuntu1204
2019-03-19T15:53:23.693+0530 I CONTROL [initandlisten] distarch: x86_64
2019-03-19T15:53:23.693+0530 I CONTROL [initandlisten] target_arch: x86_64
2019-03-19T15:53:23.693+0530 I CONTROL [initandlisten] options: { storage: { dbPath: "/var/lib/mongodb" } }
2019-03-19T15:53:23.693+0530 I - [initandlisten] Detected data files in /var/lib/mongodb created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2019-03-19T15:53:23.693+0530 I STORAGE [initandlisten]
2019-03-19T15:53:23.693+0530 I STORAGE [initandlisten] ** WARNING: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine
2019-03-19T15:53:23.693+0530 I STORAGE [initandlisten] ** See http://dochub.mongodb.org/core/prodnotes-filesystem
2019-03-19T15:53:23.693+0530 I STORAGE [initandlisten] wiredtiger_open config: create,cache_size=7406M,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),statistics_log=(wait=0),verbose=(recovery_progress),
2019-03-19T15:53:25.794+0530 I STORAGE [initandlisten] WiredTiger message [1552991005:794370][12500:0x7f135d5ec9c0], txn-recover: Main recovery loop: starting at 7589/17536
2019-03-19T15:53:25.794+0530 I STORAGE [initandlisten] WiredTiger message [1552991005:794720][12500:0x7f135d5ec9c0], txn-recover: Recovering log 7589 through 7621
2019-03-19T15:53:25.795+0530 E STORAGE [initandlisten] WiredTiger error (0) [1552991005:795220][12500:0x7f135d5ec9c0], file:sizeStorer.wt, WT_CURSOR.insert: read checksum error for 28672B block at offset 4096: block header checksum of 372562513 doesn't match expected checksum of 4033670318

2019-03-19T15:53:25.798+0530 E STORAGE [initandlisten] WiredTiger error (0) [1552991005:798474][12500:0x7f135d5ec9c0], file:sizeStorer.wt, WT_CURSOR.insert: sizeStorer.wt: encountered an illegal file format or internal value: (__wt_block_read_off, 302)
2019-03-19T15:53:25.798+0530 E STORAGE [initandlisten] WiredTiger error (-31804) [1552991005:798485][12500:0x7f135d5ec9c0], file:sizeStorer.wt, WT_CURSOR.insert: the process must exit and restart: WT_PANIC: WiredTiger library panic
2019-03-19T15:53:25.798+0530 F - [initandlisten] Fatal Assertion 28558 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 361
2019-03-19T15:53:25.798+0530 F - [initandlisten]

***aborting after fassert() failure

2019-03-19T15:53:25.813+0530 F - [initandlisten] Got signal: 6 (Aborted).

0x7f135f828241 0x7f135f827459 0x7f135f82793d 0x7f135c062cb0 0x7f135bcc8035 0x7f135bccb79b 0x7f135df96e97 0x7f135e06305e 0x7f135e0cd481 0x7f135df347c4 0x7f135df34aed 0x7f135e17e407 0x7f135e17e545 0x7f135e0f3a03 0x7f135e0fc4e5 0x7f135e1247d4 0x7f135e18ef30 0x7f135e13c7a5 0x7f135e0e42da 0x7f135e15f484 0x7f135e0e54b3 0x7f135e076e67 0x7f135e073b4c 0x7f135e046ff9 0x7f135e02b5b4 0x7f135e21c3b7 0x7f135df30687 0x7f135e00a78c 0x7f135df98b59 0x7f135bcb37ed 0x7f135dffa101
----- BEGIN BACKTRACE -----

{"backtrace":[{"b":"7F135D60C000","o":"221C241","s":"_ZN5mongo15printStackTraceERSo"},{"b":"7F135D60C000","o":"221B459"},{"b":"7F135D60C000","o":"221B93D"},{"b":"7F135C053000","o":"FCB0"},{"b":"7F135BC92000","o":"36035","s":"gsignal"},{"b":"7F135BC92000","o":"3979B","s":"abort"},{"b":"7F135D60C000","o":"98AE97","s":"_ZN5mongo32fassertFailedNoTraceWithLocationEiPKcj"},{"b":"7F135D60C000","o":"A5705E"},{"b":"7F135D60C000","o":"AC1481"},{"b":"7F135D60C000","o":"9287C4","s":"__wt_err"},{"b":"7F135D60C000","o":"928AED","s":"__wt_panic"},{"b":"7F135D60C000","o":"B72407","s":"__wt_block_read_off"},{"b":"7F135D60C000","o":"B72545","s":"__wt_bm_read"},{"b":"7F135D60C000","o":"AE7A03","s":"__wt_bt_read"},{"b":"7F135D60C000","o":"AF04E5","s":"__wt_page_in_func"},{"b":"7F135D60C000","o":"B187D4","s":"__wt_row_search"},{"b":"7F135D60C000","o":"B82F30","s":"__wt_btcur_insert"},{"b":"7F135D60C000","o":"B307A5"},{"b":"7F135D60C000","o":"AD82DA"},{"b":"7F135D60C000","o":"B53484","s":"__wt_log_scan"},{"b":"7F135D60C000","o":"AD94B3","s":"__wt_txn_recover"},{"b":"7F135D60C000","o":"A6AE67","s":"__wt_connection_workers"},{"b":"7F135D60C000","o":"A67B4C","s":"wiredtiger_open"},{"b":"7F135D60C000","o":"A3AFF9","s":"_ZN5mongo18WiredTigerKVEngineC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES8_PNS_11ClockSourceES8_mbbbb"},{"b":"7F135D60C000","o":"A1F5B4"},{"b":"7F135D60C000","o":"C103B7","s":"_ZN5mongo20ServiceContextMongoD29initializeGlobalStorageEngineEv"},{"b":"7F135D60C000","o":"924687"},{"b":"7F135D60C000","o":"9FE78C","s":"_ZN5mongo11mongoDbMainEiPPcS1_"},{"b":"7F135D60C000","o":"98CB59","s":"main"},{"b":"7F135BC92000","o":"217ED","s":"__libc_start_main"},{"b":"7F135D60C000","o":"9EE101"}],"processInfo":{ "mongodbVersion" : "3.6.3", "gitVersion" : "9586e557d54ef70f9ca4b43c26892cd55257e1a5", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.13.0-117-generic", "version" : "#164~precise1-Ubuntu SMP Mon Apr 10 16:16:25 UTC 2017", "machine" : "x86_64" }, "somap" : [ { "b" : "7F135D60C000", "elfType" : 3, "buildId" : "166D74EB01FC4592F997D1371620CDE159DB6385" }, { "b" : "7FFF937AD000", "elfType" : 3, "buildId" : "D6651212526D356B7DD21A1B18E9D77F28213C37" }, { "b" : "7F135D1CB000", "path" : "/lib/x86_64-linux-gnu/libresolv.so.2", "elfType" : 3, "buildId" : "6635AFE7D6B2477093DA1FD5871D088D114C7878" }, { "b" : "7F135CF6C000", "path" : "/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "22D860BB3077F41DB96710ED896AD4861EF5E1D0" }, { "b" : "7F135CB8F000", "path" : "/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "808FB60BC32604C94BDA151A1575EC63CD8A1C1C" }, { "b" : "7F135C987000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "99255CAB5455CB9EDFA0270CDAA6B6A7BBEF2E1B" }, { "b" : "7F135C783000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "A51A5921F4E05E4D20B165D398BA4D563960DA9A" }, { "b" : "7F135C487000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "6D3D633C88F7E9835D180ACE648CEDB21C8021B7" }, { "b" : "7F135C270000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "614C86949BD6361040F6DB0CF3F1F1051AB73F96" }, { "b" : "7F135C053000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "9B1F69F5DC3A6820BB3CA4B2DB147ABAA486A41A" }, { "b" : "7F135BC92000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "68FC0E76A868E47807E3604B02D8BAA580A4E2CB" }, { "b" : "7F135D3E7000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "2B91CD40CE35626DAB827FEEE08F671253FA7B88" }, { "b" : "7F135BA7B000", "path" : "/lib/x86_64-linux-gnu/libz.so.1", "elfType" : 3, "buildId" : "F695ECFCF3918D5D34989398A14B7ECDD9F46CD0" } ] }}
mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x7f135f828241]
mongod(+0x221B459) [0x7f135f827459]
mongod(+0x221B93D) [0x7f135f82793d]
libpthread.so.0(+0xFCB0) [0x7f135c062cb0]
libc.so.6(gsignal+0x35) [0x7f135bcc8035]
libc.so.6(abort+0x17B) [0x7f135bccb79b]
mongod(_ZN5mongo32fassertFailedNoTraceWithLocationEiPKcj+0x0) [0x7f135df96e97]
mongod(+0xA5705E) [0x7f135e06305e]
mongod(+0xAC1481) [0x7f135e0cd481]
mongod(__wt_err+0x9D) [0x7f135df347c4]
mongod(__wt_panic+0x33) [0x7f135df34aed]
mongod(__wt_block_read_off+0x547) [0x7f135e17e407]
mongod(__wt_bm_read+0x135) [0x7f135e17e545]
mongod(__wt_bt_read+0x203) [0x7f135e0f3a03]
mongod(__wt_page_in_func+0x1BE5) [0x7f135e0fc4e5]
mongod(__wt_row_search+0x7B4) [0x7f135e1247d4]
mongod(__wt_btcur_insert+0xDA0) [0x7f135e18ef30]
mongod(+0xB307A5) [0x7f135e13c7a5]
mongod(+0xAD82DA) [0x7f135e0e42da]
mongod(__wt_log_scan+0xCE4) [0x7f135e15f484]
mongod(__wt_txn_recover+0x6E3) [0x7f135e0e54b3]
mongod(__wt_connection_workers+0x37) [0x7f135e076e67]
mongod(wiredtiger_open+0x191C) [0x7f135e073b4c]
mongod(_ZN5mongo18WiredTigerKVEngineC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES8_PNS_11ClockSourceES8_mbbbb+0x889) [0x7f135e046ff9]
mongod(+0xA1F5B4) [0x7f135e02b5b4]
mongod(_ZN5mongo20ServiceContextMongoD29initializeGlobalStorageEngineEv+0x637) [0x7f135e21c3b7]
mongod(+0x924687) [0x7f135df30687]
mongod(_ZN5mongo11mongoDbMainEiPPcS1_+0x86C) [0x7f135e00a78c]
mongod(main+0x9) [0x7f135df98b59]
libc.so.6(__libc_start_main+0xED) [0x7f135bcb37ed]
mongod(+0x9EE101) [0x7f135dffa101]
----- END BACKTRACE -----
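
For readers matching the JSON backtrace against the symbolized frames: each frame's absolute address appears to be the load base "b" plus the offset "o", both hexadecimal. The following is a minimal Python sketch under that assumption, using three frames copied from the trace above. The function name frame_addresses is illustrative only and is not part of any MongoDB tooling.

```python
import json


def frame_addresses(backtrace_json):
    """Return (address, symbol) pairs, assuming address = base "b" + offset "o"."""
    frames = json.loads(backtrace_json)["backtrace"]
    return [
        (int(frame["b"], 16) + int(frame["o"], 16), frame.get("s"))
        for frame in frames
    ]


# Three frames copied from the backtrace in this ticket.
sample = (
    '{"backtrace":['
    '{"b":"7F135D60C000","o":"221C241","s":"_ZN5mongo15printStackTraceERSo"},'
    '{"b":"7F135C053000","o":"FCB0"},'
    '{"b":"7F135BC92000","o":"36035","s":"gsignal"}'
    ']}'
)

for address, symbol in frame_addresses(sample):
    print(hex(address), symbol or "(no symbol)")
```

The first computed address, 0x7f135f828241, is the one printed next to printStackTrace in the symbolized listing, which supports the base-plus-offset reading.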



 Comments   
Comment by Danny Hatcher (Inactive) [ 28/Mar/19 ]

Hello Jasmeet,

Unfortunately we aren't able to repair this corruption.

However, we can collect information to guide our development efforts in the future. If you are able, can you please provide:

  1. The complete logs for the affected node, including before, leading up to, and after the first sign of corruption.
  2. A complete description of the underlying storage mechanism in use. Please address questions such as whether storage is locally attached or network-attached, whether disks are SSDs or HDDs, whether RAID is in use and if so how it is configured, and what file system and/or volume management system is in use.
  3. A description of your backup method, if any.
  4. A history of the deployment, including:
    1. a timeline of version changes
    2. a timeline of hardware upgrade/downgrade cycles or configuration changes
    3. a timeline of disaster recovery or backup restoration activities
    4. a timeline of any manipulations of the underlying database files, including copies or moves, and information about whether mongod was running during each manipulation.
  5. Finally, can you provide assurances that you have not manipulated (copied or moved) the underlying database files while mongod was running, and that your disks have been recently checked for integrity?

Thanks,

Danny

Comment by Jasmeet Singh [ 28/Mar/19 ]

I copied the provided files into the dbpath and tried to start mongod, but it still fails.

Please find the log file attached:

repair-attempt.log

Comment by Danny Hatcher (Inactive) [ 26/Mar/19 ]

Hello Jasmeet,

Can you please download repair-attempt.tar, replace the files in your $dbpath, and attempt to start the node? If it does not work, please provide the logs of the result.

Thank you,

Danny
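
The file swap requested above can be made less risky by copying the data directory aside first. The following is a minimal Python sketch, assuming mongod is stopped and that the archive holds plain data files. The function name replace_dbpath_files and the backup location are hypothetical, and the layout of repair-attempt.tar is not known from this ticket.

```python
import shutil
import tarfile
from pathlib import Path


def replace_dbpath_files(dbpath, archive, backup_dir):
    """Back up dbpath, then overwrite its files with the regular files in archive.

    Assumption: mongod is NOT running against dbpath while this executes.
    Returns the archive member names that were written.
    """
    dbpath = Path(dbpath).resolve()

    # Keep a full copy of the original data files before touching them.
    shutil.copytree(dbpath, backup_dir)

    written = []
    with tarfile.open(archive) as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories, links, and special files
            target = (dbpath / member.name).resolve()
            # Refuse members that would land outside dbpath (e.g. "../x").
            if dbpath not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
            target.parent.mkdir(parents=True, exist_ok=True)
            with tar.extractfile(member) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(member.name)
    return written
```

A call might look like replace_dbpath_files("/var/lib/mongodb", "repair-attempt.tar", "/var/lib/mongodb.bak"), where the backup path is only an example and must not already exist.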

Comment by Jasmeet Singh [ 25/Mar/19 ]

I ran

./mongod --repair --dbpath <path_to_data_files>

from within the bin folder of the MongoDB binaries.

The repair failed with the same error.

Please find the corresponding log attached:

binaries_repair.log

Comment by Danny Hatcher (Inactive) [ 22/Mar/19 ]

Hello Jasmeet,

I see that you encountered:

2019-03-19T11:20:38.044+0530 E STORAGE  [conn447] WiredTiger error (0) [1552974638:44806][15679:0x7f09ad46c700], file:collection-2418-6931361946543157039.wt, WT_CURSOR.next: read checksum error for 12288B block at offset 59064320: block header checksum of 4063015638 doesn't match expected checksum of 3438854751

You mentioned that you have tried to run MongoDB with --repair and it did not succeed. Could you please download the 4.0.6 MongoDB binaries, use the 4.0.6 binary to run --repair against your data files, and then start the mongod using your original binaries?

Thank you,

Danny
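
The checksum errors quoted in this ticket share a regular shape, so affected files and block offsets can be pulled out of a long mongod.log mechanically. The following is a minimal Python sketch. The regular expression is derived only from the two error lines shown in this ticket and may not cover other WiredTiger message variants, and the function name parse_checksum_error is illustrative.

```python
import re

# Pattern inferred from the two checksum errors quoted in this ticket.
CHECKSUM_ERROR = re.compile(
    r"file:(?P<file>[^,\s]+), (?P<op>WT_CURSOR\.\w+): "
    r"read checksum error for (?P<size>\d+)B block at offset (?P<offset>\d+): "
    r"block header checksum of (?P<found>\d+) "
    r"doesn't match expected checksum of (?P<expected>\d+)"
)


def parse_checksum_error(line):
    """Return the fields of a checksum error line, or None if it does not match."""
    match = CHECKSUM_ERROR.search(line)
    if match is None:
        return None
    info = match.groupdict()
    for key in ("size", "offset", "found", "expected"):
        info[key] = int(info[key])
    return info


# Excerpts (message part only) of the two log lines in this ticket.
lines = [
    "file:collection-2418-6931361946543157039.wt, WT_CURSOR.next: "
    "read checksum error for 12288B block at offset 59064320: "
    "block header checksum of 4063015638 doesn't match expected checksum of 3438854751",
    "file:sizeStorer.wt, WT_CURSOR.insert: "
    "read checksum error for 28672B block at offset 4096: "
    "block header checksum of 372562513 doesn't match expected checksum of 4033670318",
]

for line in lines:
    info = parse_checksum_error(line)
    print(info["file"], info["op"], info["size"], info["offset"])
```

Run over the full log, this would list every file that reported a bad block, which shows here that the corruption was not limited to sizeStorer.wt.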

Comment by Jasmeet Singh [ 22/Mar/19 ]

mongod (copy).log

Comment by Danny Hatcher (Inactive) [ 20/Mar/19 ]

Hello Jasmeet,

Could you please provide the mongod.log covering the time before the initial crash?

Thank you,

Danny

Comment by Jasmeet Singh [ 19/Mar/19 ]

The operating system is Ubuntu 14.04.

Storage is a local HDD.

The MongoDB install was fresh and had been running fine for more than a year.

Generated at Thu Feb 08 04:54:21 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.