[SERVER-36179] Cannot start because "WT_CURSOR.insert: read checksum error" Created: 18/Jul/18  Updated: 13/Aug/18  Resolved: 18/Jul/18

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 3.4.16
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Simon Georges Assignee: Nick Brewer
Resolution: Done Votes: 0
Labels: envns, rfi, rpu, trcf, wtc
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File WiredTiger.turtle     File WiredTiger.wt     File repair-attempt.tar.gz    
Operating System: Linux
Participants:

 Description   

I can't start my mongo instance after restoring a snapshot.

Journaling is enabled, but I guess some files are corrupted.

2018-07-18T10:22:38.052-0400 W STORAGE  [initandlisten] Recovering data from the last clean checkpoint.
2018-07-18T10:22:38.052-0400 I STORAGE  [initandlisten]
2018-07-18T10:22:38.052-0400 I STORAGE  [initandlisten] ** WARNING: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine
2018-07-18T10:22:38.052-0400 I STORAGE  [initandlisten] **          See http://dochub.mongodb.org/core/prodnotes-filesystem
2018-07-18T10:22:38.052-0400 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=11462M,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),verbose=(recovery_progress),
2018-07-18T10:22:38.300-0400 E STORAGE  [initandlisten] WiredTiger error (0) [1531923758:300215][1279:0x7fb34a5b4e40], file:WiredTiger.wt, WT_CURSOR.insert: read checksum error for 12288B block at offset 90112: block header checksum of 3360834145 doesn't match expected checksum of 713189965
2018-07-18T10:22:38.300-0400 E STORAGE  [initandlisten] WiredTiger error (0) [1531923758:300325][1279:0x7fb34a5b4e40], file:WiredTiger.wt, WT_CURSOR.insert: WiredTiger.wt: encountered an illegal file format or internal value
2018-07-18T10:22:38.300-0400 E STORAGE  [initandlisten] WiredTiger error (-31804) [1531923758:300354][1279:0x7fb34a5b4e40], file:WiredTiger.wt, WT_CURSOR.insert: the process must exit and restart: WT_PANIC: WiredTiger library panic
2018-07-18T10:22:38.300-0400 I -        [initandlisten] Fatal Assertion 28558 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 365
2018-07-18T10:22:38.300-0400 I -        [initandlisten]

 

It looks a lot like SERVER-35104.

Is there any way to repair WiredTiger.wt and WiredTiger.turtle so that any pending changes are discarded?

 

I cannot find anything about this in the documentation.

 

Thank you

 



 Comments   
Comment by Nick Brewer [ 18/Jul/18 ]

simon.georges@cit-direct.com

Unfortunately, this error indicates that there was corruption on the disk, most often caused by a faulty storage layer. In this situation, our best recommendation would be to resync the affected node if it is a member of a replica set, or restore from a backup if possible.

To prevent this type of problem in the future please take note of the following guidelines to help mitigate any issues related to unreliable storage layers or server failures:

Regards,
Nick

Comment by Simon Georges [ 18/Jul/18 ]

Thanks a lot. I tried that and received a new error:

2018-07-18T11:59:58.805-0400 I -        [initandlisten] Detected data files in /data/db created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2018-07-18T11:59:58.806-0400 W STORAGE  [initandlisten] Recovering data from the last clean checkpoint.
2018-07-18T11:59:58.806-0400 I STORAGE  [initandlisten]
2018-07-18T11:59:58.806-0400 I STORAGE  [initandlisten] ** WARNING: Using the XFS filesystem is strongly recommended with the WiredTiger storage engine
2018-07-18T11:59:58.806-0400 I STORAGE  [initandlisten] **          See http://dochub.mongodb.org/core/prodnotes-filesystem
2018-07-18T11:59:58.806-0400 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=11462M,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),verbose=(recovery_progress),
2018-07-18T12:00:00.302-0400 I STORAGE  [initandlisten] WiredTiger message [1531929600:302240][1918:0x7f10604fae40], txn-recover: Main recovery loop: starting at 22981/75344768
2018-07-18T12:00:00.303-0400 I STORAGE  [initandlisten] WiredTiger message [1531929600:303678][1918:0x7f10604fae40], txn-recover: Recovering log 22981 through 22988
2018-07-18T12:00:00.569-0400 E STORAGE  [initandlisten] WiredTiger error (-31802) [1531929600:569796][1918:0x7f10604fae40], file:index-0--5517169169735138620.wt, txn-recover: unable to read root page from file:index-0--5517169169735138620.wt: WT_ERROR: non-specific WiredTiger error
2018-07-18T12:00:00.570-0400 E STORAGE  [initandlisten] WiredTiger error (-31802) [1531929600:570555][1918:0x7f10604fae40], file:index-0--5517169169735138620.wt, txn-recover: operation apply failed during recovery: operation type 4 at LSN 22981/75344768: WT_ERROR: non-specific WiredTiger error
2018-07-18T12:00:00.570-0400 E STORAGE  [initandlisten] WiredTiger error (0) [1531929600:570593][1918:0x7f10604fae40], file:index-0--5517169169735138620.wt, txn-recover: WiredTiger is unable to read the recovery log.
2018-07-18T12:00:00.570-0400 E STORAGE  [initandlisten] WiredTiger error (0) [1531929600:570619][1918:0x7f10604fae40], file:index-0--5517169169735138620.wt, txn-recover: This may be due to the log files being encrypted, being from an older version or due to corruption on disk
2018-07-18T12:00:00.570-0400 E STORAGE  [initandlisten] WiredTiger error (0) [1531929600:570649][1918:0x7f10604fae40], file:index-0--5517169169735138620.wt, txn-recover: You should confirm that you have opened the database with the correct options including all encryption and compression options
2018-07-18T12:00:00.570-0400 E STORAGE  [initandlisten] WiredTiger error (-31802) [1531929600:570685][1918:0x7f10604fae40], file:index-0--5517169169735138620.wt, txn-recover: Recovery failed: WT_ERROR: non-specific WiredTiger error
2018-07-18T12:00:00.615-0400 I -        [initandlisten] Assertion: 28595:-31802: WT_ERROR: non-specific WiredTiger error src/mongo/db/storage/wiredtiger/wiredtiger_kv_engine.cpp 277
2018-07-18T12:00:00.640-0400 I STORAGE  [initandlisten] exception in initAndListen: 28595 -31802: WT_ERROR: non-specific WiredTiger error, terminating
2018-07-18T12:00:00.641-0400 I NETWORK  [initandlisten] shutdown: going to close listening sockets...
2018-07-18T12:00:00.641-0400 I NETWORK  [initandlisten] removing socket file: /tmp/mongodb-27017.sock
2018-07-18T12:00:00.641-0400 I NETWORK  [initandlisten] shutdown: going to flush diaglog...
2018-07-18T12:00:00.641-0400 I CONTROL  [initandlisten] now exiting
2018-07-18T12:00:00.641-0400 I CONTROL  [initandlisten] shutting down with code:100

Comment by Nick Brewer [ 18/Jul/18 ]

simon.georges@cit-direct.com

I've attached the files after a repair attempt. Would you please extract these files, substitute them for the current ones in your $dbpath, and let us know if it resolves the issue?

Thanks,
Nick

repair-attempt.tar.gz
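
The file substitution requested above could be scripted roughly as follows. This is a minimal sketch, not an official MongoDB procedure: it assumes mongod is stopped, that the archive contains replacement copies of WiredTiger.wt and WiredTiger.turtle (the ticket does not list the archive's contents), and that a copy of the current files should be kept first. The function name `swap_in_repaired_files` and the backup directory naming are hypothetical.

```python
import shutil
import tarfile
import time
from pathlib import Path


def swap_in_repaired_files(dbpath, archive,
                           names=("WiredTiger.wt", "WiredTiger.turtle")):
    """Back up the named files in dbpath, then overwrite them with the
    copies found in the archive. Returns the backup directory.

    Assumption: mongod is NOT running against dbpath while this executes.
    """
    dbpath = Path(dbpath)
    # Keep the originals next to the dbpath so the swap can be undone.
    backup_dir = dbpath.parent / f"{dbpath.name}-metadata-backup-{int(time.time())}"
    backup_dir.mkdir()
    for name in names:
        current = dbpath / name
        if current.exists():
            shutil.copy2(current, backup_dir / name)

    # Copy only the expected files out of the archive, ignoring any
    # directory prefix inside it (the archive layout is an assumption).
    with tarfile.open(archive, "r:gz") as tar:
        for member in tar.getmembers():
            base = Path(member.name).name
            if member.isfile() and base in names:
                src = tar.extractfile(member)
                with open(dbpath / base, "wb") as dst:
                    shutil.copyfileobj(src, dst)
    return backup_dir
```

With the dbpath shown in the logs, a call might look like `swap_in_repaired_files("/data/db", "repair-attempt.tar.gz")`. Restoring the originals is then a matter of copying the files back from the returned backup directory.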

 

Comment by Simon Georges [ 18/Jul/18 ]

Environment

OS: CentOS 7.5

MongoDB: 3.4.16

 

 
