[SERVER-34972] Unable to start standalone MongoDB instance due to checksum mismatch in WiredTiger storage engine Created: 13/May/18  Updated: 10/Jun/18  Resolved: 18/May/18

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.4.0
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Snehadeep [X] Assignee: Kelsey Schubert
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File WiredTiger.turtle     File WiredTiger.wt     File _mdb_catalog.wt     File repair_attempt.tar.gz     File sizeStorer.wt    
Issue Links:
Duplicate
is duplicated by SERVER-34987 Unable to start standalone MongoDB in... Closed
Operating System: ALL
Participants:

 Description   

I have a standalone MongoDB instance that was working fine until today. When I tried to restart the DB, I got an error saying there is a checksum mismatch in the WiredTiger.wt file. I don't know why the data got corrupted.

I tried to repair the DB, but to no avail: I get the same error. I also tried the WiredTiger command-line tool described at http://source.wiredtiger.com/3.0.0/command_line.html, but it gives the same error.

There seem to be many reports of this kind of corruption, and it was said that the issue would not occur after version 3.2. I am running version 3.4.0 and still hit the issue.

I am attaching the required files; any help recovering the data would be deeply appreciated.

Please feel free to ask for anything else you need.

The errors are below. Note that the reported checksum differs between restarts even though nothing in the DB has changed:

First restart:

2018-05-13T17:03:39.809+0530 I STORAGE [initandlisten] wiredtiger_open config: create,cache_size=3313M,session_max=20000,eviction=(threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2018-05-13T17:03:39.830+0530 E STORAGE [initandlisten] WiredTiger error (0) [1526211219:830804][593:0x7f97fc28dc80], file:WiredTiger.wt, connection: read checksum error for 4096B block at offset 24576: block header checksum of 1793761607 doesn't match expected checksum of 3070115156
2018-05-13T17:03:39.830+0530 E STORAGE [initandlisten] WiredTiger error (0) [1526211219:830842][593:0x7f97fc28dc80], file:WiredTiger.wt, connection: WiredTiger.wt: encountered an illegal file format or internal value
2018-05-13T17:03:39.830+0530 E STORAGE [initandlisten] WiredTiger error (-31804) [1526211219:830860][593:0x7f97fc28dc80], file:WiredTiger.wt, connection: the process must exit and restart: WT_PANIC: WiredTiger library panic
2018-05-13T17:03:39.830+0530 I - [initandlisten] Fatal Assertion 28558 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 361
2018-05-13T17:03:39.830+0530 I - [initandlisten]

***aborting after fassert() failure

2018-05-13T17:03:39.850+0530 F - [initandlisten] Got signal: 6 (Aborted).

 

Second restart:

2018-05-13T23:03:29.765+0530 I STORAGE [initandlisten] wiredtiger_open config: create,cache_size=3313M,session_max=20000,eviction=(threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2018-05-13T23:03:29.785+0530 E STORAGE [initandlisten] WiredTiger error (0) [1526232809:785143][20203:0x7f32ca8fbc80], file:WiredTiger.wt, connection: read checksum error for 4096B block at offset 24576: block header checksum of 1793761607 doesn't match expected checksum of 1666091971
2018-05-13T23:03:29.785+0530 E STORAGE [initandlisten] WiredTiger error (0) [1526232809:785225][20203:0x7f32ca8fbc80], file:WiredTiger.wt, connection: WiredTiger.wt: encountered an illegal file format or internal value
2018-05-13T23:03:29.785+0530 E STORAGE [initandlisten] WiredTiger error (-31804) [1526232809:785278][20203:0x7f32ca8fbc80], file:WiredTiger.wt, connection: the process must exit and restart: WT_PANIC: WiredTiger library panic
2018-05-13T23:03:29.785+0530 I - [initandlisten] Fatal Assertion 28558 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 361
2018-05-13T23:03:29.785+0530 I - [initandlisten]

***aborting after fassert() failure

2018-05-13T23:03:29.806+0530 F - [initandlisten] Got signal: 6 (Aborted).
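
The two log excerpts above can be compared mechanically. The following is a minimal sketch, not part of the original report: it assumes the checksum message has exactly the wording shown in these logs, and the names `CHECKSUM_RE`, `parse_checksum_errors`, `summarize`, and `SAMPLE` are hypothetical helpers introduced only for illustration. It pulls the block size, offset, and both checksum values out of each error so that it is easy to see which fields stay constant across restarts and which do not.

```python
import re

# Matches the WiredTiger checksum message as worded in the logs above.
# The wording is taken from this report; other versions may word it differently.
CHECKSUM_RE = re.compile(
    r"read checksum error for (?P<size>\d+)B block at offset (?P<offset>\d+): "
    r"block header checksum of (?P<header>\d+) "
    r"doesn't match expected checksum of (?P<expected>\d+)"
)


def parse_checksum_errors(log_text):
    """Return one dict of integer fields per checksum error found in log_text."""
    return [
        {name: int(value) for name, value in match.groupdict().items()}
        for match in CHECKSUM_RE.finditer(log_text)
    ]


def summarize(errors):
    """Collect the distinct values seen for each field across all errors."""
    fields = ("size", "offset", "header", "expected")
    return {field: sorted({err[field] for err in errors}) for field in fields}


# The two checksum messages from the restarts quoted in this report.
SAMPLE = (
    "read checksum error for 4096B block at offset 24576: block header "
    "checksum of 1793761607 doesn't match expected checksum of 3070115156\n"
    "read checksum error for 4096B block at offset 24576: block header "
    "checksum of 1793761607 doesn't match expected checksum of 1666091971\n"
)

if __name__ == "__main__":
    for field, values in summarize(parse_checksum_errors(SAMPLE)).items():
        print(field, values)
```

For the two restarts quoted here, the block size, offset, and block header checksum are identical, and only the expected checksum changes. The sketch only surfaces that difference; it does not explain its cause.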


 Comments   
Comment by Kelsey Schubert [ 18/May/18 ]

Hi Vikram,

Unfortunately, this behavior likely indicates that there was corruption on the disk, most often caused by a faulty storage layer. In this situation, our best recommendation would be to resync the affected node or restore from a backup if possible. If you have a copy of the original files prior to any attempted salvage operations, I would suggest starting mongod with the repair attempt against these files.

To prevent this type of problem in the future please take note of the following guidelines to help mitigate any issues related to unreliable storage layers or server failures.

Thank you,
Kelsey
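
Because the advice above depends on having a copy of the files from before any salvage attempt, it can help to snapshot the data directory before running any repair. The following is a minimal sketch under stated assumptions: mongod is stopped, the whole dbpath fits in the backup location, and `snapshot_dbpath` is a hypothetical helper name, not a MongoDB tool.

```python
import shutil
import time
from pathlib import Path


def snapshot_dbpath(dbpath, backup_root):
    """Copy everything under dbpath into a new timestamped directory.

    Returns the path of the copy. Assumes mongod is not running, so the
    files are not changing while they are copied.
    """
    src = Path(dbpath)
    if not src.is_dir():
        raise FileNotFoundError(f"dbpath does not exist: {src}")
    stamp = time.strftime("%Y%m%dT%H%M%S")
    dest = Path(backup_root) / f"{src.name}-{stamp}"
    # copytree raises if dest already exists, so an earlier snapshot
    # is never overwritten.
    shutil.copytree(src, dest)
    return dest
```

A call such as `snapshot_dbpath("/data/db", "/backups")` (both paths are placeholders) would be made once, before the first repair or salvage attempt, so that later attempts can start again from the untouched copy.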

Comment by Snehadeep [X] [ 18/May/18 ]

Hi Kelsey,

We are really pressed for time. Can you please provide a solution? We have tried many suggestions found online and nothing works.

Comment by Snehadeep [X] [ 16/May/18 ]

Hi Kelsey,

I tried using the wt tool to recover the data. It shows the record count correctly but not the records themselves: if I try to find the records, the result is empty. I also tried the repair command, but was not able to recover the records.

Can you please suggest any other way to recover the records?

Comment by Snehadeep [X] [ 15/May/18 ]

Hi Kelsey,

Thanks. I was able to start MongoDB with the repair_attempt files, but not even 5% of the data was recovered. Before this issue we had around 1.5 lakh (150,000) records; after recovery, we have only 127.

I will try running the wt command line tool. But if there is any other way to recover the records, please let me know. Any help would be appreciated.

Comment by Kelsey Schubert [ 14/May/18 ]

Hi Vikram,

Thank you for your report. I've attached a repair attempt, repair_attempt.tar.gz, of the files you provided. Please extract these files and replace them in your $dbpath and let us know if it resolves the issue. If you are still seeing errors after replacing these files, please provide the complete logs from the affected node so that we can further investigate.

Thank you,
Kelsey

Generated at Thu Feb 08 04:38:25 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.