[SERVER-31602] WiredTiger.wt: read checksum error. block header checksum doesn't match expected checksum Created: 17/Oct/17  Updated: 14/Aug/18  Resolved: 20/Oct/17

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 3.4.9
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Matthäus Zloch Assignee: Mark Agarunov
Resolution: Done Votes: 0
Labels: docker, envc, rge, rps, trcf, wtc
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

MacOS 10.12.6
mongod 3.4.9


Attachments: WiredTiger.turtle, WiredTiger.turtle, WiredTiger.wt, WiredTiger.wt, repair-SERVER-31602.log, repair-SERVER-31602.tar.gz
Operating System: OS X

 Description   

Hello there, can someone help me? I mounted my data volume into a Docker container running mongod and accidentally stopped it with CTRL-C. Since then I have not been able to restart my local mongod using the same data directory. In other issues, e.g. SERVER-31196 (https://jira.mongodb.org/browse/SERVER-31196), I read that the problem may be resolved by replacing some WiredTiger files. So I did, but unfortunately that didn't work.

Could you tell me:

  • is recovery still possible somehow, even though I have already replaced my original WiredTiger.wt and WiredTiger.turtle files?
  • what information do you need from me then?

Here is the log output of my initial attempt using the `--repair` option.

Thank you in advance. Matthaeus

mongod --dbpath /usr/local/var/mongodb/ --repair
2017-10-17T17:07:03.656+0200 I CONTROL  [initandlisten] MongoDB starting : pid=43731 port=27017 dbpath=/usr/local/var/mongodb/ 64-bit host=...
2017-10-17T17:07:03.657+0200 I CONTROL  [initandlisten] db version v3.4.9
2017-10-17T17:07:03.657+0200 I CONTROL  [initandlisten] git version: 876ebee8c7dd0e2d992f36a848ff4dc50ee6603e
2017-10-17T17:07:03.657+0200 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.2l  25 May 2017
2017-10-17T17:07:03.657+0200 I CONTROL  [initandlisten] allocator: system
2017-10-17T17:07:03.657+0200 I CONTROL  [initandlisten] modules: none
2017-10-17T17:07:03.657+0200 I CONTROL  [initandlisten] build environment:
2017-10-17T17:07:03.657+0200 I CONTROL  [initandlisten]     distarch: x86_64
2017-10-17T17:07:03.657+0200 I CONTROL  [initandlisten]     target_arch: x86_64
2017-10-17T17:07:03.657+0200 I CONTROL  [initandlisten] options: { repair: true, storage: { dbPath: "/usr/local/var/mongodb/" } }
2017-10-17T17:07:03.657+0200 W -        [initandlisten] Detected unclean shutdown - /usr/local/var/mongodb/mongod.lock is not empty.
2017-10-17T17:07:03.658+0200 I -        [initandlisten] Detected data files in /usr/local/var/mongodb/ created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2017-10-17T17:07:03.658+0200 W STORAGE  [initandlisten] Recovering data from the last clean checkpoint.
2017-10-17T17:07:03.658+0200 I STORAGE  [initandlisten] Detected WT journal files.  Running recovery from last checkpoint.
2017-10-17T17:07:03.658+0200 I STORAGE  [initandlisten] journal to nojournal transition config: create,cache_size=3584M,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2017-10-17T17:07:04.588+0200 E STORAGE  [initandlisten] WiredTiger error (0) [1508252824:588393][43731:0x7fffc76a43c0], file:WiredTiger.wt, WT_CURSOR.search_near: read checksum error for 24576B block at offset 57344: block header checksum of 1668246562 doesn't match expected checksum of 3206759375
2017-10-17T17:07:04.588+0200 E STORAGE  [initandlisten] WiredTiger error (0) [1508252824:588464][43731:0x7fffc76a43c0], file:WiredTiger.wt, WT_CURSOR.search_near: WiredTiger.wt: encountered an illegal file format or internal value
2017-10-17T17:07:04.588+0200 E STORAGE  [initandlisten] WiredTiger error (-31804) [1508252824:588480][43731:0x7fffc76a43c0], file:WiredTiger.wt, WT_CURSOR.search_near: the process must exit and restart: WT_PANIC: WiredTiger library panic
2017-10-17T17:07:04.588+0200 I -        [initandlisten] Fatal Assertion 28558 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 361
2017-10-17T17:07:04.588+0200 I -        [initandlisten] 
 
***aborting after fassert() failure
 
 
2017-10-17T17:07:04.599+0200 F -        [initandlisten] Got signal: 6 (Abort trap: 6).



 Comments   
Comment by Kelsey Schubert [ 14/May/18 ]

Hi rjv94rnjn,

As I mentioned on SERVER-26855, so that we can track this issue, would you please open a new SERVER ticket including the logs from the failed startup attempt?

Thank you,
Kelsey

Comment by Rajeev Ranjan [ 13/May/18 ]

Hi,
I am getting the same issue. Please help if possible. (attached: WiredTiger.turtle, WiredTiger.wt)

Comment by Mark Agarunov [ 20/Oct/17 ]

Hello matthaeus,

Thanks for your response. I'm glad to hear that this fixed the issue and everything is working again. To prevent this type of problem in the future, we recommend implementing regular backups and/or replication to mitigate any issues related to unreliable storage layers or server failures.

Thanks,
Mark

Comment by Matthäus Zloch [ 20/Oct/17 ]

I hadn't tried that. Ahh, it seems the mongod service can start up again: I can access the database again! Thank you! I ran:

`mongod --dbpath /usr/local/var/mongodb`

Thank you for the quick response and the fixes!

Comment by Mark Agarunov [ 19/Oct/17 ]

Hello matthaeus,

Thank you for providing this information. After running with --repair, is the error the same when mongod is started without --repair?

Thanks,
Mark

Comment by Matthäus Zloch [ 18/Oct/17 ]

Hi Mark, thank you for the fixes.

I extracted the files and ran:

  • `cp -p Downloads/WiredTiger.* /usr/local/var/mongodb/.`
  • `chown matthaeus:admin /usr/local/var/mongodb/WiredTiger.wt /usr/local/var/mongodb/WiredTiger.turtle`
  • `mongod --dbpath /usr/local/var/mongodb --repair`
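The file-replacement steps above can be sketched as a small shell helper. This is a hypothetical sketch: the function name `replace_wt_metadata` and the timestamped backup directory are assumptions, not part of the procedure in this ticket. It copies the whole dbpath aside before overwriting WiredTiger.wt and WiredTiger.turtle, so the original files stay recoverable, and it assumes mongod is stopped while it runs.

```shell
# Hypothetical helper (not from the ticket): back up a dbpath, then drop in
# replacement WiredTiger metadata files. Run only while mongod is stopped.
replace_wt_metadata() {
  src_dir=$1   # directory holding the replacement files, e.g. ~/Downloads
  dbpath=$2    # the mongod --dbpath, e.g. /usr/local/var/mongodb
  # Assumed naming scheme: sibling directory with a timestamp suffix.
  backup="${dbpath%/}.bak.$(date +%Y%m%d%H%M%S)"

  # 1. Keep an untouched copy of the entire data directory first.
  cp -Rp "$dbpath" "$backup" || return 1

  # 2. Overwrite the two metadata files; -p keeps mode and timestamps
  #    (and ownership, where the caller is permitted to set it).
  for f in WiredTiger.wt WiredTiger.turtle; do
    cp -p "$src_dir/$f" "$dbpath/$f" || return 1
  done

  # Print the backup location so the caller can roll back if needed.
  echo "$backup"
}
```

For example, `replace_wt_metadata ~/Downloads /usr/local/var/mongodb`, followed by the `chown` and `mongod --dbpath /usr/local/var/mongodb --repair` steps listed above. Copying the whole directory rather than only the two files keeps the collection and index files consistent with the old metadata if a rollback is needed.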

There was a lot of output on the console; it seems mongod was able to resolve some issues. However, another error still prevents mongod from starting normally. I have attached the console output as repair-SERVER-31602.log.

Is there something you can do? Regards, Matthäus

Answers to your questions:

1. Local HDD.
2. Fine.
3. No. It was running 3.4.2 before the error. I updated to 3.4.9 because I thought the --repair tool might have improved.
4. mongod was running in a container (with docker-compose up) when I stopped the process with CTRL-C.
5. No.
6. None, since this is currently a development setup.
7. Clean.

Comment by Mark Agarunov [ 17/Oct/17 ]

Hello matthaeus,

Thank you for providing these files. I've attached the result of a repair attempt on the files you provided. Would you please extract these files, use them to replace the corresponding files in your $dbpath, and let us know if that resolves the issue? If you still see errors after replacing these files, please provide the complete logs from mongod so that we can investigate further. Additionally, if this issue persists, please provide the following information:

  1. What kind of underlying storage mechanism are you using? Are the storage devices attached locally or over the network? Are the disks SSDs or HDDs? What kind of RAID and/or volume management system are you using?
  2. Would you please check the integrity of your disks?
  3. Has the database always been running this version of MongoDB? If not please describe the upgrade/downgrade cycles the database has been through.
  4. Have you manipulated (copied or moved) the underlying database files? If so, was mongod running?
  5. Have you ever restored this instance from backups?
  6. What method do you use to create backups?
  7. When was the underlying filesystem last checked and is it currently marked clean?

Thanks,
Mark

Comment by Matthäus Zloch [ 17/Oct/17 ]

Hi Mark, thank you for the quick answer. Fortunately, I made a backup before experimenting with the commands. I will attach the two files from my original dataset to the issue. It would be great if your repair attempt works. Regards, Matthäus (attached: WiredTiger.turtle, WiredTiger.wt)

Comment by Mark Agarunov [ 17/Oct/17 ]

Hello matthaeus,

Thank you for the report. When replacing the WiredTiger.turtle and WiredTiger.wt files, which files did you use to replace them? These files are generally specific to a dataset and cannot be used with a different dataset. If you provide these files, I can attempt to repair them, but this is unlikely to succeed if they are not from the same dataset.

Thanks,
Mark

Generated at Thu Feb 08 04:27:36 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.