[SERVER-57853] Corrupted files: calculated block checksum doesn't match expected checksum Created: 19/Jun/21  Updated: 15/Jul/21  Resolved: 15/Jul/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 4.4.6
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Stefan Bohlin Assignee: Eric Sedor
Resolution: Done Votes: 0
Labels: ChecksumError, corrupt, repair
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File WiredTiger.turtle     File WiredTiger.wt     Text File mongod.log    
Operating System: ALL
Participants:

 Description   

We've run into a massive issue with our database running on Windows Server 2019. For unknown reasons the data files have become corrupt and I'm having major issues trying to restore them.

What I've tried so far is to copy all the corrupt collection files one by one to a new mongodb instance and run the --repair command. Then I created a dump of the repaired collection, deleted the old corrupted collection on the original instance and restored it from the repaired dump.
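For reference, the per-collection procedure described above can be sketched roughly as follows. This is only an illustration of the sequence of steps: the paths, ports, database name, and collection name are placeholders and not values from this ticket, and the helper `build_recovery_commands` is hypothetical. It only builds the command lines and does not run anything.

```python
def build_recovery_commands(scratch_dbpath, dump_dir, db, collection,
                            scratch_port=27018, original_port=27017):
    """Return the repair -> dump -> restore steps as argument lists.

    All arguments are placeholders chosen for illustration; nothing is executed.
    """
    scratch_dbpath = str(scratch_dbpath)
    dump_dir = str(dump_dir)
    return [
        # 1. Repair the scratch copy (no mongod may be running on this dbpath).
        ["mongod", "--dbpath", scratch_dbpath, "--repair"],
        # 2. Start the repaired scratch instance on its own port.
        ["mongod", "--dbpath", scratch_dbpath, "--port", str(scratch_port)],
        # 3. Dump the repaired collection from the scratch instance.
        ["mongodump", "--port", str(scratch_port),
         "--db", db, "--collection", collection, "--out", dump_dir],
        # 4. Drop the corrupted collection on the original instance and
        #    restore it from the dump.
        ["mongorestore", "--port", str(original_port), "--drop",
         "--nsInclude", f"{db}.{collection}", dump_dir],
    ]


if __name__ == "__main__":
    # Example with made-up values: print the commands for review.
    for cmd in build_recovery_commands("D:/scratch/db", "D:/scratch/dump",
                                       "mydb", "orders"):
        print(" ".join(cmd))
```

Printing the commands first makes it easier to check each step before running it by hand against a copy of the data.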

This did seem to solve the issues I had with the checksum error, but one of the collections got corrupted again, so it leads me to believe something is very wrong with the other files as well and not just the collections.

Repairing the database was not a success and basically just killed the database. I also tried to use the WiredTiger (wt) tool to salvage the data, but had no success building the tool with snappy compression.

What would be the best approach?

Cheers

Stefan



 Comments   
Comment by Eric Sedor [ 15/Jul/21 ]

Hi stefanbohlin@gmail.com, I'm going to close this ticket. But we can revisit it if you are able to provide the information I've requested. Thanks!

Comment by Eric Sedor [ 01/Jul/21 ]

Hi stefanbohlin@gmail.com, I wanted to check in to see if you saw my last comment. Are you able to provide this information?

Comment by Eric Sedor [ 22/Jun/21 ]

Hi stefanbohlin@gmail.com,

It sounds like you've already made a copy of the $dbpath directory to safeguard so that you can work off of the current $dbpath.

The ideal resolution is to perform a clean resync from an unaffected node.

It looks like the mongod.log you provided above is from the first occurrence of this more recent corruption. Is that right?

Can you please also provide:

  • As much of syslog and dmesg content leading up to the first sign of corruption as possible.
  • The $dbpath/diagnostic.data directory (the contents are described here) from before the crash.
  • The logs of the repair operation.
  • The logs of an attempt to start mongod after the repair operation completed.
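As a rough sketch of how the file-based items above could be gathered into one archive for upload: the helper `bundle_diagnostics` and all paths below are hypothetical, and the script assumes Python 3.8 or later. It copies the diagnostic.data directory and whichever log files exist into a staging directory, then zips it.

```python
import shutil
from pathlib import Path


def bundle_diagnostics(dbpath, log_files, out_base):
    """Copy diagnostic.data and any existing log files, then zip them.

    dbpath, log_files, and out_base are placeholder paths supplied by the
    caller. Returns the path of the created .zip archive.
    """
    staging = Path(out_base)
    staging.mkdir(parents=True, exist_ok=True)

    # Copy the diagnostic.data directory if it is present under dbpath.
    diag = Path(dbpath) / "diagnostic.data"
    if diag.is_dir():
        shutil.copytree(diag, staging / "diagnostic.data", dirs_exist_ok=True)

    # Copy each log file that exists; missing files are skipped silently.
    for log in map(Path, log_files):
        if log.is_file():
            shutil.copy2(log, staging / log.name)

    # Creates "<out_base>.zip" next to the staging directory.
    return shutil.make_archive(str(staging), "zip", root_dir=str(staging))
```

A call such as `bundle_diagnostics("D:/data/db", ["D:/logs/mongod.log", "D:/logs/repair.log"], "D:/upload/SERVER-57853")` (made-up paths) would leave a single zip file to attach or upload.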

If attaching to this ticket is a concern, I've created an upload portal for you. Files here are only visible to MongoDB employees and are deleted after a time.

Thanks,
Eric
