[SERVER-34392] file:WiredTiger.wt, connection: WiredTiger.wt: encountered an illegal file format or internal value
Created: 09/Apr/18  Updated: 14/Aug/18  Resolved: 02/May/18

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.2.19
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Frank Perez
Assignee: Kelsey Schubert
Resolution: Done
Votes: 0
Labels: docker, envc, rpo, rps, trcf, wtc
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: WiredTiger.turtle, WiredTiger.wt, logfile_while_running.txt, output_of_repair_command.txt, repair_attempt.tar.gz
Operating System: Linux
Participants:

 Description   

Hi,

We had a major power outage and it knocked down our mongo db running in docker as part of a stack. Now it won't come back up. Mongo had its own volume mounted to /data/db.

At first I took the stack down, then created and ran a temporary container mounted to the same volume to see if I could repair the data. I ran "mongod --repair --dbpath /data/db" and nothing changed; it was still saying "file:WiredTiger.wt, connection: read checksum error for 4096B block at offset 12288: block header checksum of 1843462872 doesn't match expected checksum of 1205067277".

I also ran this command "mongod --repair --dbpath /data/db --storageEngine wiredTiger"...same thing.

I then went as far as to restore the data from a backup, made the day prior, of the volume used to store the mongo data. When I deployed the stack again (fresh containers, etc.) and checked the logs that the server was outputting, I still saw the same issue.

Please advise/help, as I thought that restoring from the backup, if anything, would have fixed it, but still nothing. I'm attaching the stdout of the repair command as well as the regular log of the server while running (or trying to, anyway). I just don't understand why the backup restore isn't working, as the backup was done before the outage and therefore shouldn't be corrupted.

Please help me recover the db as it's very important. Much appreciation and thanks.
Sincerely,
Frank Perez



 Comments   
Comment by Frank Perez [ 02/May/18 ]

Thank you Kelsey for the update!

Comment by Kelsey Schubert [ 02/May/18 ]

Hi frankp,

We have improvements in this space currently scheduled, see SERVER-19815 as an example.

Additionally we suggest the following to help mitigate any issues related to unreliable storage layers or server failures:

Kind regards,
Kelsey

Comment by Frank Perez [ 10/Apr/18 ]

Kelsey,

Thank you very much for the repaired files, it worked!!!
I was wondering if you could shed some light on why restoring from a backup that was made prior to the power outage (so the data should not have been corrupt) didn't work, and whether there is anything we could do better or differently on our end (the way we do backups, etc.) to prevent this moving forward. Lastly, we are currently running the 3.2 mongo docker image; has this issue been addressed in newer versions, so that updating would help? Ultimately, how can we prevent this from happening in the future?

Thanks again for the help and any light you can shed on the questions above would be greatly appreciated.

Sincerely,
Frank Perez

Comment by Kelsey Schubert [ 09/Apr/18 ]

Hi frankp,

I've attached a repair attempt, repair_attempt.tar.gz, of the files you provided. Please extract these files and replace them in your $dbpath and let us know if it resolves the issue. If you are still seeing errors after replacing these files, please provide the complete logs from the affected node so that we can further investigate.

Thank you,
Kelsey
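
A minimal sketch of the replacement step described above, not an official procedure. It assumes mongod is stopped, that the archive contains WiredTiger.wt and WiredTiger.turtle somewhere in its tree (the archive layout is not stated in this ticket), and that the dbpath is passed in by the caller. The function name replace_wt_metadata is hypothetical.

```shell
#!/bin/sh
# Sketch: swap repaired WiredTiger metadata files into a dbpath.
# Assumptions (not from the ticket): mongod is NOT running, and the archive
# holds WiredTiger.wt and WiredTiger.turtle somewhere in its directory tree.
# replace_wt_metadata is a hypothetical helper name.

replace_wt_metadata() {
    archive=$1   # path to the repaired archive, e.g. repair_attempt.tar.gz
    dbpath=$2    # mongod data directory, e.g. /data/db
    stamp=$(date +%Y%m%d%H%M%S)
    staging=$(mktemp -d) || return 1

    # Unpack into a scratch directory so dbpath is untouched if this fails.
    tar -xzf "$archive" -C "$staging" || { rm -rf "$staging"; return 1; }

    # First pass: confirm both files are present before changing anything.
    for name in WiredTiger.wt WiredTiger.turtle; do
        src=$(find "$staging" -type f -name "$name" | head -n 1)
        if [ -z "$src" ]; then
            echo "missing $name in $archive" >&2
            rm -rf "$staging"
            return 1
        fi
    done

    # Second pass: keep a copy of each damaged original, then replace it.
    for name in WiredTiger.wt WiredTiger.turtle; do
        src=$(find "$staging" -type f -name "$name" | head -n 1)
        if [ -f "$dbpath/$name" ]; then
            cp -p "$dbpath/$name" "$dbpath/$name.bak.$stamp" || return 1
        fi
        cp "$src" "$dbpath/$name" || return 1
    done

    rm -rf "$staging"
}
```

With the files from this ticket it would be called as `replace_wt_metadata repair_attempt.tar.gz /data/db`. Keeping the `.bak.*` copies means the original files can still be provided if the errors persist and further investigation is needed.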

Comment by Frank Perez [ 09/Apr/18 ]

files have been uploaded.

Comment by Frank Perez [ 09/Apr/18 ]

ok. will do. thanks

Comment by Kelsey Schubert [ 09/Apr/18 ]

Hi frankp,

Would you please upload the WiredTiger.wt and WiredTiger.turtle files so we can attempt a repair?

Thank you,
Kelsey

Comment by Frank Perez [ 09/Apr/18 ]

P.S. I'm running the mongo:3.2 docker image.
