[SERVER-64629] WiredTiger metadata corruption detected - unable to repair Created: 18/Mar/22  Updated: 04/May/22  Resolved: 04/May/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 3.6.18
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Sarojini Jillalla Assignee: Edwin Zhou
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File Repair - 3.6.18.txt     Text File Repair - 3.6.23.txt     Text File Repair - 4.4.13.txt     Text File Repair - 4.4.3.txt     HTML File WiredTiger     File WiredTiger.turtle     File WiredTiger.wt     File WiredTigerLAS.wt    
Operating System: ALL
Participants:

 Description   

Hi all,

We are using Graylog, Elasticsearch and MongoDB for logging and archiving. These apps run as Docker containers with 3 replicas on 3 RHEL servers. We are using MongoDB version 3.6.18.
Generally, when the Docker containers go down, the Docker daemon brings them back up automatically. But sometimes the shutdown is not clean and the data in MongoDB gets corrupted. Until now, we ran a repair each time and the data was recovered successfully.

I had previously created a ticket for a similar issue (https://jira.mongodb.org/browse/SERVER-61936). At that time, I restored the data from the backup and could not try the repair command suggested by Edwin Zhou.

I have encountered the problem again. The same DB got corrupted.
I have tried the repair operation with 4 different versions of MongoDB. The log output from each repair operation is attached. None of the repairs was successful.

I have attached the WiredTiger files from this corrupted db.

Can you please help me in recovering from this failure?
Also, can you let me know how I can prevent this frequent corruption of data in the DB?

Thanks,
Saroj



 Comments   
Comment by Edwin Zhou [ 04/May/22 ]

Hi saroj.jillalla@cgi.com,

Thank you for following up. We're happy to hear that you were able to resolve the problem by restoring the data from backup.

Since you are operating with a replica set, the ideal resolution is to perform a clean resync from an unaffected node.
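
As a rough illustration of preparing a member for a clean resync: a replica set member that starts with an empty data directory performs an initial sync from the other members. The sketch below sets the corrupted directory aside and recreates an empty one. The function name `set_aside_dbpath` and the `.corrupt-<timestamp>` suffix are hypothetical, and the sketch assumes that mongod on this member is already stopped and that the dbpath is an ordinary directory (not a mount point, which cannot be renamed this way).

```python
import time
from pathlib import Path


def set_aside_dbpath(dbpath: str) -> Path:
    """Rename the existing dbpath and leave an empty directory in its place.

    Assumption: mongod on this member is stopped before this is called.
    """
    source = Path(dbpath)
    if not source.is_dir():
        raise FileNotFoundError(f"dbpath not found: {source}")

    # Keep the old files next to the original location instead of deleting them.
    stamp = time.strftime("%Y%m%dT%H%M%S")
    aside = source.with_name(f"{source.name}.corrupt-{stamp}")
    source.rename(aside)

    # Recreate an empty dbpath; ownership and permissions may need adjusting
    # to match what the mongod process (or container) expects.
    source.mkdir()
    return aside
```

After the member is restarted with the empty directory and has caught up, the set-aside copy can be removed once it is no longer needed.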

To avoid a problem like this in the future, it is our strong recommendation to:

Best,
Edwin

Comment by Sarojini Jillalla [ 28/Apr/22 ]

Hi Edwin/Louis,
Sorry for the delayed update.
We had to restore the data from a backup and the issue was resolved. I did not try the repair operation using v4.0.28.

Comment by Edwin Zhou [ 24/Mar/22 ]

Hi saroj.jillalla@cgi.com,

To add on to louis.williams's comment, please make a complete copy of the database's $dbpath directory before running any repair operation. The copy is a safeguard that lets you run the repair against the current $dbpath.
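
The copy step could look something like the following minimal sketch. The function name `backup_dbpath` and the timestamped folder layout are hypothetical choices for illustration, and the sketch assumes mongod is stopped so that the files do not change while they are being copied.

```python
import shutil
import time
from pathlib import Path


def backup_dbpath(dbpath: str, backup_root: str) -> Path:
    """Copy the whole dbpath into a new timestamped folder under backup_root.

    Assumption: mongod is not running against dbpath during the copy.
    """
    source = Path(dbpath)
    if not source.is_dir():
        raise FileNotFoundError(f"dbpath not found: {source}")

    stamp = time.strftime("%Y%m%dT%H%M%S")
    destination = Path(backup_root) / f"{source.name}-{stamp}"

    # copytree raises FileExistsError if the destination already exists,
    # so an earlier backup is never overwritten.
    shutil.copytree(source, destination)
    return destination
```

For example, `backup_dbpath("/data/db", "/backups")` (both paths are placeholders) would leave the original directory untouched and return the path of the copy. The repair itself can then be attempted with `mongod --repair --dbpath <path>`.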

We look forward to hearing about the status of the repair operation on MongoDB 4.0.28.

Best,
Edwin

Comment by Louis Williams [ 21/Mar/22 ]

saroj.jillalla@cgi.com, sorry to hear about this issue. Have you tried repairing with version 4.0.28? If not, can you post the result of that operation?
