[SERVER-53628] Corrupt wt file, checksum validation levels and integrity check Created: 07/Jan/21 Updated: 22/Jun/22 Resolved: 10/Feb/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | GridFS, WiredTiger |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Question | Priority: | Major - P3 |
| Reporter: | Tom Decsi | Assignee: | Edwin Zhou |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Participants: |
| Description |
|
I am having trouble getting MongoDB running stably again after a disk failure, and I am hoping to get some conclusive answers by posting here. We are running the following:
The following happened:
Of course, this is not acceptable: it might be a ticking time bomb, as other data blocks might be corrupted and MongoDB would exit whenever they are accessed. So here are a couple of questions I hope to find an answer to:
|
| Comments |
| Comment by Tom Decsi [ 04/Feb/22 ] |
|
Hi Brajmohan, You can run db.collection.validate to identify the corrupt indexes. We eventually dropped all corrupt collections, as we could not afford the downtime or the performance impact. The resulting data loss, which in our case affected older data, was accepted. Mongo has been running fine since ... Probably not the solution you were hoping for ... Rgds Tom |
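A minimal sketch of running validate across every collection in a database, assuming a pymongo-style `Database` object (anything that provides `list_collection_names()` and `validate_collection(name)`). The function name `find_invalid_collections` is hypothetical. Note that on a damaged deployment mongod may exit while a corrupt collection is being read, in which case the remaining collections will also be reported as failing.

```python
from typing import Any, Dict


def find_invalid_collections(db: Any) -> Dict[str, str]:
    """Map each problematic collection name to a short description.

    `db` is assumed to behave like a pymongo Database. pymongo's
    validate_collection raises when validation fails; a result document
    with valid == False is also treated as a failure here.
    """
    problems: Dict[str, str] = {}
    for name in db.list_collection_names():
        try:
            result = db.validate_collection(name)
        except Exception as exc:
            # Covers a failed validation as well as a lost connection.
            problems[name] = f"validate raised: {exc}"
            continue
        if not result.get("valid", False):
            problems[name] = f"validate reported: {result.get('errors', [])}"
    return problems
```

With pymongo installed in a version compatible with the server, this could be called as `find_invalid_collections(MongoClient(uri)["mydb"])`, where `uri` and `mydb` are placeholders.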
| Comment by Brajmohan Sharma [ 04/Feb/22 ] |
|
Hi Tom Decsi, We are facing the same issue. How are you analyzing the *.wt files to find the names of the corrupt indexes? How did you bring the instance up in order to delete and rebuild them? We are unable to start the mongodb service. Many thanks, Braj Mohan |
| Comment by Edwin Zhou [ 10/Feb/21 ] |
|
Unfortunately we have no implementation available to run --repair on only select collections. To avoid a problem like this in the future, it is our strong recommendation to:
Best regards, |
| Comment by Tom Decsi [ 10/Feb/21 ] |
|
Hi Edwin, Thanks for your reply. Yes, we were able to run db.collection.validate, and several collections were identified as corrupted. Please note that this is not preventing Mongo from starting up. Mongo starts fine. But whenever data is accessed in one of those collections, Mongo exits with a WiredTiger panic error, as we have seen before. {{mongod --repair}} is an option, but it would take a whole week according to our estimates. Do you know if we can run the repair only on the corrupted collections instead of the whole database? Or are there any other options we may consider (besides just dropping the corrupted ones)? |
| Comment by Edwin Zhou [ 08/Feb/21 ] |
|
We'd love to hear back from you about your disk corruption! Were you able to try running db.collection.validate on the affected collections? After validating collections, I recommend trying mongod --repair. This may remove some documents, but it should eliminate any corruption that prevents mongod from starting up. Thanks, |
| Comment by Edwin Zhou [ 21/Jan/21 ] |
|
MongoDB 3.4 reached end of life in January of 2020, but we can provide limited guidance on this issue. As you've identified, this appears to be a disk corruption. First, make a complete copy of the database's $dbpath directory as a safeguard, so that you can work on the current $dbpath. The best way to look for corruption is to run db.collection.validate on the affected collections. Index corruption can be solved by reindexing, which you've mentioned you've done in your steps. After validating collections, I recommend trying mongod --repair. This may remove some documents, but it should eliminate any corruption that prevents mongod from starting up. Best, Edwin |
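The safeguard copy of $dbpath recommended above could be sketched as follows. This is only an illustration: the helper name `backup_dbpath` and the `-backup` suffix are hypothetical, and it assumes mongod is stopped while the copy runs, so that the data files are not changing underneath it.

```python
import shutil
from pathlib import Path


def backup_dbpath(dbpath: str, backup_root: str) -> Path:
    """Copy the whole dbpath directory to <backup_root>/<dbpath name>-backup.

    Assumption: mongod is not running against dbpath during the copy.
    """
    src = Path(dbpath)
    if not src.is_dir():
        raise FileNotFoundError(f"dbpath not found: {src}")
    dest = Path(backup_root) / f"{src.name}-backup"
    # copytree refuses to write into an existing directory, which keeps
    # an earlier backup from being overwritten by accident.
    shutil.copytree(src, dest)
    return dest
```

Any repair attempt would then run against the original $dbpath, with the copy kept untouched in case the repair removes more data than expected.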