[SERVER-23532] WT Library Panic Created: 05/Apr/16  Updated: 02/May/18  Resolved: 22/Apr/16

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.0.9
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Lucas Assignee: Kelsey Schubert
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File mongo.log.2016-04-05T07-46-39.bz2     File mongo.log.2016-04-05T09-46-49.bz2     File mongo.log.2016-04-05T13-27-54    
Issue Links:
Duplicate
duplicates SERVER-19815 Improved mongod --repair option for W... Closed
Operating System: ALL
Steps To Reproduce:

Unexpected.

Participants:

 Description   

My database got "WT_PANIC: WiredTiger library panic" with this error:

2016-04-05T08:27:58.378-0500 E STORAGE [initandlisten] WiredTiger (-31802) [1459862878:378859][25768:0x7f7c72ec8bc0], file:collection-2-6214914092427755077.wt, session.checkpoint: file contains a corrupted WiredTigerCheckpoint.161919.alloc extent list, range 15758626944-15759745152 past end-of-file: WT_ERROR: non-specific WiredTiger error

2016-04-05T08:27:58.378-0500 E STORAGE [initandlisten] WiredTiger (-31804) [1459862878:378915][25768:0x7f7c72ec8bc0], file:collection-2-6214914092427755077.wt, session.checkpoint: the process must exit and restart: WT_PANIC: WiredTiger library panic
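
For triage, the affected file and byte range can be pulled out of an error line like the first one above. The following is a minimal sketch, not part of the original report: the regular expression is written against the exact message text quoted here, and the names `EXTENT_ERROR` and `parse_extent_error` are hypothetical. Other WiredTiger versions may word the message differently.

```python
import re

# Pattern matched against the extent-list error text quoted in this ticket.
# Assumption: the wording matches this report; it is not a stable log format.
EXTENT_ERROR = re.compile(
    r"file:(?P<file>\S+?\.wt), session\.checkpoint: "
    r"file contains a corrupted (?P<extent_list>\S+) extent list, "
    r"range (?P<start>\d+)-(?P<end>\d+) past end-of-file"
)


def parse_extent_error(line):
    """Return the file name, extent list name and byte range, or None if no match."""
    match = EXTENT_ERROR.search(line)
    if match is None:
        return None
    start = int(match["start"])
    end = int(match["end"])
    return {
        "file": match["file"],
        "extent_list": match["extent_list"],
        "start": start,
        "end": end,
        "length": end - start,  # size in bytes of the range reported past end-of-file
    }
```

Applied to the first log line above, this returns `collection-2-6214914092427755077.wt` as the file and a range length of 1118208 bytes.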

I tried to repair with --repair and the same thing happened.

What do I need to do to repair or discard the corrupted data and get my database online? This server is a replica, but my database is very large and it is costly to replicate from scratch.

Log files included:

mongo.log.2016-04-05T07-46-39.bz2 - first error (log with several days)
mongo.log.2016-04-05T09-46-49.bz2 - second error (clean initialization but without --repair)
mongo.log.2016-04-05T13-27-54 - subsequent startup with --repair.

Note: You will see several time-consuming operations in the first log, but those queries have since been properly indexed. I don't think this was the problem here.



 Comments   
Comment by Lucas [ 24/Jun/16 ]

So what the other reviewer said, "Doing so is quite likely to result in the error messages you have reported.", has nothing to do with what happened to this database, right?

Thanks!

Comment by Ramon Fernandez Marina [ 24/Jun/16 ]

Yes lucasoares, as long as the source MongoDB server is not running, one can copy the dbpath to a new server and start mongod with those files. When using 3.2 one can also use db.fsyncLock().
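
The whole-directory copy described above can be sketched as follows. This is a minimal illustration, not an official procedure: it assumes the source mongod has already been stopped (the sketch does not check this), and the function name `copy_whole_dbpath` is hypothetical. It refuses to copy into a non-empty destination so that files from two different databases are never mixed.

```python
import shutil
from pathlib import Path


def copy_whole_dbpath(source, destination):
    """Copy every file under source into an empty (or new) destination directory.

    Assumption: the mongod that owns `source` is not running. Returns the
    number of files copied.
    """
    src = Path(source)
    dst = Path(destination)
    if not src.is_dir():
        raise FileNotFoundError(f"source dbpath not found: {src}")
    # Never merge into existing data files: the copy must be all-or-nothing.
    if dst.exists() and (not dst.is_dir() or any(dst.iterdir())):
        raise FileExistsError(f"destination is not empty: {dst}")

    shutil.copytree(src, dst, dirs_exist_ok=True)  # requires Python 3.8+

    # Check that every source file arrived with the same size.
    copied = 0
    for path in src.rglob("*"):
        if path.is_file():
            target = dst / path.relative_to(src)
            if not target.is_file() or target.stat().st_size != path.stat().st_size:
                raise OSError(f"copy incomplete for {target}")
            copied += 1
    return copied
```

A size comparison only catches truncated or missing files; comparing checksums would be a stronger check for a copy made over the network.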

Comment by Lucas [ 24/Jun/16 ]

Sorry for taking so long to answer.

I know about that, and I never did this with my databases. All files always belonged to the same server. The only thing I ever did was copy ALL files to a different server (replica sync by scp), which MongoDB says is possible.

Comment by Kelsey Schubert [ 28/Apr/16 ]

Hi lucasoares,

I took another look at this ticket and wanted to clarify that MongoDB does not support replacing a file in a WiredTiger database with one from another database, even if the other database is part of the same replica set. Doing so is quite likely to result in the error messages you have reported.

Kind regards,
Thomas

Comment by Lucas [ 22/Apr/16 ]

OK anonymous.user, but to be clear, this isn't the first time this has happened to me (it also happened in another replica set), so it is very likely that this isn't related to power or drive failures.

Btw, thanks!

Comment by Kelsey Schubert [ 22/Apr/16 ]

Hi lucasoares,

Thank you for taking the time to upload the corrupted files. Unfortunately, at this time, my best recommendation would be to execute an initial sync. We have scheduled improvements to the repair process in SERVER-19815, and the files you have uploaded may help us improve the recovery process in the future. Please feel free to vote for SERVER-19815 and watch it for updates.

As I've mentioned previously, data corruption is often the result of faulty disk drives or power failures, and determining the root cause is very difficult without a clear reproduction.

Kind regards,
Thomas

Comment by Lucas [ 08/Apr/16 ]

anonymous.user I think it's ok now..... Or something weird is going on..

Comment by Lucas [ 07/Apr/16 ]

Are you sure, anonymous.user? This file is 49G. Is there any chance it was rejected by your server? I will try again.

Comment by Kelsey Schubert [ 07/Apr/16 ]

Hi lucasoares,

We are just missing collection-2-6214914092427755077.wt, can you please reupload the file?

Thank you,
Thomas

Comment by Lucas [ 06/Apr/16 ]

anonymous.user It's done, ok?

Comment by Lucas [ 05/Apr/16 ]

Yes and no. The corruption occurred on one server, and in that case I managed to "fix" the data (scp from the replica). In other words, these data files are from the server that didn't have the problem, but they are the same data as last time.

I was absent for a while, so I could not continue to help with that issue; sorry about that. But now I'm here to try to solve this problem for good. I will upload the files, but they are large and it can take a while, okay?

And excuse my English. If anything is unclear, please ask.

Comment by Kelsey Schubert [ 05/Apr/16 ]

Hi lucasoares,

Thank you for opening this ticket. Were these data files ever part of the nodes that suffered data corruption in SERVER-21191 or SERVER-19293?

So we can investigate this issue, can you please upload the following files: collection-2-6214914092427755077.wt, collection-13--3613304274109051084.wt, WiredTiger.wt and WiredTiger.turtle? I have created a secure upload portal for you to use here. Files you upload to this portal will only be visible to MongoDB employees examining this issue.

It is possible that we may need to see additional files to diagnose this issue after having a chance to examine the files I mentioned above. Can you copy your data files aside as we continue to investigate?

Kind regards,
Thomas

Generated at Thu Feb 08 04:03:39 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.