[SERVER-30271] file:WiredTiger.wt, WT_CURSOR.next: read checksum error Created: 22/Jul/17  Updated: 27/Jul/18  Resolved: 08/Aug/17

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.4.3
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: zhaow Assignee: Kelsey Schubert
Resolution: Done Votes: 0
Labels: envh, rpo, rps, trcf, wtc
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Zip Archive WiredTiger.zip     File repair_attempt.tar.gz    
Operating System: Linux
Participants:

 Description   

After a power failure, two MongoDB servers won't start again.
On one of them I get:

WiredTiger error (0) [1500732904:16506][17102:0x7fc3ddb5bdc0], file:WiredTiger.wt, WT_CURSOR.next: read checksum error for 12288B block at offset 2478080: block header checksum of 405166289 doesn't match expected checksum of 439830749
2017-07-22T22:15:04.016+0800 E STORAGE [initandlisten] WiredTiger error (0) [1500732904:16579][17102:0x7fc3ddb5bdc0], file:WiredTiger.wt, WT_CURSOR.next: WiredTiger.wt: encountered an illegal file format or internal value
2017-07-22T22:15:04.016+0800 E STORAGE [initandlisten] WiredTiger error (-31804) [1500732904:16597][17102:0x7fc3ddb5bdc0], file:WiredTiger.wt, WT_CURSOR.next: the process must exit and restart: WT_PANIC: WiredTiger library panic
2017-07-22T22:15:04.016+0800 I - [initandlisten] Fatal Assertion 28558 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 361

Would you be willing to try repairing the .wt files for both of our servers? I've attached them as two separate sets of files.
Thanks
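
For triage, the fields of the checksum error in the log above can be pulled out programmatically. The following is a minimal sketch, not part of MongoDB or WiredTiger: the regular expression is an assumption derived only from the single message shown in this ticket, and the names `CHECKSUM_ERROR` and `parse_checksum_error` are hypothetical.

```python
import re

# Assumption: the message layout matches the one line quoted in this ticket.
# Other WiredTiger versions may word the error differently.
CHECKSUM_ERROR = re.compile(
    r"(?P<file>file:\S+?), (?P<op>[\w.]+): read checksum error for "
    r"(?P<size>\d+)B block at offset (?P<offset>\d+): "
    r"block header checksum of (?P<found>\d+) "
    r"doesn't match expected checksum of (?P<expected>\d+)"
)


def parse_checksum_error(line):
    """Return the fields of a read checksum error, or None if the line differs."""
    match = CHECKSUM_ERROR.search(line)
    if match is None:
        return None
    return {
        "file": match.group("file"),
        "operation": match.group("op"),
        "block_size": int(match.group("size")),
        "offset": int(match.group("offset")),
        "found_checksum": int(match.group("found")),
        "expected_checksum": int(match.group("expected")),
    }


if __name__ == "__main__":
    # The first error line from the description, copied verbatim.
    sample = (
        "WiredTiger error (0) [1500732904:16506][17102:0x7fc3ddb5bdc0], "
        "file:WiredTiger.wt, WT_CURSOR.next: read checksum error for 12288B "
        "block at offset 2478080: block header checksum of 405166289 doesn't "
        "match expected checksum of 439830749"
    )
    print(parse_checksum_error(sample))
```

Collecting the file name and offset from each affected server makes it easier to see whether the damage is confined to one block of WiredTiger.wt or spread across several files.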



 Comments   
Comment by Kelsey Schubert [ 08/Aug/17 ]

Thanks for the additional information. From your responses, this issue appears to be the result of disk corruption outside of MongoDB following the power failure.

Comment by zhaow [ 26/Jul/17 ]

Oh, yes, it worked.
There are still some errors for the collection, but I think I can resolve them.

Answers:
1. The filesystem is XFS.
2. The storage devices are attached locally.
3. The disks are HDDs.
4. We use RAID 0.
5. The filesystem check failed and said to replay the log.
But when I tried to mount the device, it said the structure needs cleaning.
I had no other ideas, so I ran xfs_repair with -L.

Comment by Kelsey Schubert [ 24/Jul/17 ]

Hi zhaow7,

Thank you for the report. I only see one set of files, and I've attached a repair attempt of them. Would you please extract these files and replace them in your $dbpath and let us know if it resolves the issue? If you are still seeing errors after replacing these files, please provide the complete logs from mongod so that we can further investigate.

Additionally, would you please answer the following questions:

  1. What kind of underlying storage mechanism are you using? Are the storage devices attached locally or over the network? Are the disks SSDs or HDDs? What kind of RAID and/or volume management system are you using?
  2. Was the underlying filesystem checked after the power failure and is it currently marked clean?

Thank you,
Thomas
