[SERVER-30942] WiredTiger error (0) [1504527896:382615][91756:0x7f6ef4ecfd40], file:WiredTiger.wt, connection: read checksum error for 4096B block at offset 53248: block header checksum of 3729987182 doesn't match expected checksum of 2115544974 Created: 04/Sep/17  Updated: 27/Jul/18  Resolved: 14/Sep/17

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.4.3
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: yanglibo Assignee: Kelsey Schubert
Resolution: Done Votes: 0
Labels: envns, rpns, rpo, wtc
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File repair_attempt.tar.gz    
Operating System: ALL
Steps To Reproduce:

2017-09-04T20:24:56.326+0800 I CONTROL [initandlisten] options: { repair: true, storage: { dbPath: "/share/MongoData/", engine: "wiredTiger" } }
2017-09-04T20:24:56.364+0800 I STORAGE [initandlisten] Detected WT journal files. Running recovery from last checkpoint.
2017-09-04T20:24:56.364+0800 I STORAGE [initandlisten] journal to nojournal transition config: create,cache_size=7406M,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2017-09-04T20:24:56.382+0800 E STORAGE [initandlisten] WiredTiger error (0) [1504527896:382615][91756:0x7f6ef4ecfd40], file:WiredTiger.wt, connection: read checksum error for 4096B block at offset 53248: block header checksum of 3729987182 doesn't match expected checksum of 2115544974
2017-09-04T20:24:56.382+0800 E STORAGE [initandlisten] WiredTiger error (0) [1504527896:382670][91756:0x7f6ef4ecfd40], file:WiredTiger.wt, connection: WiredTiger.wt: encountered an illegal file format or internal value
2017-09-04T20:24:56.382+0800 E STORAGE [initandlisten] WiredTiger error (-31804) [1504527896:382688][91756:0x7f6ef4ecfd40], file:WiredTiger.wt, connection: the process must exit and restart: WT_PANIC: WiredTiger library panic
2017-09-04T20:24:56.382+0800 I - [initandlisten] Fatal Assertion 28558 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 361
2017-09-04T20:24:56.382+0800 I - [initandlisten]

***aborting after fassert() failure

Participants:

 Description   

My server lost power while I was backing up the MongoDB files, and now I can't start the mongod service. When I run it with the --repair option, this error message is shown. My MongoDB version is v3.4.3 64-bit. I tried replacing 'WiredTiger.turtle' and 'WiredTiger.wt' with the files you provided in other similar issues such as https://jira.mongodb.org/browse/SERVER-18448, but it didn't work. Can you help me fix this problem?



 Comments   
Comment by Kelsey Schubert [ 14/Sep/17 ]

Hi libo,

Thank you for providing the files and answering my questions. I've attached a repair attempt of the files you've provided. Please extract these files and use them to replace the corresponding files in your $dbpath. From your answers, this issue is likely external to MongoDB. During a power failure, a NAS may corrupt files that are in the process of being written. Since WiredTiger.wt is written to more frequently than any other MongoDB file, it is the most likely to suffer from this type of issue.
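
The replacement step described above could be scripted roughly as follows. This is only a minimal sketch, not an official procedure. It assumes mongod is stopped, and it assumes the archive contains files named WiredTiger.wt and WiredTiger.turtle (the ticket does not list the archive's contents). The function name `install_repaired_files` and the `backup_dir` parameter are hypothetical names chosen for illustration.

```python
import shutil
import tarfile
from pathlib import Path

# Assumption: these are the files the repair attempt replaces. The actual
# contents of repair_attempt.tar.gz are not listed in this ticket.
METADATA_FILES = ("WiredTiger.wt", "WiredTiger.turtle")


def install_repaired_files(archive, dbpath, backup_dir):
    """Back up the current metadata files, then copy repaired ones into dbpath.

    mongod must not be running while this executes.
    Returns the list of file names that were installed from the archive.
    """
    archive, dbpath, backup_dir = Path(archive), Path(dbpath), Path(backup_dir)

    # Keep copies of the originals outside dbpath so the step can be undone.
    backup_dir.mkdir(parents=True, exist_ok=True)
    for name in METADATA_FILES:
        if (dbpath / name).exists():
            shutil.copy2(dbpath / name, backup_dir / name)

    # Copy only the expected files, wherever they sit inside the archive.
    # Using the base name avoids writing outside dbpath.
    installed = []
    with tarfile.open(archive, "r:gz") as tar:
        for member in tar.getmembers():
            name = Path(member.name).name
            if member.isfile() and name in METADATA_FILES:
                with tar.extractfile(member) as src, open(dbpath / name, "wb") as dst:
                    shutil.copyfileobj(src, dst)
                installed.append(name)
    return installed
```

With the dbPath from the log above, a call might look like `install_repaired_files("repair_attempt.tar.gz", "/share/MongoData/", "/tmp/wt-backup")`, where the backup location is an arbitrary example. Copying the originals aside first means the previous state can be restored if the repaired files do not help.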

Please note that the SERVER project is for reporting bugs or feature suggestions for the MongoDB server. For MongoDB-related support discussion please post on the mongodb-user group or Stack Overflow with the mongodb tag. A question like this involving more discussion would be best posted on the mongodb-user group.

Kind regards,
Kelsey

Comment by yanglibo [ 05/Sep/17 ]

Hi Thomas Kelsey Schubert,

To answer your questions:
1. The storage devices are attached over the network.
2. The disks are HDDs.
3. We use RAID. All I can determine is that the filesystem is nfs4. As for the RAID, the only information I can get is the output below:
dracut: rd_NO_DM: removing DM RAID activation
dracut: rd_NO_MD: removing MD RAID activation
megaraid_sas 0000:03:00.0: PCI INT A -> GSI 26 (level, low) -> IRQ 26
megaraid_sas 0000:03:00.0: setting latency timer to 64
megaraid_sas 0000:03:00.0: FW now in Ready state
megaraid_sas 0000:03:00.0: irq 91 for MSI/MSI-X
megaraid_sas 0000:03:00.0: irq 92 for MSI/MSI-X
megaraid_sas 0000:03:00.0: irq 93 for MSI/MSI-X
megaraid_sas 0000:03:00.0: irq 94 for MSI/MSI-X
megaraid_sas 0000:03:00.0: irq 95 for MSI/MSI-X
megaraid_sas 0000:03:00.0: irq 96 for MSI/MSI-X
megaraid_sas 0000:03:00.0: irq 97 for MSI/MSI-X
megaraid_sas 0000:03:00.0: irq 98 for MSI/MSI-X
megaraid_sas 0000:03:00.0: irq 99 for MSI/MSI-X
megaraid_sas 0000:03:00.0: irq 100 for MSI/MSI-X
megaraid_sas 0000:03:00.0: irq 101 for MSI/MSI-X
megaraid_sas 0000:03:00.0: irq 102 for MSI/MSI-X
megaraid_sas 0000:03:00.0: firmware supports msix : (96)
megaraid_sas 0000:03:00.0: current msix/online cpus : (12/12)
megaraid_sas 0000:03:00.0: RDPQ mode : (disabled)
megaraid_sas 0000:03:00.0: Current firmware maximum commands: 928 LDIO threshold: 0
megaraid_sas 0000:03:00.0: FW supports sync cache : No
megaraid_sas 0000:03:00.0: Init cmd success
megaraid_sas 0000:03:00.0: firmware type : Extended VD(240 VD)firmware
megaraid_sas 0000:03:00.0: controller type : MR(1024MB)
megaraid_sas 0000:03:00.0: Online Controller Reset(OCR) : Enabled
megaraid_sas 0000:03:00.0: Secure JBOD support : No
megaraid_sas 0000:03:00.0: INIT adapter done
megaraid_sas 0000:03:00.0: Jbod map is not supported megasas_setup_jbod_map 5034
megaraid_sas 0000:03:00.0: pci id : (0x1000)/(0x005d)/(0x1028)/(0x1f49)
megaraid_sas 0000:03:00.0: unevenspan support : yes
megaraid_sas 0000:03:00.0: firmware crash dump : no
megaraid_sas 0000:03:00.0: jbod sync map : no
scsi0 : Avago SAS based MegaRAID driver

I don't know whether this is useful.

4. We didn't lose any other files after the power loss, so I think the integrity of our disks is fine?

I also found that a lot of similar issues have been posted. Could you please suggest some MongoDB configuration settings to avoid this kind of problem?

Thank you,
libo

Comment by Kelsey Schubert [ 05/Sep/17 ]

Hi libo,

For us to attempt a repair of the WiredTiger.wt file, we would need the original WiredTiger.wt and WiredTiger.turtle files. Would you please upload them?

Additionally, I have a few questions about your storage layer:

  1. What kind of underlying storage mechanism are you using? Are the storage devices attached locally or over the network? Are the disks SSDs or HDDs? What kind of RAID and/or volume management system are you using?
  2. Would you please check the integrity of your disks?

Thank you,
Kelsey
