[SERVER-32152] read checksum error Wiredtiger.wt Created: 02/Dec/17  Updated: 27/Jul/18  Resolved: 07/Dec/17

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 3.4.1
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Michael D Assignee: Mark Agarunov
Resolution: Done Votes: 0
Labels: envm, rge, rpu, trct, wtc
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File WiredTiger.turtle     File WiredTiger.wt     Text File rep2.log     File repair-SERVER-32152.tar.gz    
Operating System: ALL
Participants:

 Description   

Hi,

I had a server crash and since then I am unable to restart my mongod service with the error:

2017-12-02T10:35:21.780+0100 E STORAGE  [initandlisten] WiredTiger error (0) [1512207321:780192][6685:0x7f45f2800d40], file:WiredTiger.wt, connection: read checksum error for 4096B block at offset 65536: block header checksum of 1952542066 doesn't match expected checksum of 1991159358
2017-12-02T10:35:21.780+0100 E STORAGE  [initandlisten] WiredTiger error (0) [1512207321:780241][6685:0x7f45f2800d40], file:WiredTiger.wt, connection: WiredTiger.wt: encountered an illegal file format or internal value
2017-12-02T10:35:21.780+0100 E STORAGE  [initandlisten] WiredTiger error (-31804) [1512207321:780253][6685:0x7f45f2800d40], file:WiredTiger.wt, connection: the process must exit and restart: WT_PANIC: WiredTiger library panic
2017-12-02T10:35:21.780+0100 I -        [initandlisten] Fatal Assertion 28558 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 361
2017-12-02T10:35:21.780+0100 I -        [initandlisten]
 
***aborting after fassert() failure
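
For triage, the numbers in the first error line can be pulled out programmatically. The following is a minimal sketch, assuming the message keeps exactly the wording shown in the log above; the function name `parse_checksum_error` and the dictionary keys are illustrative and not part of MongoDB or WiredTiger.

```python
import re

# Matches the "read checksum error" message as worded in the log above.
# The exact wording may differ in other WiredTiger versions (assumption).
CHECKSUM_RE = re.compile(
    r"read checksum error for (?P<size>\d+)B block at offset (?P<offset>\d+): "
    r"block header checksum of (?P<found>\d+) "
    r"doesn't match expected checksum of (?P<expected>\d+)"
)


def parse_checksum_error(line):
    """Return block size, offset and both checksums as ints, or None if no match."""
    match = CHECKSUM_RE.search(line)
    if match is None:
        return None
    return {key: int(value) for key, value in match.groupdict().items()}


line = (
    "2017-12-02T10:35:21.780+0100 E STORAGE  [initandlisten] WiredTiger error (0) "
    "[1512207321:780192][6685:0x7f45f2800d40], file:WiredTiger.wt, connection: "
    "read checksum error for 4096B block at offset 65536: block header checksum of "
    "1952542066 doesn't match expected checksum of 1991159358"
)
info = parse_checksum_error(line)
print(info)
```

The offset and block size identify which 4096-byte region of WiredTiger.wt failed verification, which can be useful when comparing against a copy of the file from another node.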

What is weird is that WiredTiger.turtle is not in a readable format, unlike on my other servers.

Thanks,
Michael



 Comments   
Comment by Mark Agarunov [ 07/Dec/17 ]

Hello mmdbs18,

Unfortunately, this error indicates that there was corruption on the disk. In this situation, my best recommendation would be to resync the affected node or restore from a backup if possible.

Thanks,
Mark

Comment by Michael D [ 07/Dec/17 ]

Hi Mark,

Thanks for the files. I replaced the files and it didn't work. I attached the full log.

The answers to your questions:

1) It is a RAID 10 SSD volume for a VM
2) All looks good
3) Yes
4) No
5) No
6) As this is still a pre-prod system, we have only just started building up the replication nodes and plan to take backups from those.
7) A week before the crash. The system ran out of memory. After a restart we had problems starting the mongod process.

Thanks,
Michael

Comment by Mark Agarunov [ 04/Dec/17 ]

Hello mmdbs18,

Thank you for the report. I've attached a repair attempt of the files you've provided. Would you please extract these files and replace them in your $dbpath and let us know if it resolves the issue? If you are still seeing errors after replacing these files, please provide the complete logs from mongod so that we can further investigate. Additionally, if this issue persists, please provide the following information:

  1. What kind of underlying storage mechanism are you using? Are the storage devices attached locally or over the network? Are the disks SSDs or HDDs? What kind of RAID and/or volume management system are you using?
  2. Would you please check the integrity of your disks?
  3. Has the database always been running this version of MongoDB? If not please describe the upgrade/downgrade cycles the database has been through.
  4. Have you manipulated (copied or moved) the underlying database files? If so, was mongod running?
  5. Have you ever restored this instance from backups?
  6. What method do you use to create backups?
  7. When was the underlying filesystem last checked and is it currently marked clean?

Thanks,
Mark
