[SERVER-47617] On Sharding, WiredTiger error: calculated block checksum doesn't match expected checksum Created: 17/Apr/20  Updated: 23/Apr/20  Resolved: 23/Apr/20

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 4.2.0, 4.2.5
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: jongsun moon Assignee: Carl Champain (Inactive)
Resolution: Done Votes: 0
Labels: repair, repairDatabase
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File shard_us_1.log     File shard_us_1.log.2020-04-01T02-14-46    
Operating System: ALL
Steps To Reproduce:

2020-04-16T17:17:52.001+0900 I SHARDING [migrateThread] Finished deleting IPOFFICE_200201_US.TBL_FEA_CRH_KW_TECHCATE range [{ _id: ObjectId('5e90bdabaf8f0460a6c7cfb7') }, { _id: ObjectId('5e90bdac0fa88058daa0deb6') })
2020-04-16T17:17:52.320+0900 E STORAGE [chunkInserter] WiredTiger error (0) [1587025072:320766][3314:0x7f90a0a49700], file:IPOFFICE_200201_US/index-229--3288326397748746074.wt, WT_CURSOR.insert: __wt_block_read_off, 274: IPOFFICE_200201_US/index-229--3288326397748746074.wt: read checksum error for 12288B block at offset 150601728: calculated block checksum doesn't match expected checksum Raw: [1587025072:320766][3314:0x7f90a0a49700], file:IPOFFICE_200201_US/index-229--3288326397748746074.wt, WT_CURSOR.insert: __wt_block_read_off, 274: IPOFFICE_200201_US/index-229--3288326397748746074.wt: read checksum error for 12288B block at offset 150601728: calculated block checksum doesn't match expected checksum


 Description   

For our service, we build the whole database in one batch.

The database build process is:
1. Insert data into 140 collections.
2. Build 350 indexes.
3. Shard the 140 collections (a hedged sketch of this step is shown below).
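
For illustration, a minimal sketch of step 3 from the mongo shell against the mongos; the database and collection names are taken from the log above, while the { _id: 1 } shard key is an assumption (substitute the real key for each collection):

    # hedged sketch: the shard key below is assumed, adjust per collection
    mongo --host <mongos-host>:27017 --eval '
        sh.enableSharding("IPOFFICE_200201_US");
        sh.shardCollection("IPOFFICE_200201_US.TBL_FEA_CRH_KW_TECHCATE", { _id: 1 });
    '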

While sharding the collections, we hit the log message shown above under Steps To Reproduce.

We tried MongoDB 4.2.0 and 4.2.5 and got a similar error message on both.

The deployment is one mongos, three config servers forming one replica set, and three shard servers, each the sole member of its own replica set.

Could you advise us on what we should check?



 Comments   
Comment by Carl Champain (Inactive) [ 23/Apr/20 ]

I'm sorry to hear that --repair didn't work! I'm going to close this ticket now.

Kind regards,
Carl
 

Comment by jongsun moon [ 23/Apr/20 ]

Oh... I see.

We ran --repair on the server that contains the corrupted data, but it gave us the same error message.

Anyway, as you pointed out, it appears to have been caused by a physical (hardware) problem.

The new physical server has been working well so far.

Thank you for your help.

Comment by Carl Champain (Inactive) [ 22/Apr/20 ]

In that case, you should run --repair on the server that contains the corrupted data.

Comment by jongsun moon [ 22/Apr/20 ]

We have a three-shard cluster.

Each shard has only one server, which is the primary of its replica set (there is no secondary).

Comment by Carl Champain (Inactive) [ 21/Apr/20 ]

Are the shard servers part of the same replica set? If so, we recommend a clean resync from an unaffected node.

Comment by jongsun moon [ 21/Apr/20 ]

Hi Carl Champain,

Thank you for your kind comment.

I have a question.
We have three shard servers; should we have run --repair on all of them?

Below is what we did:
1. Ran --repair on the server that had the problem. -> It did not work.
2. Moved to a new physical server and ran --repair. -> It did not work.
   (We copied the $dbpath directory to the new physical server, then started mongod after changing the shard server IP and replica set IP to the new server's IP.)
3. On the new physical server, we are rebuilding the database from scratch. -> In progress.

I will let you know the result of #3.

Regards,

Comment by Carl Champain (Inactive) [ 20/Apr/20 ]

Hi jjongei0@gmail.com,

This error message leads us to suspect some form of physical corruption. Please make a complete copy of the database's $dbpath directory as a safeguard, so that you can work off of the current $dbpath.
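
For example, a minimal sketch of that backup, assuming mongod is stopped and a hypothetical dbpath of /data/db:

    # stop mongod first so the copy is consistent; /data/db is a placeholder path
    cp -a /data/db /data/db.backup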

The ideal resolution is to perform a clean resync from an unaffected node.
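
A resync generally means emptying the affected member's data directory and letting it re-run an initial sync from a healthy member. A hedged sketch, assuming a hypothetical dbpath of /data/db and replica set name rs0:

    # on the affected member only, with mongod stopped
    mv /data/db /data/db.corrupt       # keep the damaged files aside for now
    mkdir -p /data/db
    # restarting with an empty dbpath triggers an initial sync from the set
    mongod --replSet rs0 --dbpath /data/db    # plus your usual startup options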

You can also try mongod --repair using the latest version of MongoDB.
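
For example, with mongod stopped and a hypothetical dbpath of /data/db:

    # --repair runs offline, directly against the affected node's data files
    mongod --dbpath /data/db --repair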

If the --repair operation is unsuccessful, please also provide:

  • The logs of the repair operation.
  • The logs of any attempt to start mongod after the repair operation completed.

Kind regards,
Carl
