[SERVER-81835] potential hardware corruption, read checksum error: block header checksum doesn't match the expected checksum. Created: 04/Oct/23 Updated: 07/Nov/23 Resolved: 07/Nov/23 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 6.0.3, 6.0.9 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | 비 서 | Assignee: | Noopur Gupta |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
CentOS 7.9 |
||
| Operating System: | ALL |
| Participants: |
| Description |
|
A total of 12 servers are configured as a ReplicaSet to operate a Shard Cluster, with each server having three nodes grouped together. Over the past month, approximately five secondary nodes have encountered issues with the message "potential hardware corruption, read checksum error: block header checksum doesn't match the expected checksum." Attempts to resolve the problem using the repair command have been unsuccessful, and the issue has persisted. Ultimately, the only effective solution was to delete the data and perform a resynchronization. However, deleting the data and resyncing is not a practical solution due to the large data capacity of around 25TB. Determining the root cause of this issue has proven to be challenging. How can I resolve this issue? |
| Comments |
| Comment by Noopur Gupta [ 07/Nov/23 ] |
|
Closing this ticket since there has been no activity. Feel free to reopen it if the issue persists. |
| Comment by Noopur Gupta [ 30/Oct/23 ] |
|
We still need additional information to diagnose the problem. If this is still an issue for you, would you please provide the logs requested above after following the steps for the sync? |
| Comment by Noopur Gupta [ 09/Oct/23 ] |
|
Hi, This error message leads us to suspect some form of physical corruption. Please make a complete copy of the database's $dbpath directory as a safeguard, so that you can continue to work from the current $dbpath. Since this is a replica set, the ideal resolution is to perform a clean resync from an unaffected node. You can also try mongod --repair using the latest version of MongoDB. If the issue with --repair still persists after the above steps are performed, then please also provide:
Thanks, Noopur |
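The backup-then-repair advice in the 09/Oct/23 comment could be sketched roughly as follows. This is an illustration only, not a procedure taken from the ticket: the helper names `backup_dbpath` and `repair_command` and the `.bak` suffix are hypothetical, and the sketch assumes the affected mongod has already been stopped and that the backup volume has room for a full copy. It only builds the `mongod --repair --dbpath ...` command line; it does not run it.

```python
import shutil
from pathlib import Path


def backup_dbpath(dbpath, backup_root):
    """Copy the entire dbpath directory to <backup_root>/<dbpath name>.bak.

    Assumption: mongod is stopped, so the files are not changing during the copy.
    shutil.copytree raises FileExistsError if the destination already exists,
    which prevents overwriting an earlier backup by accident.
    """
    src = Path(dbpath)
    dst = Path(backup_root) / (src.name + ".bak")
    shutil.copytree(src, dst)
    return dst


def repair_command(dbpath):
    """Build (but do not execute) the repair command for the current dbpath."""
    return ["mongod", "--repair", "--dbpath", str(dbpath)]
```

The backup is left untouched and the repair is pointed at the current dbpath, matching the order suggested in the comment. Once the copy has been verified, the returned list can be passed to `subprocess.run`.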