[SERVER-50788] mongod can not start : file:WiredTiger.wt, connection: read checksum error Created: 08/Sep/20 Updated: 11/Sep/20 Resolved: 10/Sep/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 3.4.9 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | lxh lxh | Assignee: | Dmitry Agranat |
| Resolution: | Done | Votes: | 0 |
| Labels: | FA_28558 | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Environment: |
x86_64 |
||
| Description |
|
2020-09-08T14:25:38.319+0800 I CONTROL [initandlisten] MongoDB starting : pid=25998 port=27018 dbpath=/data/db 64-bit host=server }, processManagement: { fork: true }, repair: true, replication: { enableMajorityReadConcern: true, oplogSizeMB: 20480, replSetName: "rs3" }, security: { authorization: "enabled", clusterAuthMode: "x509", keyFile: "/usr/local/mongodb/authentication/keyFile" }, sharding: { archiveMovedChunks: false, clusterRole: "shardsvr" }, storage: { dbPath: "/data/db", directoryPerDB: true, engine: "wiredTiger", journal: { enabled: false } }, systemLog: { destination: "file", logAppend: true, logRotate: "rename", path: "/data/log/mongodb.log", verbosity: 0 } } ***aborting after fassert() failure |
| Comments |
| Comment by lxh lxh [ 11/Sep/20 ] | |
|
Understood, I will follow your suggestion. I really appreciate your timely help! | |
| Comment by Dmitry Agranat [ 10/Sep/20 ] | |
|
The error message you are receiving indicates that there is an additional corruption. Unfortunately, we do not have any automated process to recover data from this situation. To avoid a problem like this in the future, it is our strong recommendation to:
Regards, | |
| Comment by lxh lxh [ 10/Sep/20 ] | |
|
Thank you for your help! Replacing the two files did not work, but the error message changed:
2020-09-10T18:47:24.930+0800 E STORAGE [initandlisten] WiredTiger error (0) [1599734844:930719][22429:0x7f7cb61e8e40], file:sizeStorer.wt, WT_SESSION.open_cursor: read checksum error for 4096B block at offset 24576: block header checksum of 394736567 doesn't match expected checksum of 1046701883 ***aborting after fassert() failure
The failing file is now sizeStorer.wt, which I have attached. Can it be repaired? | |
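For triage, the fields of a read checksum error line like the one quoted above can be extracted with a regular expression. This is a minimal sketch, assuming the message wording seen in the two error lines quoted in this ticket; the name `parse_checksum_error` is hypothetical and not part of any MongoDB or WiredTiger tooling.

```python
import re

# Pattern derived only from the error lines quoted in this ticket (an assumption);
# other WiredTiger versions may word the message differently.
CHECKSUM_RE = re.compile(
    r"file:(?P<file>\S+?),.*?read checksum error for (?P<size>\d+)B block "
    r"at offset (?P<offset>\d+): block header checksum of (?P<found>\d+) "
    r"doesn't match expected checksum of (?P<expected>\d+)"
)


def parse_checksum_error(line):
    """Return the affected file, block size, offset and checksums, or None."""
    match = CHECKSUM_RE.search(line)
    if match is None:
        return None
    return {
        "file": match.group("file"),
        "block_size": int(match.group("size")),
        "offset": int(match.group("offset")),
        "found": int(match.group("found")),
        "expected": int(match.group("expected")),
    }


if __name__ == "__main__":
    sample = (
        "file:sizeStorer.wt, WT_SESSION.open_cursor: read checksum error for "
        "4096B block at offset 24576: block header checksum of 394736567 "
        "doesn't match expected checksum of 1046701883"
    )
    print(parse_checksum_error(sample))
```

Running such a parser over each member's log shows at a glance which files report a damaged block on which node.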
| Comment by Dmitry Agranat [ 10/Sep/20 ] | |
|
I've attached a repair attempt of the files you provided as repair_attempt_SERVER-50788.zip. Thanks, | |
| Comment by lxh lxh [ 10/Sep/20 ] | |
|
The Secondary is a new node and needs to resync its data from the Primary, so the key point is to restore the Primary. Also, running --repair would require upgrading to 4.0+, and upgrading the whole MongoDB cluster may take a long time, which I cannot wait for. The log says "file:WiredTiger.wt, connection: read checksum error for 4096B block at offset 401408: block header checksum of 1071605299 doesn't match expected checksum of 1242809853". Could you help repair the WiredTiger.wt file in the attachments, similar to repair_attempt.tar.gz in https://jira.mongodb.org/browse/SERVER-46728? Thx. | |
| Comment by Dmitry Agranat [ 09/Sep/20 ] | |
|
Basically, you need to perform maintenance on a replica set member: start the member as a standalone, do the maintenance (in this case, a --repair), and restart it as a replica set member. Please let me know how it goes. Also, based on the provided logs, it seems that only the Primary hit this issue. I did not see any issues with the Secondary. | |
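The sequence described above (standalone start, --repair, restart as a replica set member) can be sketched as the command lines involved. This is a minimal sketch, assuming the dbPath, port, directoryPerDB setting and replica set name shown in this ticket's startup log; the helper names and the temporary port 37018 are hypothetical. The script only builds and prints argument lists and does not start mongod.

```python
import shlex


def standalone_repair_argv(dbpath, port, directory_per_db=True):
    """Build a mongod command line for a standalone --repair run."""
    # No --replSet option here: omitting it is what makes mongod run standalone.
    argv = ["mongod", "--dbpath", dbpath, "--port", str(port), "--repair"]
    if directory_per_db:
        argv.append("--directoryperdb")
    return argv


def replica_member_argv(dbpath, port, repl_set, directory_per_db=True):
    """Build a mongod command line for rejoining the replica set afterwards."""
    argv = ["mongod", "--dbpath", dbpath, "--port", str(port), "--replSet", repl_set]
    if directory_per_db:
        argv.append("--directoryperdb")
    return argv


if __name__ == "__main__":
    # Values taken from the startup log in this ticket, except the temporary port.
    steps = [
        standalone_repair_argv("/data/db", 37018),
        replica_member_argv("/data/db", 27018, "rs3"),
    ]
    for argv in steps:
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Using a different port for the standalone step keeps the other replica set members and clients from connecting to the node while it is under maintenance.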
| Comment by lxh lxh [ 09/Sep/20 ] | |
|
Yes, I did, but the replica set can't start, so the node can't be removed with the "rs.remove" command. Is there any other way to remove the node from the replica set? Thx.
Just in case, I have uploaded all the mongod log files.
| |
| Comment by Dmitry Agranat [ 09/Sep/20 ] | |
|
Yes, --repair should be done against a standalone node which is being removed from a replica set for this procedure. Did you try doing this? | |
| Comment by lxh lxh [ 09/Sep/20 ] | |
|
Hi Dima, because the mongod processes on all 3 nodes failed to start after an unexpected shutdown, the node cannot resync from the primary.
As for mongod --repair, my understanding is that it cannot be used on a replica set member. Am I misunderstanding?
Just in case, I tried to restore the WiredTiger files with the WiredTiger tool. The command was " ./wt -v -h /data/bak -C "extensions=[./ext/compressors/snappy/.libs/libwiredtiger_snappy.so]" -R salvage collection-38878--7827210234374637134.wt"
The files have now been uploaded.
By the way, I saw a resolved ticket describing the same problem as mine: https://jira.mongodb.org/browse/SERVER-46728
Thanks lxh | |
| Comment by Dmitry Agranat [ 09/Sep/20 ] | |
|
As MongoDB 3.4 has reached EOL, we can try to assist you as a one-time exception. Your configuration shows a 3-node replica set. The ideal resolution is to perform a clean resync from an unaffected node. In the event a resync of the failed member fails, please provide the logs covering this resync time. You can also try mongod --repair using the latest version of MongoDB. In the event that a --repair operation is unsuccessful, then please also provide:
When you said:
Could you please clarify what "restore" means here (detailed steps)? In case you need to upload mongod logs, I've created a secure upload portal for you. Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time. Thanks, | |
| Comment by lxh lxh [ 09/Sep/20 ] | |
|
Thank you! I tried to restore the WiredTiger files but failed, so I urgently need your help with the broken production server. | |
| Comment by Tim Fogarty [ 08/Sep/20 ] | |
|
Hi 1554154677@qq.com, I'm moving this ticket to the SERVER project where we deal with errors related to mongod. |