[SERVER-31832] Getting error while taking backup or selecting a record in a collection Created: 04/Nov/17 Updated: 14/Aug/18 Resolved: 14/Nov/17 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 3.4.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Sumit | Assignee: | Mark Agarunov |
| Resolution: | Done | Votes: | 0 |
| Labels: | envh, rns, rpu, trcf, wtc | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
| Operating System: | ALL |
| Participants: |
| Comments |
| Comment by Mark Agarunov [ 14/Nov/17 ] |
|
Hello sumit.jain@iongroup.com, I'm sorry to hear that the repair didn't fix the issue. Unfortunately this indicates that there is irreparable corruption on the disk, so the only course of action would be to resync the affected node or restore from a backup if possible. Thanks, |
| Comment by Sumit [ 10/Nov/17 ] |
|
Hi Mark, The repair files didn't resolve the issue. Not sure if there are any other options that we can try. Thanks |
| Comment by Sumit [ 09/Nov/17 ] |
|
Thanks Mark. We are working on it. I will let you know the status soon. |
| Comment by Mark Agarunov [ 09/Nov/17 ] |
|
Hello sumit.jain@iongroup.com, I've attached a repair attempt of the files you've provided as repair-SERVER-31832-2.tar.gz Thanks, |
| Comment by Sumit [ 09/Nov/17 ] |
|
Hi Mark, Thanks for getting back to us. I attached another set of WT files; right now our MongoDB databases are offline. It would be great if you could send another set of repair files, and we will try to restart the MongoDB service. The attached rar file is Nov-09-2017.rar Thanks |
| Comment by Mark Agarunov [ 07/Nov/17 ] |
|
Hello sumit.jain@iongroup.com, Thank you for the response. Unfortunately, this error indicates that there was corruption on the disk. In this situation, my best recommendation would be to resync the affected node or restore from a backup if possible. Thanks, |
| Comment by Sumit [ 07/Nov/17 ] |
|
Thanks Mark. Even after copying the files, we still got the error below. Can you please look into this as a priority? It's in our Production environment.
2017-11-06T16:23:29.889-0500 E STORAGE [conn10] WiredTiger error (0) [1510003409:889971][13956:2008429440], file:Prod01/collection-0--7662079245466072202.wt, WT_CURSOR.next: read checksum error for 901120B block at offset 163971072: calculated block checksum of 3779091760 doesn't match expected checksum of 1400883067
***aborting after fassert() failure
2017-11-06T16:23:29.891-0500 I - [conn9] Fatal Assertion 28559 at src\mongo\db\storage\wiredtiger\wiredtiger_util.cpp 64
-----------------------------
|
| Comment by Mark Agarunov [ 06/Nov/17 ] |
|
Hello sumit.jain@iongroup.com, Thank you for the report. I've attached a repair attempt of the files you've provided. Would you please extract these files and replace them in your $dbpath and let us know if it resolves the issue? If you are still seeing errors after replacing these files, please provide the complete logs from mongod so that we can further investigate. Additionally, if this issue persists, please provide the following information:
Thanks, |
| Comment by Sumit [ 04/Nov/17 ] |
|
Please help, this is in our Production environment. This is what I can see in the log file; attached are the WiredTiger files:
2017-11-03T22:21:37.846-0400 E STORAGE [conn71] WiredTiger error (0) [1509762097:846341][4664:140733117239424], file:Prod01/collection-0--7662079245466072202.wt, WT_CURSOR.next: read checksum error for 901120B block at offset 163971072: calculated block checksum of 233622082 doesn't match expected checksum of 1400883067
***aborting after fassert() failure
2017-11-03T22:21:37.878-0400 I - [WTJournalFlusher] Fatal Assertion 28559 at src\mongo\db\storage\wiredtiger\wiredtiger_util.cpp 64 |
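Editorial note: the checksum error lines quoted in this ticket follow a regular shape, so the affected file, block offset, and checksums can be pulled out of a mongod log for triage. The following is a minimal sketch, assuming the message keeps exactly the wording shown in the log excerpts above. The function name `parse_checksum_error` is hypothetical and not part of MongoDB or WiredTiger tooling.

```python
import re

# Matches the "read checksum error" message as worded in the log lines quoted
# in this ticket. Assumption: other versions may word the message differently.
CHECKSUM_ERROR = re.compile(
    r"file:(?P<file>\S+?),\s+WT_CURSOR\.next: read checksum error for "
    r"(?P<size>\d+)B block at offset (?P<offset>\d+): "
    r"calculated block checksum of (?P<calculated>\d+) "
    r"doesn't match expected checksum of (?P<expected>\d+)"
)


def parse_checksum_error(line):
    """Return the fields of a checksum error line as a dict, or None if no match."""
    match = CHECKSUM_ERROR.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "file": fields["file"],
        "size": int(fields["size"]),
        "offset": int(fields["offset"]),
        "calculated": int(fields["calculated"]),
        "expected": int(fields["expected"]),
    }


# The error line from the original report in this ticket.
sample = (
    "2017-11-03T22:21:37.846-0400 E STORAGE [conn71] WiredTiger error (0) "
    "[1509762097:846341][4664:140733117239424], "
    "file:Prod01/collection-0--7662079245466072202.wt, WT_CURSOR.next: "
    "read checksum error for 901120B block at offset 163971072: "
    "calculated block checksum of 233622082 doesn't match expected "
    "checksum of 1400883067"
)

info = parse_checksum_error(sample)
print(info)
```

Running the same function over the 06/Nov/17 line reports the same file, offset, and expected checksum, with a different calculated checksum.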