[SERVER-28666] WiredTiger error - file:WiredTiger.wt, WT_CURSOR.next: read checksum error Created: 07/Apr/17  Updated: 13/Aug/18  Resolved: 07/Apr/17

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.4.3
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Marc Henri [X] Assignee: Kelsey Schubert
Resolution: Done Votes: 0
Labels: envns, rpo, rpu, trcf, wtc
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File repair_attempts.tar.gz     File set1.tar.gz     File set2.tar.gz    
Operating System: Linux
Steps To Reproduce:

/etc/init.d/mongod_tw start
and
/etc/init.d/mongod_ig start

 Description   

After a power failure, two MongoDB servers will not start again.

On one server I get:

WiredTiger error (0) [1491533102:978650][31243:0x70b9da9a2dc0], file:WiredTiger.wt, WT_CURSOR.next: read checksum error for 28672B block at offset 9068544: block header checksum of 0 doesn't match expected checksum of 4059665721

And on the other:

WiredTiger error (-31802) [1491534981:938109][32667:0x65917b0f2dc0], file:collection-30-4121866540730348039.wt, WT_SESSION.open_cursor: /data2/instagram_bak/collection-30-4121866540730348039.wt: handle-read: pread: failed to read 4094 bytes at offset 2: WT_ERROR: non-specific WiredTiger error
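Both errors begin with the same WiredTiger message prefix. In the mongod log later in this ticket, the WiredTiger prefix `[1491571175:311700]` appears on a line stamped `2017-04-07T09:19:35.311-0400`. This suggests the first bracket holds Unix epoch seconds and microseconds, and the second holds the process ID and thread ID. The minimal Python sketch below decodes that prefix so WiredTiger messages can be lined up with mongod's own timestamps. The field meanings are inferred from the logs in this ticket, not taken from WiredTiger documentation, and `parse_wt_prefix` is a hypothetical helper.

```python
import re
from datetime import datetime, timezone

# Hypothetical helper. Field meanings (epoch seconds, microseconds,
# process ID, thread ID) are inferred from the logs in this ticket.
WT_PREFIX = re.compile(
    r"WiredTiger error \((?P<errno>-?\d+)\) "
    r"\[(?P<sec>\d+):(?P<usec>\d+)\]"
    r"\[(?P<pid>\d+):(?P<tid>0x[0-9a-f]+)\]"
)


def parse_wt_prefix(line: str) -> dict:
    """Extract the error code, UTC timestamp, PID and thread ID from a WiredTiger error line."""
    m = WT_PREFIX.search(line)
    if m is None:
        raise ValueError("no WiredTiger error prefix found")
    # Seconds since the Unix epoch, then attach the microsecond field.
    when = datetime.fromtimestamp(int(m["sec"]), tz=timezone.utc)
    when = when.replace(microsecond=int(m["usec"]))
    return {
        "errno": int(m["errno"]),
        "utc": when,
        "pid": int(m["pid"]),
        "thread": m["tid"],
    }


if __name__ == "__main__":
    line = ("WiredTiger error (0) [1491533102:978650][31243:0x70b9da9a2dc0], "
            "file:WiredTiger.wt, WT_CURSOR.next: read checksum error")
    info = parse_wt_prefix(line)
    # prints: 2017-04-07T02:45:02.978650+00:00 31243 0x70b9da9a2dc0
    print(info["utc"].isoformat(), info["pid"], info["thread"])
```

Applied to the mongod log line mentioned above, the sketch yields 13:19:35.311700 UTC, which matches the `-0400` local time of 09:19:35.311 on the same line.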

Would you be willing to attempt a repair of the .wt files for both servers? I've attached them as two sets of files, one per server.

Could you also explain the method used for the repair attempt?

Thanks



 Comments   
Comment by Kelsey Schubert [ 07/Apr/17 ]

Hi Cezam,

Unfortunately, this indicates that there was additional corruption on disk following the power failure. In this situation, my best recommendation would be to resync the affected nodes or restore from a backup.

Please note that the SERVER project is for reporting bugs and feature suggestions for the MongoDB server. For MongoDB-related support, see our Technical Support page for additional resources.

Kind regards,
Thomas

Comment by Marc Henri [X] [ 07/Apr/17 ]

Now set2 has failed as well, after getting through 80% of the database. Here is the error trace:

2017-04-07T09:19:35.311-0400 I STORAGE  [initandlisten] Repairing collection big_daddy_instagram.accounts_2016_04_28
2017-04-07T09:19:35.311-0400 E STORAGE  [initandlisten] WiredTiger error (2) [1491571175:311700][17451:0x760b7e753dc0], file:collection-30-4121866540730348039.wt, WT_SESSION.verify: /data2/instagram_bak/collection-30-4121866540730348039.wt: handle-open: open: No such file or directory
2017-04-07T09:19:35.311-0400 I STORAGE  [initandlisten] Verify failed on uri table:collection-30-4121866540730348039. Running a salvage operation.
2017-04-07T09:19:35.311-0400 E STORAGE  [initandlisten] WiredTiger error (2) [1491571175:311923][17451:0x760b7e753dc0], file:collection-30-4121866540730348039.wt, WT_SESSION.salvage: /data2/instagram_bak/collection-30-4121866540730348039.wt: handle-open: open: No such file or directory
2017-04-07T09:19:36.513-0400 I -        [initandlisten] Invariant failure rs.get() src/mongo/db/catalog/database.cpp 195
2017-04-07T09:19:36.513-0400 I -        [initandlisten] 
 
***aborting after invariant() failure
 
 
2017-04-07T09:19:36.519-0400 F -        [initandlisten] Got signal: 6 (Aborted).
 
 0xc71b0964ac1 0xc71b0963bb9 0xc71b096409d 0x760b7d3bd370 0x760b7d0221d7 0x760b7d0238c8 0xc71afbf5cdc 0xc71afdd4b80 0xc71afddb537 0xc71afddf214 0xc71b02dbda8 0xc71afbdefcb 0xc71afbe1e95 0xc71afc01a94 0x760b7d00eb35 0xc71afc5f87f
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"C71AF3E6000","o":"157EAC1","s":"_ZN5mongo15printStackTraceERSo"},{"b":"C71AF3E6000","o":"157DBB9"},{"b":"C71AF3E6000","o":"157E09D"},{"b":"760B7D3AE000","o":"F370"},{"b":"760B7CFED000","o":"351D7","s":"gsignal"},{"b":"760B7CFED000","o":"368C8","s":"abort"},{"b":"C71AF3E6000","o":"80FCDC","s":"_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j"},{"b":"C71AF3E6000","o":"9EEB80","s":"_ZN5mongo8Database30_getOrCreateCollectionInstanceEPNS_16OperationContextENS_10StringDataE"},{"b":"C71AF3E6000","o":"9F5537","s":"_ZN5mongo8DatabaseC1EPNS_16OperationContextENS_10StringDataEPNS_20DatabaseCatalogEntryE"},{"b":"C71AF3E6000","o":"9F9214","s":"_ZN5mongo14DatabaseHolder6openDbEPNS_16OperationContextENS_10StringDataEPb"},{"b":"C71AF3E6000","o":"EF5DA8","s":"_ZN5mongo14repairDatabaseEPNS_16OperationContextEPNS_13StorageEngineERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEbb"},{"b":"C71AF3E6000","o":"7F8FCB"},{"b":"C71AF3E6000","o":"7FBE95"},{"b":"C71AF3E6000","o":"81BA94","s":"main"},{"b":"760B7CFED000","o":"21B35","s":"__libc_start_main"},{"b":"C71AF3E6000","o":"87987F"}],"processInfo":{ "mongodbVersion" : "3.4.3", "gitVersion" : "f07437fb5a6cca07c10bafa78365456eb1d6d5e1", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.14.32-xxxx-grs-ipv6-64", "version" : "#9 SMP Thu Oct 20 14:53:52 CEST 2016", "machine" : "x86_64" }, "somap" : [ { "b" : "C71AF3E6000", "elfType" : 3, "buildId" : "E7548BC9521159DC8B80A4D768E5544FA00942D3" }, { "b" : "760B7F064000", "elfType" : 3, "buildId" : "11BE4D720B58B4AAC3FB4BF8311F6F3005C84E6B" }, { "b" : "760B7E2D8000", "path" : "/lib64/libssl.so.10", "elfType" : 3, "buildId" : "90EAF65D9B0EEEB1424241281F7F197451D4317D" }, { "b" : "760B7DEEE000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "7278C69EE161D98DDD0FA00F92B67AD78C7B7F40" }, { "b" : "760B7DCE6000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "82E77ADE22BC9FFF8D3458BD37331E7EDF174C28" }, { "b" : "760B7DAE2000", 
"path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "C5F560504E1AF52E29679C3B52FF11121015D6BB" }, { "b" : "760B7D7E0000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "721C7CC9488EFA25F83B48AF713AB27DBE48EF3E" }, { "b" : "760B7D5CA000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "408B46E291B2D4C9612E27C0509D165D7E186D40" }, { "b" : "760B7D3AE000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "C3DEB1FA27CD0C1C3CC575B944ABACBA0698B0F2" }, { "b" : "760B7CFED000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "8B2C421716985B927AA0CAF2A05D0B1F452367F7" }, { "b" : "760B7E546000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "8F3E366E2DB73C330A3791DEAE31AE9579099B44" }, { "b" : "760B7CD9F000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "A2499C359AA179EE23324ED949C0E508E4434F10" }, { "b" : "760B7CAB8000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "E09A34D9083DC6FEAF7018C09D55631DEEE2836D" }, { "b" : "760B7C8B4000", "path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "BF54B7C8932E450769FBBB8B18864D1DD70BBC67" }, { "b" : "760B7C682000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "BF8F00D7CB849ADB0B7A4703BC7B8D66AEE6A49C" }, { "b" : "760B7C46C000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "EA8E45DC8E395CC5E26890470112D97A1F1E0B65" }, { "b" : "760B7C25D000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "1E7A92FDD6FB3871DA97F4BCA2E147E72B6B6E1F" }, { "b" : "760B7C059000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "2E01D5AC08C1280D013AAB96B292AC58BC30A263" }, { "b" : "760B7BE3F000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "FE7AE845A123A3DFC0FDC2408BCBC2BA8B61B158" }, { "b" : "760B7BC18000", "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "76687CA31A406854DF3BCF8D03055656F56E6892" }, { "b" : "760B7B9B7000", "path" : "/lib64/libpcre.so.1", 
"elfType" : 3, "buildId" : "AE64AA461A26E01F60408013D361749D56DD0AE1" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x41) [0xc71b0964ac1]
 mongod(+0x157DBB9) [0xc71b0963bb9]
 mongod(+0x157E09D) [0xc71b096409d]
 libpthread.so.0(+0xF370) [0x760b7d3bd370]
 libc.so.6(gsignal+0x37) [0x760b7d0221d7]
 libc.so.6(abort+0x148) [0x760b7d0238c8]
 mongod(_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j+0x0) [0xc71afbf5cdc]
 mongod(_ZN5mongo8Database30_getOrCreateCollectionInstanceEPNS_16OperationContextENS_10StringDataE+0xE0) [0xc71afdd4b80]
 mongod(_ZN5mongo8DatabaseC1EPNS_16OperationContextENS_10StringDataEPNS_20DatabaseCatalogEntryE+0x677) [0xc71afddb537]
 mongod(_ZN5mongo14DatabaseHolder6openDbEPNS_16OperationContextENS_10StringDataEPb+0xD44) [0xc71afddf214]
 mongod(_ZN5mongo14repairDatabaseEPNS_16OperationContextEPNS_13StorageEngineERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEbb+0x418) [0xc71b02dbda8]
 mongod(+0x7F8FCB) [0xc71afbdefcb]
 mongod(+0x7FBE95) [0xc71afbe1e95]
 mongod(main+0x964) [0xc71afc01a94]
 libc.so.6(__libc_start_main+0xF5) [0x760b7d00eb35]
 mongod(+0x87987F) [0xc71afc5f87f]
-----  END BACKTRACE  -----

Comment by Marc Henri [X] [ 07/Apr/17 ]

Hi again,

set1 ended up failing, although the repair ran much longer than it did before I sent you the files.
Here is the error I'm getting now:
Thanks
Marc

2017-04-07T08:50:10.523-0400 I STORAGE  [initandlisten] Repairing collection bid_daddy_twitter.accounts_2012_08_12
2017-04-07T08:50:10.523-0400 I STORAGE  [initandlisten] Verify failed on uri table:collection-1720--1122495060098508656. Running a salvage operation.
2017-04-07T08:50:10.646-0400 I -        [initandlisten] Invariant failure rs.get() src/mongo/db/catalog/database.cpp 195
2017-04-07T08:50:10.646-0400 I -        [initandlisten] 
 
***aborting after invariant() failure
 
 
2017-04-07T08:50:10.652-0400 F -        [initandlisten] Got signal: 6 (Aborted).
 
 0x177e13ac1 0x177e12bb9 0x177e1309d 0x688fe952e370 0x688fe91931d7 0x688fe91948c8 0x1770a4cdc 0x177283b80 0x17728a537 0x17728e214 0x17778ada8 0x17708dfcb 0x177090e95 0x1770b0a94 0x688fe917fb35 0x17710e87f
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"176895000","o":"157EAC1","s":"_ZN5mongo15printStackTraceERSo"},{"b":"176895000","o":"157DBB9"},{"b":"176895000","o":"157E09D"},{"b":"688FE951F000","o":"F370"},{"b":"688FE915E000","o":"351D7","s":"gsignal"},{"b":"688FE915E000","o":"368C8","s":"abort"},{"b":"176895000","o":"80FCDC","s":"_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j"},{"b":"176895000","o":"9EEB80","s":"_ZN5mongo8Database30_getOrCreateCollectionInstanceEPNS_16OperationContextENS_10StringDataE"},{"b":"176895000","o":"9F5537","s":"_ZN5mongo8DatabaseC1EPNS_16OperationContextENS_10StringDataEPNS_20DatabaseCatalogEntryE"},{"b":"176895000","o":"9F9214","s":"_ZN5mongo14DatabaseHolder6openDbEPNS_16OperationContextENS_10StringDataEPb"},{"b":"176895000","o":"EF5DA8","s":"_ZN5mongo14repairDatabaseEPNS_16OperationContextEPNS_13StorageEngineERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEbb"},{"b":"176895000","o":"7F8FCB"},{"b":"176895000","o":"7FBE95"},{"b":"176895000","o":"81BA94","s":"main"},{"b":"688FE915E000","o":"21B35","s":"__libc_start_main"},{"b":"176895000","o":"87987F"}],"processInfo":{ "mongodbVersion" : "3.4.3", "gitVersion" : "f07437fb5a6cca07c10bafa78365456eb1d6d5e1", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.14.32-xxxx-grs-ipv6-64", "version" : "#9 SMP Thu Oct 20 14:53:52 CEST 2016", "machine" : "x86_64" }, "somap" : [ { "b" : "176895000", "elfType" : 3, "buildId" : "E7548BC9521159DC8B80A4D768E5544FA00942D3" }, { "b" : "688FEB1D5000", "elfType" : 3, "buildId" : "11BE4D720B58B4AAC3FB4BF8311F6F3005C84E6B" }, { "b" : "688FEA449000", "path" : "/lib64/libssl.so.10", "elfType" : 3, "buildId" : "90EAF65D9B0EEEB1424241281F7F197451D4317D" }, { "b" : "688FEA05F000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "7278C69EE161D98DDD0FA00F92B67AD78C7B7F40" }, { "b" : "688FE9E57000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "82E77ADE22BC9FFF8D3458BD37331E7EDF174C28" }, { "b" : "688FE9C53000", "path" : "/lib64/libdl.so.2", 
"elfType" : 3, "buildId" : "C5F560504E1AF52E29679C3B52FF11121015D6BB" }, { "b" : "688FE9951000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "721C7CC9488EFA25F83B48AF713AB27DBE48EF3E" }, { "b" : "688FE973B000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "408B46E291B2D4C9612E27C0509D165D7E186D40" }, { "b" : "688FE951F000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "C3DEB1FA27CD0C1C3CC575B944ABACBA0698B0F2" }, { "b" : "688FE915E000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "8B2C421716985B927AA0CAF2A05D0B1F452367F7" }, { "b" : "688FEA6B7000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "8F3E366E2DB73C330A3791DEAE31AE9579099B44" }, { "b" : "688FE8F10000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "A2499C359AA179EE23324ED949C0E508E4434F10" }, { "b" : "688FE8C29000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "E09A34D9083DC6FEAF7018C09D55631DEEE2836D" }, { "b" : "688FE8A25000", "path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "BF54B7C8932E450769FBBB8B18864D1DD70BBC67" }, { "b" : "688FE87F3000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "BF8F00D7CB849ADB0B7A4703BC7B8D66AEE6A49C" }, { "b" : "688FE85DD000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "EA8E45DC8E395CC5E26890470112D97A1F1E0B65" }, { "b" : "688FE83CE000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "1E7A92FDD6FB3871DA97F4BCA2E147E72B6B6E1F" }, { "b" : "688FE81CA000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "2E01D5AC08C1280D013AAB96B292AC58BC30A263" }, { "b" : "688FE7FB0000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "FE7AE845A123A3DFC0FDC2408BCBC2BA8B61B158" }, { "b" : "688FE7D89000", "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "76687CA31A406854DF3BCF8D03055656F56E6892" }, { "b" : "688FE7B28000", "path" : "/lib64/libpcre.so.1", "elfType" : 3, "buildId" : 
"AE64AA461A26E01F60408013D361749D56DD0AE1" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x177e13ac1]
 mongod(+0x157DBB9) [0x177e12bb9]
 mongod(+0x157E09D) [0x177e1309d]
 libpthread.so.0(+0xF370) [0x688fe952e370]
 libc.so.6(gsignal+0x37) [0x688fe91931d7]
 libc.so.6(abort+0x148) [0x688fe91948c8]
 mongod(_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j+0x0) [0x1770a4cdc]
 mongod(_ZN5mongo8Database30_getOrCreateCollectionInstanceEPNS_16OperationContextENS_10StringDataE+0xE0) [0x177283b80]
 mongod(_ZN5mongo8DatabaseC1EPNS_16OperationContextENS_10StringDataEPNS_20DatabaseCatalogEntryE+0x677) [0x17728a537]
 mongod(_ZN5mongo14DatabaseHolder6openDbEPNS_16OperationContextENS_10StringDataEPb+0xD44) [0x17728e214]
 mongod(_ZN5mongo14repairDatabaseEPNS_16OperationContextEPNS_13StorageEngineERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEbb+0x418) [0x17778ada8]
 mongod(+0x7F8FCB) [0x17708dfcb]
 mongod(+0x7FBE95) [0x177090e95]
 mongod(main+0x964) [0x1770b0a94]
 libc.so.6(__libc_start_main+0xF5) [0x688fe917fb35]
 mongod(+0x87987F) [0x17710e87f]
-----  END BACKTRACE  -----

Comment by Marc Henri [X] [ 07/Apr/17 ]

Hey Thomas,

I've launched a repair on both databases using your files and am now awaiting the results.
I'll report back on how everything went.

Thank you

Marc

Comment by Kelsey Schubert [ 07/Apr/17 ]

Hi Cezam,

I've attached a tarball containing repair attempts for both sets. Please extract the files, replace the originals in their respective paths, and let us know whether this resolves the issue.

Unfortunately, the process we use for these repair attempts is not ready to be shared publicly. We're tracking the work to make repair and recovery of the WiredTiger storage engine more robust in SERVER-19815. Please feel free to watch and vote for SERVER-19815.

Thank you,
Thomas

Generated at Thu Feb 08 04:18:45 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.