[SERVER-19293] Unable to repair corrupted data after a server-crash (using wiredTiger)
Created: 06/Jul/15  Updated: 07/Jan/16  Resolved: 26/Sep/15

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.0.2, 3.0.4
Fix Version/s: None

Type: Bug
Priority: Critical - P2
Reporter: Lucas
Assignee: Ramon Fernandez Marina
Resolution: Incomplete
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: MONGO.rar, mongo.log.rar, new files.rar
Issue Links:
Depends: depends on WT-2002 (verify should not panic), Closed
Related: is related to SERVER-19134 (primary can't restart after being killed), Closed
Operating System: Linux
Participants:

 Description   

My MongoDB server crashed and I can't start the mongod service. I'm using this command to try to repair the database: mongod --dbpath . --storageEngine wiredTiger --repair

The mongod version was 3.0.2. I upgraded to 3.0.4 and ran the repair again, but it gave the same error.

I went through a lot of existing issues and tried many things to recover my entire database, but nothing works. I can't find a way to ignore the corruption in a collection (accepting the loss of some data) and recover the rest of it.

Error when I try to repair:

2015-07-06T13:49:53.180-0500 I INDEX    [initandlisten]          building index using bulk method
2015-07-06T13:49:53.578-0500 I STORAGE  [initandlisten] Repairing collection database.users
2015-07-06T13:49:53.578-0500 I STORAGE  [initandlisten] WiredTiger progress session.verify 6
2015-07-06T13:49:53.578-0500 I STORAGE  [initandlisten] Verify succeeded on uri table:collection-40--7480132900646616609. Not salvaging.
2015-07-06T13:49:53.692-0500 I INDEX    [initandlisten] build index on: database.users properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "database.users" }
2015-07-06T13:49:53.692-0500 I INDEX    [initandlisten]          building index using bulk method
2015-07-06T13:49:53.775-0500 I STORAGE  [initandlisten] Repairing collection database.users_history
2015-07-06T13:49:53.775-0500 I STORAGE  [initandlisten] WiredTiger progress session.verify 3
2015-07-06T13:49:53.775-0500 I STORAGE  [initandlisten] Verify succeeded on uri table:collection-42--7480132900646616609. Not salvaging.
2015-07-06T13:49:53.880-0500 I INDEX    [initandlisten] build index on: database.users_history properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "database.users_history" }
2015-07-06T13:49:53.880-0500 I INDEX    [initandlisten]          building index using bulk method
2015-07-06T13:49:53.972-0500 I STORAGE  [initandlisten] repairDatabase datasource
2015-07-06T13:49:53.972-0500 I STORAGE  [initandlisten] Repairing collection datasource.reviews
2015-07-06T13:49:53.972-0500 E STORAGE  [initandlisten] WiredTiger (0) [1436208593:972487][16105:0x7fe889ce5bc0], file:collection-1595-7140502635356682714.wt, session.verify: read checksum error [4096B @ 9994240, 832067279 != 2155579310]
2015-07-06T13:49:53.972-0500 E STORAGE  [initandlisten] WiredTiger (0) [1436208593:972512][16105:0x7fe889ce5bc0], file:collection-1595-7140502635356682714.wt, session.verify: collection-1595-7140502635356682714.wt: encountered an illegal file format or internal value
2015-07-06T13:49:53.972-0500 E STORAGE  [initandlisten] WiredTiger (-31804) [1436208593:972525][16105:0x7fe889ce5bc0], file:collection-1595-7140502635356682714.wt, session.verify: the process must exit and restart: WT_PANIC: WiredTiger library panic
2015-07-06T13:49:53.972-0500 I -        [initandlisten] Fatal Assertion 28558
2015-07-06T13:49:53.983-0500 I CONTROL  [initandlisten]
 0xf5e199 0xefd1b1 0xee1cb1 0xd87dda 0x1390e89 0x1391045 0x13914e4 0x12e228e 0x12e2728 0x12e54cf 0x12e5648 0x1310afa 0x138c16e 0x138c3a8 0x138c836 0xd70ed5 0xd71671 0xcf7250 0xbf9648 0x80a8f4 0x7d6b89 0x7fe8882aaec5 0x8080e7
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"B5E199"},{"b":"400000","o":"AFD1B1"},{"b":"400000","o":"AE1CB1"},{"b":"400000","o":"987DDA"},{"b":"400000","o":"F90E89"},{"b":"400000","o":"F91045"},{"b":"400000","o":"F914E4"},{"b":"400000","o":"EE228E"},{"b":"400000","o":"EE2728"},{"b":"400000","o":"EE54CF"},{"b":"400000","o":"EE5648"},{"b":"400000","o":"F10AFA"},{"b":"400000","o":"F8C16E"},{"b":"400000","o":"F8C3A8"},{"b":"400000","o":"F8C836"},{"b":"400000","o":"970ED5"},{"b":"400000","o":"971671"},{"b":"400000","o":"8F7250"},{"b":"400000","o":"7F9648"},{"b":"400000","o":"40A8F4"},{"b":"400000","o":"3D6B89"},{"b":"7FE888289000","o":"21EC5"},{"b":"400000","o":"4080E7"}],"processInfo":{ "mongodbVersion" : "3.0.4", "gitVersion" : "0481c958daeb2969800511e7475dc66986fa9ed5", "uname" : { "sysname" : "Linux", "release" : "3.13.0-37-generic", "version" : "#64-Ubuntu SMP Mon Sep 22 21:28:38 UTC 2014", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "32DC52072DB9385642CCB4D2AD2ACDA6E0B87A27" }, { "b" : "7FFF57609000", "elfType" : 3, "buildId" : "0074678E5FFFF79F46C476077E67057161772F37" }, { "b" : "7FE8898B4000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "9318E8AF0BFBE444731BB0461202EF57F7C39542" }, { "b" : "7FE889655000", "path" : "/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "A20EFFEC993A8441FA17F2079F923CBD04079E19" }, { "b" : "7FE88927A000", "path" : "/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId" : "F000D29917E9B6E94A35A8F02E5C62846E5916BC" }, { "b" : "7FE889072000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "92FCF41EFE012D6186E31A59AD05BDBB487769AB" }, { "b" : "7FE888E6E000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "C1AE4CB7195D337A77A3C689051DABAA3980CA0C" }, { "b" : "7FE888B6A000", "path" : "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "elfType" : 3, "buildId" : "4BF6F7ADD8244AD86008E6BF40D90F8873892197" }, { "b" : "7FE888864000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "1D76B71E905CB867B27CEF230FCB20F01A3178F5" }, { "b" : "7FE88864E000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "8D0AA71411580EE6C08809695C3984769F25725B" }, { "b" : "7FE888289000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "30C94DC66A1FE95180C3D68D2B89E576D5AE213C" }, { "b" : "7FE889AD2000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "9F00581AB3C73E3AEA35995A0C50D24D59A01D47" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf5e199]
 mongod(_ZN5mongo10logContextEPKc+0xE1) [0xefd1b1]
 mongod(_ZN5mongo13fassertFailedEi+0x61) [0xee1cb1]
 mongod(+0x987DDA) [0xd87dda]
 mongod(__wt_eventv+0x489) [0x1390e89]
 mongod(__wt_err+0x95) [0x1391045]
 mongod(__wt_panic+0x24) [0x13914e4]
 mongod(__wt_block_extlist_read+0x6E) [0x12e228e]
 mongod(__wt_block_extlist_read_avail+0x28) [0x12e2728]
 mongod(+0xEE54CF) [0x12e54cf]
 mongod(__wt_block_verify_start+0x108) [0x12e5648]
 mongod(__wt_verify+0x4AA) [0x1310afa]
 mongod(__wt_schema_worker+0x35E) [0x138c16e]
 mongod(__wt_schema_worker+0x598) [0x138c3a8]
 mongod(+0xF8C836) [0x138c836]
 mongod(_ZN5mongo18WiredTigerKVEngine16_salvageIfNeededEPKc+0x45) [0xd70ed5]
 mongod(_ZN5mongo18WiredTigerKVEngine11repairIdentEPNS_16OperationContextERKNS_10StringDataE+0x51) [0xd71671]
 mongod(_ZN5mongo15KVStorageEngine17repairRecordStoreEPNS_16OperationContextERKSs+0xA0) [0xcf7250]
 mongod(_ZN5mongo14repairDatabaseEPNS_16OperationContextEPNS_13StorageEngineERKSsbb+0x2A8) [0xbf9648]
 mongod(_ZN5mongo13initAndListenEi+0xA44) [0x80a8f4]
 mongod(main+0x139) [0x7d6b89]
 libc.so.6(__libc_start_main+0xF5) [0x7fe8882aaec5]
 mongod(+0x4080E7) [0x8080e7]
-----  END BACKTRACE  -----
2015-07-06T13:49:53.983-0500 I -        [initandlisten]
 
***aborting after fassert() failure
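
A note for reading the backtrace above: each entry in the JSON "backtrace" array carries a load base "b" and an offset "o", both in hex, and their sum appears to be the absolute address printed in the raw address line and in the brackets of each symbolized frame. A minimal Python sketch, using three frames copied from this log (the helper name absolute_addresses is made up for illustration):

```python
import json

def absolute_addresses(backtrace_json):
    """Return base + offset for every frame in a mongod backtrace JSON blob."""
    frames = json.loads(backtrace_json)["backtrace"]
    return [int(frame["b"], 16) + int(frame["o"], 16) for frame in frames]

# Three frames copied from the backtrace above: the first two mongod frames
# and the libc frame.
sample = (
    '{"backtrace":[{"b":"400000","o":"B5E199"},'
    '{"b":"400000","o":"AFD1B1"},'
    '{"b":"7FE888289000","o":"21EC5"}]}'
)

addresses = [hex(a) for a in absolute_addresses(sample)]
print(addresses)
```

The results can be compared by eye with the addresses 0xf5e199, 0xefd1b1 and 0x7fe8882aaec5 in the log.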



 Comments   
Comment by Ramon Fernandez Marina [ 31/Aug/15 ]

lucasoares, there were changes in WT-2002, included in MongoDB 3.0.6, that may help you here. Would you be open to giving 3.0.6 a try and reporting back?

Thanks,
Ramón.

Comment by Lucas [ 07/Aug/15 ]

Sorry for the delay, ramon.fernandez. I was on a business trip and couldn't answer.

It was a replica set, but my second node failed, which left this server alone with an arbiter.
At that time I didn't use a configuration file, just command-line options: storageEngine, replSet, keyFile, dbpath and logpath (nojournal for the arbiter).

Only mongod was accessing the files.

This replica set was not in production, but it had some data that I would like to recover.

Comment by Ramon Fernandez Marina [ 30/Jul/15 ]

lucasoares, is this a stand-alone node or a replica set? Can you please provide some more details about your setup like OS, number of machines, configuration file for mongod, details about the underlying storage, etc.? Also, could there be any other processes outside mongod accessing these data files?

If I'm not mistaken, the rs.get() invariant above indicates data corruption on the shipyard.posts_5730602795925504_5206770867765248 collection. My first guess is that the server crash may have left the filesystem in an inconsistent state, and the subsequent fsck(8) has eliminated some data needed by mongod.

The details of your setup will determine the next step to get you back up and running.

Thanks,
Ramón.

Comment by Lucas [ 28/Jul/15 ]

Yes, michael.cahill, these files exist. And no, I can't open the database without --repair using either this patch build or the old one.

2015-07-28T11:58:05.164-0500 I STORAGE  [initandlisten] Starting WiredTigerRecordStoreThread local.oplog.rs
2015-07-28T11:58:05.859-0500 E STORAGE  [initandlisten] WiredTiger (0) [1438102685:859291][17705:0x7f21e25acbc0], file:collection-1728-7140502635356682714.wt, session.open_cursor: read checksum error [4096B @ 5816320, 3258734761 != 3654219389]
2015-07-28T11:58:05.859-0500 E STORAGE  [initandlisten] WiredTiger (0) [1438102685:859339][17705:0x7f21e25acbc0], file:collection-1728-7140502635356682714.wt, session.open_cursor: collection-1728-7140502635356682714.wt: encountered an illegal file format or internal value
2015-07-28T11:58:05.859-0500 E STORAGE  [initandlisten] WiredTiger (-31804) [1438102685:859349][17705:0x7f21e25acbc0], file:collection-1728-7140502635356682714.wt, session.open_cursor: the process must exit and restart: WT_PANIC: WiredTiger library panic
2015-07-28T11:58:05.859-0500 I -        [initandlisten] Fatal Assertion 28558
2015-07-28T11:58:05.871-0500 I CONTROL  [initandlisten]
 0xf5e289 0xf07051 0xeeac81 0xd873ba 0x13951f9 0x13953b5 0x1395854 0x12e591e 0x12e5e83 0x12e2b53 0x12e6bb6 0x13005a1 0x132af1b 0x1394670 0x139482a 0x1338341 0x139279a 0x1349a83 0x13925e1 0x13929f5 0xd80612 0xd7c3cb 0xd75e44 0xd767b3 0xd76ac4 0xd70772 0xcf0736 0xcf36a1 0xd6f546 0xa6ddcd 0x7e3a42 0x7e8939 0x7f21e11aaec5 0x7e17e9
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"B5E289"},{"b":"400000","o":"B07051"},{"b":"400000","o":"AEAC81"},{"b":"400000","o":"9873BA"},{"b":"400000","o":"F951F9"},{"b":"400000","o":"F953B5"},{"b":"400000","o":"F95854"},{"b":"400000","o":"EE591E"},{"b":"400000","o":"EE5E83"},{"b":"400000","o":"EE2B53"},{"b":"400000","o":"EE6BB6"},{"b":"400000","o":"F005A1"},{"b":"400000","o":"F2AF1B"},{"b":"400000","o":"F94670"},{"b":"400000","o":"F9482A"},{"b":"400000","o":"F38341"},{"b":"400000","o":"F9279A"},{"b":"400000","o":"F49A83"},{"b":"400000","o":"F925E1"},{"b":"400000","o":"F929F5"},{"b":"400000","o":"980612"},{"b":"400000","o":"97C3CB"},{"b":"400000","o":"975E44"},{"b":"400000","o":"9767B3"},{"b":"400000","o":"976AC4"},{"b":"400000","o":"970772"},{"b":"400000","o":"8F0736"},{"b":"400000","o":"8F36A1"},{"b":"400000","o":"96F546"},{"b":"400000","o":"66DDCD"},{"b":"400000","o":"3E3A42"},{"b":"400000","o":"3E8939"},{"b":"7F21E1189000","o":"21EC5"},{"b":"400000","o":"3E17E9"}],"processInfo":{ "mongodbVersion" : "3.0.5-rc1-pre-", "gitVersion" : "53a4724ee0a163e8666d602d53d7ef8920c3463e", "uname" : { "sysname" : "Linux", "release" : "3.13.0-37-generic", "version" : "#64-Ubuntu SMP Mon Sep 22 21:28:38 UTC 2014", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000" }, { "b" : "7FFF730A1000", "elfType" : 3 }, { "b" : "7F21E217A000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3 }, { "b" : "7F21E1F72000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3 }, { "b" : "7F21E1D6E000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3 }, { "b" : "7F21E1A6A000", "path" : "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "elfType" : 3 }, {"b" : "7F21E1764000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3 }, { "b" : "7F21E154E000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3 }, { "b" : "7F21E1189000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3 }, { "b" : "7F21E2398000", "path" : "/lib64/ld-linux-x86-64.so.2", 
"elfType" : 3 } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf5e289]
 mongod(_ZN5mongo10logContextEPKc+0xE1) [0xf07051]
 mongod(_ZN5mongo13fassertFailedEi+0x61) [0xeeac81]
 mongod(+0x9873BA) [0xd873ba]
 mongod(__wt_eventv+0x489) [0x13951f9]
 mongod(__wt_err+0x95) [0x13953b5]
 mongod(__wt_panic+0x24) [0x1395854]
 mongod(__wt_block_extlist_read+0x6E) [0x12e591e]
 mongod(__wt_block_extlist_read_avail+0x33) [0x12e5e83]
 mongod(__wt_block_checkpoint_load+0x193) [0x12e2b53]
 mongod(+0xEE6BB6) [0x12e6bb6]
 mongod(__wt_btree_open+0xAB1) [0x13005a1]
 mongod(__wt_conn_btree_get+0x19B) [0x132af1b]
 mongod(__wt_session_get_btree+0x500) [0x1394670]
 mongod(__wt_session_get_btree_ckpt+0xBA) [0x139482a]
 mongod(__wt_curfile_open+0xE1) [0x1338341]
 mongod(__wt_open_cursor+0x26A) [0x139279a]
 mongod(__wt_curtable_open+0x2E3) [0x1349a83]
 mongod(__wt_open_cursor+0xB1) [0x13925e1]
 mongod(+0xF929F5) [0x13929f5]
 mongod(_ZN5mongo17WiredTigerSession9getCursorERKSsmb+0x92) [0xd80612]
 mongod(_ZN5mongo16WiredTigerCursorC1ERKSsmbPNS_16OperationContextE+0x4B) [0xd7c3cb]
 mongod(_ZN5mongo21WiredTigerRecordStore8IteratorC2ERKS0_PNS_16OperationContextERKNS_8RecordIdERKNS_20CollectionScanParams9DirectionEb+0x64) [0xd75e44]
 mongod(_ZNK5mongo21WiredTigerRecordStore11getIteratorEPNS_16OperationContextERKNS_8RecordIdERKNS_20CollectionScanParams9DirectionE+0x43) [0xd767b3]
 mongod(_ZN5mongo21WiredTigerRecordStoreC1EPNS_16OperationContextERKNS_10StringDataES5_bllPNS_28CappedDocumentDeleteCallbackEPNS_20WiredTigerSizeStorerE+0x2C4) [0xd76ac4]
 mongod(_ZN5mongo18WiredTigerKVEngine14getRecordStoreEPNS_16OperationContextERKNS_10StringDataES5_RKNS_17CollectionOptionsE+0x132) [0xd70772]
 mongod(_ZN5mongo22KVDatabaseCatalogEntry14initCollectionEPNS_16OperationContextERKSsb+0x276) [0xcf0736]
 mongod(_ZN5mongo15KVStorageEngineC1EPNS_8KVEngineERKNS_22KVStorageEngineOptionsE+0x5E1) [0xcf36a1]
 mongod(+0x96F546) [0xd6f546]
 mongod(_ZN5mongo23GlobalEnvironmentMongoD22setGlobalStorageEngineERKSs+0x30D) [0xa6ddcd]
 mongod(_ZN5mongo13initAndListenEi+0x422) [0x7e3a42]
 mongod(main+0x139) [0x7e8939]
 libc.so.6(__libc_start_main+0xF5) [0x7f21e11aaec5]
 mongod(+0x3E17E9) [0x7e17e9]
-----  END BACKTRACE  -----
2015-07-28T11:58:05.872-0500 I -        [initandlisten]
 
***aborting after fassert() failure

Comment by Michael Cahill (Inactive) [ 28/Jul/15 ]

lucasoares, what I can see from the metadata is that shipyard.posts_5730602795925504_5206770867765248 corresponds to the WiredTiger file collection-330-3518279443082066347.wt. Can you check that file exists in the data directory?

In case that index build succeeded, the next entry in the metadata is collection shipyard.users_history_YouTube stored in collection-29--561297094768335572.wt, but again that looks correct in the metadata: I can't see why MongoDB would fail to open that collection without an error message from WiredTiger.

Sorry I can't be more helpful: is there anything about either of these collections or the corresponding files that looks suspicious to you?

After the repair fails, have you tried opening the database with the patch build without --repair? If that succeeds, you might be able to use the validate command to check that the collections are consistent.
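
The file check requested above can be scripted. The sketch below is only an illustration: the function name check_ident_files is hypothetical, and it assumes each WiredTiger ident maps to a file named <ident>.wt directly inside the dbpath, which matches the file names quoted in this ticket.

```python
from pathlib import Path

def check_ident_files(dbpath, idents):
    """Map each WiredTiger ident to (exists, size_in_bytes) for its .wt file."""
    report = {}
    for ident in idents:
        wt_file = Path(dbpath) / f"{ident}.wt"
        exists = wt_file.is_file()
        report[ident] = (exists, wt_file.stat().st_size if exists else 0)
    return report

if __name__ == "__main__":
    # Idents taken from the comment above; "." assumes the script runs in the dbpath.
    idents = [
        "collection-330-3518279443082066347",
        "collection-29--561297094768335572",
    ]
    for ident, (exists, size) in check_ident_files(".", idents).items():
        status = "present" if exists else "MISSING"
        print(f"{ident}.wt: {status} ({size} bytes)")
```

A missing or zero-length file would be worth reporting back, since it would fit the "metadata exists but the collection does not" symptom discussed in this thread.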

Comment by Lucas [ 28/Jul/15 ]

michael.cahill, I ran the repair again and attached all the files as "newFile.rar". It contains these files:

mongo.log
WiredTiger.*
_mdb_catalog.wt
sizeStorer.wt

It crashed at:

2015-07-27T09:07:06.612-0500 I INDEX    [initandlisten] build index on: shipyard.posts_5730602795925504_5206770867765248 properties: { v: 1, key: { st_channel: 1, updater_level: 1 }, name: "st_channel_1_updater_level_1", ns: "shipyard.posts_5730602795925504_5206770867765248" }
2015-07-27T09:07:06.612-0500 I INDEX    [initandlisten] 	 building index using bulk method
2015-07-27T09:07:08.848-0500 I -        [initandlisten] Invariant failure rs.get() src/mongo/db/catalog/database.cpp 186
2015-07-27T09:07:08.989-0500 I CONTROL  [initandlisten] 
 0xf5e289 0xf07051 0xeeabb2 0x900581 0x902433 0x905090 0xbefc90 0x7e4064 0x7e8939 0x7f3f5b712ec5 0x7e17e9
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"B5E289"},{"b":"400000","o":"B07051"},{"b":"400000","o":"AEABB2"},{"b":"400000","o":"500581"},{"b":"400000","o":"502433"},{"b":"400000","o":"505090"},{"b":"400000","o":"7EFC90"},{"b":"400000","o":"3E4064"},{"b":"400000","o":"3E8939"},{"b":"7F3F5B6F1000","o":"21EC5"},{"b":"400000","o":"3E17E9"}],"processInfo":{ "mongodbVersion" : "3.0.5-rc1-pre-", "gitVersion" : "53a4724ee0a163e8666d602d53d7ef8920c3463e", "uname" : { "sysname" : "Linux", "release" : "3.13.0-37-generic", "version" : "#64-Ubuntu SMP Mon Sep 22 21:28:38 UTC 2014", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000" }, { "b" : "7FFF38AC9000", "elfType" : 3 }, { "b" : "7F3F5C6E2000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3 }, { "b" : "7F3F5C4DA000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3 }, { "b" : "7F3F5C2D6000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3 }, { "b" : "7F3F5BFD2000", "path" : "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "elfType" : 3 }, { "b" : "7F3F5BCCC000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3 }, { "b" : "7F3F5BAB6000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3 }, { "b" : "7F3F5B6F1000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3 }, { "b" : "7F3F5C900000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3 } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf5e289]
 mongod(_ZN5mongo10logContextEPKc+0xE1) [0xf07051]
 mongod(_ZN5mongo15invariantFailedEPKcS1_j+0xB2) [0xeeabb2]
 mongod(_ZN5mongo8Database30_getOrCreateCollectionInstanceEPNS_16OperationContextERKNS_10StringDataE+0xE1) [0x900581]
 mongod(_ZN5mongo8DatabaseC1EPNS_16OperationContextERKNS_10StringDataEPNS_20DatabaseCatalogEntryE+0x1E3) [0x902433]
 mongod(_ZN5mongo14DatabaseHolder6openDbEPNS_16OperationContextERKNS_10StringDataEPb+0x150) [0x905090]
 mongod(_ZN5mongo14repairDatabaseEPNS_16OperationContextEPNS_13StorageEngineERKSsbb+0x730) [0xbefc90]
 mongod(_ZN5mongo13initAndListenEi+0xA44) [0x7e4064]
 mongod(main+0x139) [0x7e8939]
 libc.so.6(__libc_start_main+0xF5) [0x7f3f5b712ec5]
 mongod(+0x3E17E9) [0x7e17e9]
-----  END BACKTRACE  -----
2015-07-27T09:07:08.989-0500 I -        [initandlisten] 
 
***aborting after invariant() failure

Comment by Michael Cahill (Inactive) [ 27/Jul/15 ]

lucasoares, I took a look at that data and didn't see the inconsistency. Can you send me the versions of those same files after the repair, even though it crashes?

Comment by Lucas [ 27/Jul/15 ]

michael.cahill All files that you requested are attached. File name: MONGO.rar

As for your question: maybe, but it's not very likely... My app creates a new collection when a user asks for one, but the application has few users.

Comment by Michael Cahill (Inactive) [ 27/Jul/15 ]

lucasoares, that message indicates that there is MongoDB metadata for a collection, but the collection does not exist in WiredTiger after the crash.

Is your application frequently creating collections? Is it possible that a collection was being created when the crash occurred?

I don't know a simple workaround for this one: I expect repair will keep failing at the same point. Would it be possible for you to upload WiredTiger.*, _mdb_catalog.wt and sizeStorer.wt? Then I should be able to repair these files manually.

Comment by Lucas [ 27/Jul/15 ]

This is weird, michael.cahill. I ran it again, and look at this:

2015-07-25T22:04:57.189-0500 I INDEX    [initandlisten] build index on: shipyard.posts_5730602795925504_5206770867765248 properties: { v: 1, key: { st_resonance_score: -1, _id: 1 }, name: "st_resonance_score_-1__id_1", ns: "shipyard.posts_5730602795925504_5206770867765248" }
2015-07-25T22:04:57.189-0500 I INDEX    [initandlisten] 	 building index using bulk method
2015-07-25T22:04:57.368-0500 I INDEX    [initandlisten] build index on: shipyard.posts_5730602795925504_5206770867765248 properties: { v: 1, key: { st_channel: 1, updater_level: 1 }, name: "st_channel_1_updater_level_1", ns: "shipyard.posts_5730602795925504_5206770867765248" }
2015-07-25T22:04:57.368-0500 I INDEX    [initandlisten] 	 building index using bulk method
2015-07-25T22:05:03.117-0500 I -        [initandlisten] Invariant failure rs.get() src/mongo/db/catalog/database.cpp 186
2015-07-25T22:05:03.313-0500 I CONTROL  [initandlisten] 
 0xf5e289 0xf07051 0xeeabb2 0x900581 0x902433 0x905090 0xbefc90 0x7e4064 0x7e8939 0x7f24bb8f6ec5 0x7e17e9
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"B5E289"},{"b":"400000","o":"B07051"},{"b":"400000","o":"AEABB2"},{"b":"400000","o":"500581"},{"b":"400000","o":"502433"},{"b":"400000","o":"505090"},{"b":"400000","o":"7EFC90"},{"b":"400000","o":"3E4064"},{"b":"400000","o":"3E8939"},{"b":"7F24BB8D5000","o":"21EC5"},{"b":"400000","o":"3E17E9"}],"processInfo":{ "mongodbVersion" : "3.0.5-rc1-pre-", "gitVersion" : "53a4724ee0a163e8666d602d53d7ef8920c3463e", "uname" : { "sysname" : "Linux", "release" : "3.13.0-37-generic", "version" : "#64-Ubuntu SMP Mon Sep 22 21:28:38 UTC 2014", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000" }, { "b" : "7FFFCDCFE000", "elfType" : 3 }, { "b" : "7F24BC8C6000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3 }, { "b" : "7F24BC6BE000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3 }, { "b" : "7F24BC4BA000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3 }, { "b" : "7F24BC1B6000", "path" : "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "elfType" : 3 }, { "b" : "7F24BBEB0000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3 }, { "b" : "7F24BBC9A000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3 }, { "b" : "7F24BB8D5000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3 }, { "b" : "7F24BCAE4000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3 } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf5e289]
 mongod(_ZN5mongo10logContextEPKc+0xE1) [0xf07051]
 mongod(_ZN5mongo15invariantFailedEPKcS1_j+0xB2) [0xeeabb2]
 mongod(_ZN5mongo8Database30_getOrCreateCollectionInstanceEPNS_16OperationContextERKNS_10StringDataE+0xE1) [0x900581]
 mongod(_ZN5mongo8DatabaseC1EPNS_16OperationContextERKNS_10StringDataEPNS_20DatabaseCatalogEntryE+0x1E3) [0x902433]
 mongod(_ZN5mongo14DatabaseHolder6openDbEPNS_16OperationContextERKNS_10StringDataEPb+0x150) [0x905090]
 mongod(_ZN5mongo14repairDatabaseEPNS_16OperationContextEPNS_13StorageEngineERKSsbb+0x730) [0xbefc90]
 mongod(_ZN5mongo13initAndListenEi+0xA44) [0x7e4064]
 mongod(main+0x139) [0x7e8939]
 libc.so.6(__libc_start_main+0xF5) [0x7f24bb8f6ec5]
 mongod(+0x3E17E9) [0x7e17e9]
-----  END BACKTRACE  -----
2015-07-25T22:05:03.313-0500 I -        [initandlisten] 
 
***aborting after invariant() failure

Command used:

./mongo/mongodb-linux-x86_64-3.0.5-rc1-pre-/bin/mongod --dbpath . --storageEngine wiredTiger --logpath mongo.log --repair

And running

./mongo/mongodb-linux-x86_64-3.0.5-rc1-pre-/bin/mongod --version

prints

db version v3.0.5-rc1-pre-
git version: 53a4724ee0a163e8666d602d53d7ef8920c3463e

Tomorrow I will attach the newly generated log. I'm running the repair again (and again and again) haha

Comment by Michael Cahill (Inactive) [ 27/Jul/15 ]

lucasoares, those logs were generated by stock MongoDB 3.0.4. If you run with the patched mongod binary I linked to above, you should see:

db version v3.0.5-rc1-pre-

In the log files you sent, that is "db version v3.0.4".
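
One way to confirm which binary wrote a given log is to pull the "db version" startup lines out of it. This is a rough sketch, assuming the startup line looks like the ones pasted in this ticket; the function name versions_in_log is invented for the example.

```python
import re

# Matches startup lines such as:
#   ... I CONTROL  [initandlisten] db version v3.0.5-rc1-pre-
VERSION_RE = re.compile(r"\bdb version v(\S+)")

def versions_in_log(lines):
    """Return the distinct mongod versions announced in a log, in first-seen order."""
    seen = []
    for line in lines:
        match = VERSION_RE.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen

# Two startup lines copied from a log excerpt in this ticket.
sample_log = [
    "2015-07-24T11:27:27.770+1000 I CONTROL  [initandlisten] db version v3.0.5-rc1-pre-",
    "2015-07-24T11:27:27.770+1000 I CONTROL  [initandlisten] git version: 53a4724ee0a163e8666d602d53d7ef8920c3463e",
]
found = versions_in_log(sample_log)
print(found)
```

For a real file, the same function can be fed an open file handle, for example versions_in_log(open("mongo.log")). If a log holds several runs, more than one version may be listed.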

Comment by Lucas [ 25/Jul/15 ]

michael.cahill, done. I couldn't find the latest one, but I have attached the log from the first repair with the new build. I will run the repair again to get a new log.

Comment by Lucas [ 25/Jul/15 ]

I can't find the short (latest) one. I will send this one (from the first repair with the new build, I think).

Comment by Michael Cahill (Inactive) [ 24/Jul/15 ]

Thanks, lucasoares, in that case can you please attach the full mongod.log from the latest repair runs?

Comment by Lucas [ 24/Jul/15 ]

No, michael.cahill, that isn't possible. With the 3.0.4 build, the repair ran for less than one hour. Before this issue I ran the 3.0.4 build 10 times, and every error occurred in the collection-1595-7140502635356682714.wt file.

With the new build, the error occurs in collection-1-536229705572383833.wt. I ran the new build again after that; it ran for much less time and crashed on the same file (collection-1-536229705572383833.wt).
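
To compare runs like these, the files hit by checksum errors can be tallied from the logs. The sketch below is a hedged example: the regular expression is modelled only on the "read checksum error" lines pasted in this ticket, the bracketed fields are assumed to be block size, file offset and the two mismatching checksum values, and the names CHECKSUM_RE and checksum_errors are made up.

```python
import re
from collections import Counter

# Pattern modelled on lines in this ticket:
#   file:<name>.wt, session.<op>: read checksum error [<size>B @ <offset>, <a> != <b>]
CHECKSUM_RE = re.compile(
    r"file:(?P<file>\S+\.wt), session\.(?P<op>\w+): read checksum error "
    r"\[(?P<size>\d+)B @ (?P<offset>\d+), (?P<a>\d+) != (?P<b>\d+)\]"
)

def checksum_errors(lines):
    """Yield (file, operation, size, offset) for each checksum error line."""
    for line in lines:
        match = CHECKSUM_RE.search(line)
        if match:
            yield match["file"], match["op"], int(match["size"]), int(match["offset"])

# Log lines copied from this ticket.
sample_log = [
    "2015-07-06T13:49:53.972-0500 E STORAGE  [initandlisten] WiredTiger (0) [1436208593:972487][16105:0x7fe889ce5bc0], file:collection-1595-7140502635356682714.wt, session.verify: read checksum error [4096B @ 9994240, 832067279 != 2155579310]",
    "2015-07-22T16:48:25.615-0500 E STORAGE  [initandlisten] WiredTiger (0) [1437601705:615521][18516:0x7ff8111f6bc0], file:collection-1-536229705572383833.wt, session.verify: read checksum error [12288B @ 51093504, 2484076299 != 1099885916]",
    "2015-07-22T16:48:25.633-0500 I -        [initandlisten] Fatal Assertion 28558",
]

errors = list(checksum_errors(sample_log))
per_file = Counter(file for file, _op, _size, _offset in errors)
for file, count in per_file.items():
    print(file, count)
```

Running this over the logs from each build would show whether the failing file really changed between the stock and patched binaries.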

Comment by Michael Cahill (Inactive) [ 24/Jul/15 ]

lucasoares, I am sorry that repair ran for so long without succeeding.

I just ran a test with the binaries I mentioned above. I deliberately created a database with this kind of corruption. Here is what I saw:

$ bin/mongod --dbpath=... --storageEngine=wiredTiger --repair
...
2015-07-24T11:27:27.770+1000 I CONTROL  [initandlisten] db version v3.0.5-rc1-pre-
2015-07-24T11:27:27.770+1000 I CONTROL  [initandlisten] git version: 53a4724ee0a163e8666d602d53d7ef8920c3463e
2015-07-24T11:27:27.770+1000 I CONTROL  [initandlisten] build info: Linux build2.ny.cbi.10gen.cc 2.6.32-431.3.1.el6.x86_64 #1 SMP Fri Jan 3 21:39:27 UTC 2014 x86_64 BOOST_LIB_VERSION=1_49
2015-07-24T11:27:27.770+1000 I CONTROL  [initandlisten] allocator: tcmalloc
2015-07-24T11:27:27.770+1000 I CONTROL  [initandlisten] options: { repair: true, storage: { dbPath: ".../data", engine: "wiredTiger" } }
2015-07-24T11:27:27.770+1000 I STORAGE  [initandlisten] repairDatabase local
2015-07-24T11:27:27.770+1000 I STORAGE  [initandlisten] Repairing collection local.startup_log
2015-07-24T11:27:27.771+1000 E STORAGE  [initandlisten] WiredTiger (0) [1437701247:771409][84972:0x7f8ab613bb80], file:collection-0--595450959203544956.wt, session.verify: read checksum error [4096B @ 8192, 1351565544 != 2143735776]
2015-07-24T11:27:27.772+1000 E STORAGE  [initandlisten] WiredTiger (0) [1437701247:772193][84972:0x7f8ab613bb80], file:collection-0--595450959203544956.wt, session.verify: checkpoint ranges never verified: 1
2015-07-24T11:27:27.772+1000 E STORAGE  [initandlisten] WiredTiger (0) [1437701247:772238][84972:0x7f8ab613bb80], file:collection-0--595450959203544956.wt, session.verify: file ranges never verified: 1
2015-07-24T11:27:27.772+1000 I STORAGE  [initandlisten] WiredTiger progress session.verify 0
2015-07-24T11:27:27.772+1000 I STORAGE  [initandlisten] Verify failed on uri table:collection-0--595450959203544956. Running a salvage operation.
2015-07-24T11:27:27.773+1000 I STORAGE  [initandlisten] WiredTiger progress session.salvage 3

Is it possible that you were still running the stock 3.0.4 build?

Comment by Lucas [ 23/Jul/15 ]

michael.cahill, after 30+ hours of repairing, I got this error:

2015-07-22T16:48:18.736-0500 I INDEX    [initandlisten]          building index using bulk method
2015-07-22T16:48:25.572-0500 I STORAGE  [initandlisten] Repairing collection shipyard.posts_5704147139559424_5942933362573312
2015-07-22T16:48:25.615-0500 E STORAGE  [initandlisten] WiredTiger (0) [1437601705:615521][18516:0x7ff8111f6bc0], file:collection-1-536229705572383833.wt, session.verify: read checksum error [12288B @ 51093504, 2484076299 != 1099885916]
2015-07-22T16:48:25.625-0500 E STORAGE  [initandlisten] WiredTiger (0) [1437601705:615581][18516:0x7ff8111f6bc0], file:collection-1-536229705572383833.wt, session.verify: collection-1-536229705572383833.wt: encountered an illegal file format or internal value
2015-07-22T16:48:25.625-0500 E STORAGE  [initandlisten] WiredTiger (-31804) [1437601705:625712][18516:0x7ff8111f6bc0], file:collection-1-536229705572383833.wt, session.verify: the process must exit and restart: WT_PANIC: WiredTiger library panic
2015-07-22T16:48:25.633-0500 I -        [initandlisten] Fatal Assertion 28558
2015-07-22T16:48:25.873-0500 I CONTROL  [initandlisten]
 0xf5e199 0xefd1b1 0xee1cb1 0xd87dda 0x1390e89 0x1391045 0x13914e4 0x12e228e 0x12e2728 0x12e54cf 0x12e5648 0x1310afa 0x138c16e 0x138c3a8 0x138c836 0xd70ed5 0xd71671 0xcf7250 0xbf9648 0x80a8f4 0x7d6b89 0x7ff80f7bbec5 0x8080e7
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"B5E199"},{"b":"400000","o":"AFD1B1"},{"b":"400000","o":"AE1CB1"},{"b":"400000","o":"987DDA"},{"b":"400000","o":"F90E89"},{"b":"400000","o":"F91045"},{"b":"400000","o":"F914E4"},{"b":"400000","o":"EE228E"},{"b":"400000","o":"EE2728"},{"b":"400000","o":"EE54CF"},{"b":"400000","o":"EE5648"},{"b":"400000","o":"F10AFA"},{"b":"400000","o":"F8C16E"},{"b":"400000","o":"F8C3A8"},{"b":"400000","o":"F8C836"},{"b":"400000","o":"970ED5"},{"b":"400000","o":"971671"},{"b":"400000","o":"8F7250"},{"b":"400000","o":"7F9648"},{"b":"400000","o":"40A8F4"},{"b":"400000","o":"3D6B89"},{"b":"7FF80F79A000","o":"21EC5"},{"b":"400000","o":"4080E7"}],"processInfo":{ "mongodbVersion" : "3.0.4", "gitVersion" : "0481c958daeb2969800511e7475dc66986fa9ed5", "uname" : { "sysname" : "Linux", "release" : "3.13.0-37-generic", "version" : "#64-Ubuntu SMP Mon Sep 22 21:28:38 UTC 2014", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "32DC52072DB9385642CCB4D2AD2ACDA6E0B87A27" }, { "b" : "7FFF422AC000", "elfType" : 3, "buildId" : "0074678E5FFFF79F46C476077E67057161772F37" }, { "b" : "7FF810DC5000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "9318E8AF0BFBE444731BB0461202EF57F7C39542" }, { "b" : "7FF810B66000", "path" : "/lib/x86_64-linux-gnu/libssl.so.1.0.0", "elfType" : 3, "buildId" : "A20EFFEC993A8441FA17F2079F923CBD04079E19" }, { "b" : "7FF81078B000", "path" : "/lib/x86_64-linux-gnu/libcrypto.so.1.0.0", "elfType" : 3, "buildId": "F000D29917E9B6E94A35A8F02E5C62846E5916BC" }, { "b" : "7FF810583000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "92FCF41EFE012D6186E31A59AD05BDBB487769AB" }, { "b" : "7FF81037F000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "C1AE4CB7195D337A77A3C689051DABAA3980CA0C" }, { "b" : "7FF81007B000", "path" : "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "elfType" : 3, "buildId" : "4BF6F7ADD8244AD86008E6BF40D90F8873892197" }, { "b" 
: "7FF80FD75000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "1D76B71E905CB867B27CEF230FCB20F01A3178F5" }, { "b" : "7FF80FB5F000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "8D0AA71411580EE6C08809695C3984769F25725B" }, { "b" : "7FF80F79A000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "30C94DC66A1FE95180C3D68D2B89E576D5AE213C" }, { "b" : "7FF810FE3000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "9F00581AB3C73E3AEA35995A0C50D24D59A01D47" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x29) [0xf5e199]
 mongod(_ZN5mongo10logContextEPKc+0xE1) [0xefd1b1]
 mongod(_ZN5mongo13fassertFailedEi+0x61) [0xee1cb1]
 mongod(+0x987DDA) [0xd87dda]
 mongod(__wt_eventv+0x489) [0x1390e89]
 mongod(__wt_err+0x95) [0x1391045]
 mongod(__wt_panic+0x24) [0x13914e4]
 mongod(__wt_block_extlist_read+0x6E) [0x12e228e]
 mongod(__wt_block_extlist_read_avail+0x28) [0x12e2728]
 mongod(+0xEE54CF) [0x12e54cf]
 mongod(__wt_block_verify_start+0x108) [0x12e5648]
 mongod(__wt_verify+0x4AA) [0x1310afa]
 mongod(__wt_schema_worker+0x35E) [0x138c16e]
 mongod(__wt_schema_worker+0x598) [0x138c3a8]
 mongod(+0xF8C836) [0x138c836]
 mongod(_ZN5mongo18WiredTigerKVEngine16_salvageIfNeededEPKc+0x45) [0xd70ed5]
 mongod(_ZN5mongo18WiredTigerKVEngine11repairIdentEPNS_16OperationContextERKNS_10StringDataE+0x51) [0xd71671]
 mongod(_ZN5mongo15KVStorageEngine17repairRecordStoreEPNS_16OperationContextERKSs+0xA0) [0xcf7250]
 mongod(_ZN5mongo14repairDatabaseEPNS_16OperationContextEPNS_13StorageEngineERKSsbb+0x2A8) [0xbf9648]
 mongod(_ZN5mongo13initAndListenEi+0xA44) [0x80a8f4]
 mongod(main+0x139) [0x7d6b89]
 libc.so.6(__libc_start_main+0xF5) [0x7ff80f7bbec5]
 mongod(+0x4080E7) [0x8080e7]
-----  END BACKTRACE  -----
2015-07-22T16:48:25.873-0500 I -        [initandlisten]
 
***aborting after fassert() failure

Comment by Lucas [ 22/Jul/15 ]

OK. The recovery is still going on: 10 hours in, with a lot of "building index using bulk method" messages.

Comment by Michael Cahill (Inactive) [ 22/Jul/15 ]

I'm afraid I can't help you much merging the databases – if it were me, I'd get lists of unique IDs from both databases and compare them offline (i.e., sort both then look at the differences between the two lists). Then I'd copy the missing documents into the new database. There may be an efficient way to have MongoDB do this for you, but unfortunately I don't know it.

Please do let me know how the recovery step goes.
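
The offline comparison described above could look roughly like this. It is a sketch under assumptions: the ID lists are placeholders standing in for values exported from each database, the function name missing_ids is hypothetical, and it uses a set difference followed by a sort rather than a literal sort-then-diff.

```python
def missing_ids(recovered_ids, current_ids):
    """Return IDs present in the recovered database but absent from the current one, sorted."""
    return sorted(set(recovered_ids) - set(current_ids))

# Placeholder ID lists, e.g. one unique ID per line exported from each database.
recovered = ["a1", "a2", "a3", "a5"]
current = ["a2", "a4", "a5"]

# IDs whose documents would need copying into the current database.
to_copy = missing_ids(recovered, current)
print(to_copy)
```

Only the documents whose IDs end up in to_copy would then be copied across, which keeps the unique fields from colliding.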

Comment by Lucas [ 21/Jul/15 ]

Ok michael.cahill. Tomorrow I will try to recover my data and I'll tell you if it worked.

Can you tell me the best way to merge this database with the current one I have? The unique fields remain the same.

Thanks. See you tomorrow!

Comment by Michael Cahill (Inactive) [ 21/Jul/15 ]

lucasoares, I have created a patch build of MongoDB that should be able to recover your data. The patch build is here:

https://s3.amazonaws.com/mciuploads/mongodb-mongo-v3.0/linux-64/53a4724ee0a163e8666d602d53d7ef8920c3463e/binaries/mongo-mongodb_mongo_v3.0_linux_64_53a4724ee0a163e8666d602d53d7ef8920c3463e_15_07_17_07_57_00.tgz

I recommend that you download these binaries and use them only to repair the database, then revert to the latest stock 3.0.x binaries (currently 3.0.4, but 3.0.5 will be released soon).

Please let me know whether this resolves the issue for you.

Comment by Lucas [ 10/Jul/15 ]

Ok man, I'll be following you and this issue. I really need to recover at least some of this data because it is very important to me.

Comment by Michael Cahill (Inactive) [ 09/Jul/15 ]

Hi lucasoares, I am sorry that you have hit this problem using MongoDB with WiredTiger.

We believe that you hit SERVER-18316 which caused the WiredTiger metadata to be inconsistent with data files after a crash. That bug has been fixed in MongoDB 3.0.4.

Unfortunately, you have also hit another problem where WiredTiger is detecting the inconsistency but failing with a hard error (a panic) rather than a soft error that would cause MongoDB to go on to repair the corruption. I have opened WT-2002 to get this bug in WiredTiger fixed.
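
The difference between the two failure modes can be pictured with a toy model. This is not MongoDB or WiredTiger code: the exception classes and the salvage_if_needed function are hypothetical stand-ins, loosely named after the _salvageIfNeeded frame in the backtraces, and they only illustrate why a soft verify failure can lead on to salvage while a panic cannot.

```python
class SoftVerifyError(Exception):
    """Hypothetical: verify found corruption, but the library is still usable."""

class Panic(Exception):
    """Hypothetical stand-in for WT_PANIC, after which the process must exit."""

def salvage_if_needed(uri, verify, salvage):
    """Verify a table and salvage it only when verify reports a recoverable failure."""
    try:
        verify(uri)
    except SoftVerifyError:
        salvage(uri)
        return "salvaged"
    # A Panic is deliberately not caught: it propagates and ends the repair run.
    return "ok"

def failing_verify(uri):
    # Simulates a verify call that reports corruption as a soft error.
    raise SoftVerifyError(uri)

salvaged = []
outcome = salvage_if_needed("table:demo", failing_verify, salvaged.append)
print(outcome, salvaged)
```

In this model the behaviour reported in the ticket corresponds to verify raising Panic, so the salvage step is never reached.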
