[SERVER-31361] Corrupt WiredTiger metadata Created: 03/Oct/17  Updated: 27/Jul/18  Resolved: 04/Oct/17

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.2.15
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Kyle Kavanagh Assignee: Mark Agarunov
Resolution: Done Votes: 0
Labels: envns, rpo, rps, trcf, wtc
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File SERVER-31361-repair.tar.gz     File WiredTiger.turtle     File WiredTiger.wt     File WiredTigerLAS.wt     File _mdb_catalog.wt    
Operating System: Linux
Participants:

 Description   

Using mongodb-3.2.15 on RH7

mongod fails to recover from an unexpected shutdown. Attempting to start up with --repair gives the following output. I am attaching the metadata files; let me know if you need any additional files.

2017-10-03T08:26:18.014-0500 W STORAGE  [initandlisten] Recovering data from the last clean checkpoint.
2017-10-03T08:26:18.014-0500 I STORAGE  [initandlisten] Detected WT journal files.  Running recovery from last checkpoint.
2017-10-03T08:26:18.014-0500 I STORAGE  [initandlisten] journal to nojournal transition config: create,cache_size=256G,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2017-10-03T08:26:18.042-0500 E STORAGE  [initandlisten] WiredTiger (-31802) [1507037178:42474][150573:0x7f5659cbddc0], file:WiredTiger.wt, connection: unable to read root page from file:WiredTiger.wt: WT_ERROR: non-specific WiredTiger error
2017-10-03T08:26:18.042-0500 E STORAGE  [initandlisten] WiredTiger (0) [1507037178:42534][150573:0x7f5659cbddc0], file:WiredTiger.wt, connection: WiredTiger has failed to open its metadata
2017-10-03T08:26:18.042-0500 E STORAGE  [initandlisten] WiredTiger (0) [1507037178:42550][150573:0x7f5659cbddc0], file:WiredTiger.wt, connection: This may be due to the database files being encrypted, being from an older version or due to corruption on disk
2017-10-03T08:26:18.042-0500 E STORAGE  [initandlisten] WiredTiger (0) [1507037178:42578][150573:0x7f5659cbddc0], file:WiredTiger.wt, connection: You should confirm that you have opened the database with the correct options including all encryption and compression options
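
For triage, the WiredTiger errors in output like the above can be pulled out mechanically. The following is a minimal sketch in Python, assuming the plain-text log layout seen in this ticket (timestamp, severity letter, component, bracketed context, message). The function name `wiredtiger_errors` and both regular expressions are illustrative and are not part of any MongoDB tooling.

```python
import re

# Assumed layout of a log line as it appears in this ticket:
#   <timestamp> <severity> <component> [<context>] <message>
LOG_LINE = re.compile(
    r"^(?P<ts>\S+)\s+(?P<severity>[A-Z])\s+(?P<component>\S+)\s+"
    r"\[(?P<context>[^\]]+)\]\s+(?P<message>.*)$"
)
# WiredTiger messages carry a return code in parentheses, e.g. "WiredTiger (-31802)".
WT_CODE = re.compile(r"^WiredTiger \((?P<code>-?\d+)\)")


def wiredtiger_errors(lines):
    """Yield (timestamp, code, message) for each error-severity WiredTiger line."""
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if not m or m.group("severity") != "E":
            continue  # skip lines that do not parse or are not errors
        wt = WT_CODE.match(m.group("message"))
        if wt:
            yield m.group("ts"), int(wt.group("code")), m.group("message")
```

Calling `list(wiredtiger_errors(open(path)))` on a saved log (where `path` is a hypothetical file name) would return one tuple per `E`-severity WiredTiger line, which makes it easy to see that -31802 is the first error reported.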



 Comments   
Comment by Mark Agarunov [ 04/Oct/17 ]

Hello kdkavanagh,

Thanks for your response. I'm glad to hear that this fixed the issue and everything is working again. To prevent this type of problem in the future, we recommend implementing regular backups and/or replication to mitigate any issues related to unreliable storage layers or server failures.

Thanks,
Mark

Comment by Kyle Kavanagh [ 04/Oct/17 ]

Mark, happy to report that --repair seems to have made things whole again. Thanks for the quick help!

Comment by Mark Agarunov [ 03/Oct/17 ]

Hello kdkavanagh,

Thank you for the additional information. Unfortunately, if running the repair is unsuccessful, the cause is likely corruption on the disk. In that case, my best recommendation would be to resync the affected node or restore from a backup if possible. Please let me know whether --repair has fixed this issue for you.

Thanks,
Mark

Comment by Kyle Kavanagh [ 03/Oct/17 ]

Thanks for the quick help, Mark.

The new files seem to have moved things along, but I am now getting the following error. I am running with --repair now to see whether that fixes the issue; it seems to be chugging along. If not, would I be able to email you that collection.wt file? Without knowing its contents, I'm not sure that I can post it publicly.

2017-10-03T10:49:11.134-0500 W -        [initandlisten] Detected unclean shutdown - /mongoWiredTiger/mongod.lock is not empty.
2017-10-03T10:49:11.136-0500 W STORAGE  [initandlisten] Recovering data from the last clean checkpoint.
2017-10-03T10:49:11.136-0500 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=256G,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2017-10-03T10:49:12.326-0500 E STORAGE  [initandlisten] WiredTiger (-31802) [1507045752:326548][169314:0x7f6ccd913dc0], file:collection-0--2399924296294425877.wt, WT_SESSION.open_cursor: unable to read root page from file:collection-0--2399924296294425877.wt: WT_ERROR: non-specific WiredTiger error
2017-10-03T10:49:12.326-0500 I -        [initandlisten] Invariant failure: ret resulted in status UnknownError: -31802: WT_ERROR: non-specific WiredTiger error at src/mongo/db/storage/wiredtiger/wiredtiger_session_cache.cpp 79
2017-10-03T10:49:12.340-0500 I CONTROL  [initandlisten]
 0x13691b2 0x12fe9b4 0x12e722d 0x10ccc74 0x10cb560 0x10c73a2 0x10c5dfd 0x10b6d99 0x1016844 0x101c5a2 0x10b5688 0xfda8fe 0x9a18b0 0x9a4d8d 0x7f6ccc153c05 0x99d21d
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"F691B2","s":"_ZN5mongo15printStackTraceERSo"},{"b":"400000","o":"EFE9B4","s":"_ZN5mongo10logContextEPKc"},{"b":"400000","o":"EE722D","s":"_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j"},{"b":"400000","o":"CCCC74","s":"_ZN5mongo17WiredTigerSession9getCursorERKSsmb"},{"b":"400000","o":"CCB560","s":"_ZN5mongo16WiredTigerCursorC1ERKSsmbPNS_16OperationContextE"},{"b":"400000","o":"CC73A2","s":"_ZN5mongo21WiredTigerRecordStore6CursorC1EPNS_16OperationContextERKS0_b"},{"b":"400000","o":"CC5DFD","s":"_ZN5mongo21WiredTigerRecordStoreC1EPNS_16OperationContextENS_10StringDataES3_SsbbllPNS_14CappedCallbackEPNS_20WiredTigerSizeStorerE"},{"b":"400000","o":"CB6D99","s":"_ZN5mongo18WiredTigerKVEngine14getRecordStoreEPNS_16OperationContextENS_10StringDataES3_RKNS_17CollectionOptionsE"},{"b":"400000","o":"C16844","s":"_ZN5mongo22KVDatabaseCatalogEntry14initCollectionEPNS_16OperationContextERKSsb"},{"b":"400000","o":"C1C5A2","s":"_ZN5mongo15KVStorageEngineC1EPNS_8KVEngineERKNS_22KVStorageEngineOptionsE"},{"b":"400000","o":"CB5688"},{"b":"400000","o":"BDA8FE","s":"_ZN5mongo20ServiceContextMongoD29initializeGlobalStorageEngineEv"},{"b":"400000","o":"5A18B0"},{"b":"400000","o":"5A4D8D","s":"main"},{"b":"7F6CCC132000","o":"21C05","s":"__libc_start_main"},{"b":"400000","o":"59D21D"}],"processInfo":{ "mongodbVersion" : "3.2.15", "gitVersion" : "e11e3c1b9c9ce3f7b4a79493e16f5e4504e01140", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.10.0-693.2.1.el7.x86_64", "version" : "#1 SMP Fri Aug 11 04:58:43 EDT 2017", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "115BC4E0A4CF9482C074934FD9D3F139E8B8D333" }, { "b" : "7FFF288C4000", "elfType" : 3, "buildId" : "E022DBB53918349801F7A932682F24DA99A0835A" }, { "b" : "7F6CCD496000", "path" : "/lib64/libssl.so.10", "elfType" : 3, "buildId" : "83A75B80BD8EB0C10DA10D868A3FAA3CBF68FF5E" }, { "b" : "7F6CCD035000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, 
"buildId" : "7B27FA381BBE2CACAC2061EF2E723A951EDA2C88" }, { "b" : "7F6CCCE2D000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "76C4765826961C60BE913F5549EBBC7BBE506899" }, { "b" : "7F6CCCC29000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "EF023E0C16991E8957337C90B0E51025FE27C897" }, { "b" : "7F6CCC927000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "6A344A5CE77A247399AA2F22F92C698F574A7134" }, { "b" : "7F6CCC711000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "3E85E6D20D2CE9CDAD535084BEA56620BAAD687C" }, { "b" : "7F6CCC4F5000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "6091139B61D901AA8425DBC16B7F51D455A416B4" }, { "b" : "7F6CCC132000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "FF1D2DE4CF353118960C5EA195CEB13699929C14" }, { "b" : "7F6CCD708000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "A9980CF253C79740E69F70DCB8FEA7B8C2F641B5" }, { "b" : "7F6CCBEE5000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "DC2687F4E7034B1175A0351CB17D9AFE14D3B2E2" }, { "b" : "7F6CCBBFD000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "700752C8E20EBF87C748DDD6C847BFC49D8183D1" }, { "b" : "7F6CCB9F9000", "path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "6E86882613DA83679DA9AD32B128F2D7B7709CC0" }, { "b" : "7F6CCB7C6000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "9F4CF016958F7E9CBEFC1BDEE328781D81BF72E9" }, { "b" : "7F6CCB5B0000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "FE621E91052A9A77CC263E00A8A21C2BC0867E21" }, { "b" : "7F6CCB3A2000", "path" : "/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "6961067A58DEF0FA65DFFA86CC07B768F93807CE" }, { "b" : "7F6CCB19E000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "8CA73C16CFEB9A8B5660015B9223B09F87041CAD" }, { "b" : "7F6CCAF84000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : 
"1B56D55B99DF90787361ED6C2EA852562E67C486" }, { "b" : "7F6CCAD5D000", "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "6A5EF7A05F7E488FCD280BFADD96083BEC9FD416" }, { "b" : "7F6CCAAFB000", "path" : "/lib64/libpcre.so.1", "elfType" : 3, "buildId" : "F5B144F9F5D9BE451C80211B34DB2CE348E039B6" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x13691b2]
 mongod(_ZN5mongo10logContextEPKc+0x144) [0x12fe9b4]
 mongod(_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j+0xAD) [0x12e722d]
 mongod(_ZN5mongo17WiredTigerSession9getCursorERKSsmb+0xE4) [0x10ccc74]
 mongod(_ZN5mongo16WiredTigerCursorC1ERKSsmbPNS_16OperationContextE+0x50) [0x10cb560]
 mongod(_ZN5mongo21WiredTigerRecordStore6CursorC1EPNS_16OperationContextERKS0_b+0x92) [0x10c73a2]
 mongod(_ZN5mongo21WiredTigerRecordStoreC1EPNS_16OperationContextENS_10StringDataES3_SsbbllPNS_14CappedCallbackEPNS_20WiredTigerSizeStorerE+0x3ED) [0x10c5dfd]
 mongod(_ZN5mongo18WiredTigerKVEngine14getRecordStoreEPNS_16OperationContextENS_10StringDataES3_RKNS_17CollectionOptionsE+0xE9) [0x10b6d99]
 mongod(_ZN5mongo22KVDatabaseCatalogEntry14initCollectionEPNS_16OperationContextERKSsb+0x204) [0x1016844]
 mongod(_ZN5mongo15KVStorageEngineC1EPNS_8KVEngineERKNS_22KVStorageEngineOptionsE+0x6D2) [0x101c5a2]
 mongod(+0xCB5688) [0x10b5688]
 mongod(_ZN5mongo20ServiceContextMongoD29initializeGlobalStorageEngineEv+0x59E) [0xfda8fe]
 mongod(+0x5A18B0) [0x9a18b0]
 mongod(main+0x15D) [0x9a4d8d]
 libc.so.6(__libc_start_main+0xF5) [0x7f6ccc153c05]
 mongod(+0x59D21D) [0x99d21d]
-----  END BACKTRACE  -----
2017-10-03T10:49:12.340-0500 I -        [initandlisten]
 
***aborting after invariant() failure
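
The JSON backtrace and the symbolized frames in the log above describe the same addresses: in this output, each frame's absolute address equals its load base `b` plus its offset `o`, both hexadecimal. The short Python sketch below checks that arithmetic on a hand-copied excerpt of the first two frames. The helper name `frame_addresses` is illustrative only.

```python
import json

# Excerpt of the "backtrace" array from the log above (first two frames only).
excerpt = json.loads(
    '{"backtrace":['
    '{"b":"400000","o":"F691B2","s":"_ZN5mongo15printStackTraceERSo"},'
    '{"b":"400000","o":"EFE9B4","s":"_ZN5mongo10logContextEPKc"}]}'
)


def frame_addresses(backtrace):
    """Return (symbol, absolute address) pairs, computing address as base + offset."""
    return [
        (frame.get("s", "<unknown>"), int(frame["b"], 16) + int(frame["o"], 16))
        for frame in backtrace["backtrace"]
    ]


for symbol, address in frame_addresses(excerpt):
    print(f"{symbol} [{address:#x}]")
```

The two computed values, 0x13691b2 and 0x12fe9b4, match the first two entries in the raw address list and the bracketed addresses of the corresponding `mongod(...)` frames.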

Comment by Mark Agarunov [ 03/Oct/17 ]

Hello kdkavanagh,

Thank you for the report. I've attached a repair attempt of the files you've provided. Would you please extract these files and replace them in your $dbpath and let us know if it resolves the issue? If you are still seeing errors after replacing these files, please provide the complete logs from mongod so that we can further investigate. Additionally, if this issue persists, please provide the following information:

  1. What kind of underlying storage mechanism are you using? Are the storage devices attached locally or over the network? Are the disks SSDs or HDDs? What kind of RAID and/or volume management system are you using?
  2. Would you please check the integrity of your disks?
  3. Has the database always been running this version of MongoDB? If not, please describe the upgrade/downgrade cycles the database has been through.
  4. Have you manipulated (copied or moved) the underlying database files? If so, was mongod running?
  5. Have you ever restored this instance from backups?
  6. What method do you use to create backups?
  7. When was the underlying filesystem last checked and is it currently marked clean?
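
When replacing files in $dbpath as requested above, it is prudent to keep copies of the originals so the change can be undone. The following is a minimal Python sketch, assuming mongod is stopped and the attached archive has already been extracted into a directory. The function name `replace_with_backup` and all three directory arguments are hypothetical.

```python
import shutil
from pathlib import Path


def replace_with_backup(repaired_dir, dbpath, backup_dir):
    """Copy each file in repaired_dir into dbpath, saving any file it overwrites."""
    repaired_dir, dbpath, backup_dir = Path(repaired_dir), Path(dbpath), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    replaced = []
    for src in sorted(repaired_dir.iterdir()):
        if not src.is_file():
            continue  # only top-level files are handled in this sketch
        target = dbpath / src.name
        if target.exists():
            # Keep the original so the replacement can be reverted.
            shutil.copy2(target, backup_dir / src.name)
        shutil.copy2(src, target)
        replaced.append(src.name)
    return replaced
```

The function returns the names of the files it copied, so the result can be compared against the expected list (for example WiredTiger.wt and WiredTiger.turtle) before mongod is restarted.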

Thanks,
Mark
