[SERVER-30753] Wiredtiger.wt likely corrupted Created: 20/Aug/17  Updated: 27/Jul/18  Resolved: 29/Aug/17

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.2.16
Fix Version/s: None

Type: Bug Priority: Critical - P2
Reporter: Anuj Gupta Assignee: Mark Agarunov
Resolution: Done Votes: 0
Labels: envm, rge, rpu, szs, trcf, wtc
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File WiredTiger.turtle     File WiredTiger.wt     File WiredTigerLAS.wt     File _mdb_catalog.wt     File mongod.conf     Text File mongod.log     File repair-SERVER-30753.tar.gz     File sizeStorer.wt    
Operating System: Linux
Participants:

 Description   

Hello,

My mongodb likely had an unclean shutdown and i have spent days trying to get it back up. I am able to start a new mongodb instance in a new location but cannot get my old one with all of my old collections and indexes to work properly. When I try to repair it, it gives the following message:

[ec2-user@ip-172-31-30-192 tmp]$ mongod --repair --dbpath /data
2017-08-20T16:20:30.951+0000 I CONTROL  [initandlisten] MongoDB starting : pid=31865 port=27017 dbpath=/data 64-bit host=ip-172-31-30-192
2017-08-20T16:20:30.951+0000 I CONTROL  [initandlisten] db version v3.2.16
2017-08-20T16:20:30.951+0000 I CONTROL  [initandlisten] git version: 056bf45128114e44c5358c7a8776fb582363e094
2017-08-20T16:20:30.951+0000 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.0-fips 29 Mar 2010
2017-08-20T16:20:30.951+0000 I CONTROL  [initandlisten] allocator: tcmalloc
2017-08-20T16:20:30.951+0000 I CONTROL  [initandlisten] modules: none
2017-08-20T16:20:30.951+0000 I CONTROL  [initandlisten] build environment:
2017-08-20T16:20:30.951+0000 I CONTROL  [initandlisten]     distmod: amazon
2017-08-20T16:20:30.951+0000 I CONTROL  [initandlisten]     distarch: x86_64
2017-08-20T16:20:30.951+0000 I CONTROL  [initandlisten]     target_arch: x86_64
2017-08-20T16:20:30.951+0000 I CONTROL  [initandlisten] options: { repair: true, storage: { dbPath: "/data" } }
2017-08-20T16:20:30.972+0000 I -        [initandlisten] Detected data files in /data created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2017-08-20T16:20:30.972+0000 I STORAGE  [initandlisten] Detected WT journal files.  Running recovery from last checkpoint.
2017-08-20T16:20:30.972+0000 I STORAGE  [initandlisten] journal to nojournal transition config: create,cache_size=17G,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2017-08-20T16:20:30.981+0000 E STORAGE  [initandlisten] WiredTiger (-31802) [1503246030:981472][31865:0x7f6ad1d9fd80], file:WiredTiger.wt, connection: unable to read root page from file:WiredTiger.wt: WT_ERROR: non-specific WiredTiger error
2017-08-20T16:20:30.981+0000 E STORAGE  [initandlisten] WiredTiger (0) [1503246030:981530][31865:0x7f6ad1d9fd80], file:WiredTiger.wt, connection: WiredTiger has failed to open its metadata
2017-08-20T16:20:30.981+0000 E STORAGE  [initandlisten] WiredTiger (0) [1503246030:981548][31865:0x7f6ad1d9fd80], file:WiredTiger.wt, connection: This may be due to the database files being encrypted, being from an older version or due to corruption on disk
2017-08-20T16:20:30.981+0000 E STORAGE  [initandlisten] WiredTiger (0) [1503246030:981564][31865:0x7f6ad1d9fd80], file:WiredTiger.wt, connection: You should confirm that you have opened the database with the correct options including all encryption and compression options
2017-08-20T16:20:30.981+0000 I -        [initandlisten] Assertion: 28718:-31802: WT_ERROR: non-specific WiredTiger error
2017-08-20T16:20:30.982+0000 I STORAGE  [initandlisten] exception in initAndListen: 28718 -31802: WT_ERROR: non-specific WiredTiger error, terminating
2017-08-20T16:20:30.982+0000 I CONTROL  [initandlisten] dbexit:  rc: 100

From past people experiencing the same issues, it would appear my WiredTiger.rt is corrupted? Can anyone please assist with these issues? I have attached my associated WiredTiger.rt files.



 Comments   
Comment by Mark Agarunov [ 29/Aug/17 ]

Hello anujgupta201121,

Unfortunately, this error indicates that there was corruption on the disk. In this situation, my best recommendation would be to resync the affected node or restore from a backup if possible.

Thanks,
Mark

Comment by Anuj Gupta [ 23/Aug/17 ]

From my understanding, it is actually the mongodb_catalog file that is also corrupted. Is there any way you would be able to repair that?

Comment by Anuj Gupta [ 21/Aug/17 ]

Unfortunately it did not work. I ran it on port 27117. I attached the log file. Keep in mind i am running a new mongod instance temporarily with empty data on the default port till i am able to get the old one up. It is what you likely will see in the logs. I added the configuration file as well. The error message is also below:

2017-08-21T22:02:20.743+0000 I CONTROL  [initandlisten] MongoDB starting : pid=11623 port=27117 dbpath=/data/ 64-bit host=ip-172-31-30-192
2017-08-21T22:02:20.743+0000 I CONTROL  [initandlisten] db version v3.2.16
2017-08-21T22:02:20.743+0000 I CONTROL  [initandlisten] git version: 056bf45128114e44c5358c7a8776fb582363e094
2017-08-21T22:02:20.743+0000 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.0-fips 29 Mar 2010
2017-08-21T22:02:20.743+0000 I CONTROL  [initandlisten] allocator: tcmalloc
2017-08-21T22:02:20.743+0000 I CONTROL  [initandlisten] modules: none
2017-08-21T22:02:20.743+0000 I CONTROL  [initandlisten] build environment:
2017-08-21T22:02:20.743+0000 I CONTROL  [initandlisten]     distmod: amazon
2017-08-21T22:02:20.743+0000 I CONTROL  [initandlisten]     distarch: x86_64
2017-08-21T22:02:20.743+0000 I CONTROL  [initandlisten]     target_arch: x86_64
2017-08-21T22:02:20.743+0000 I CONTROL  [initandlisten] options: { net: { port: 27117 }, storage: { dbPath: "/data/" }, systemLog: { verbosity: 4 } }
2017-08-21T22:02:20.743+0000 D NETWORK  [initandlisten] fd limit hard:64000 soft:64000 max conn: 51200
2017-08-21T22:02:20.764+0000 I -        [initandlisten] Detected data files in /data/ created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger                              '.
2017-08-21T22:02:20.764+0000 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=17G,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,stat                              istics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2017-08-21T22:02:20.790+0000 D COMMAND  [WTJournalFlusher] BackgroundJob starting: WTJournalFlusher
2017-08-21T22:02:20.790+0000 D STORAGE  [WTJournalFlusher] starting WTJournalFlusher thread
2017-08-21T22:02:20.790+0000 D STORAGE  [WTJournalFlusher] flushed journal
2017-08-21T22:02:20.790+0000 E STORAGE  [initandlisten] WiredTiger (17) [1503352940:790977][11623:0x7f9b8a1f9d80], WT_SESSION.create: /data//sizeStorer.wt: handle-open: open: File e                              xists
2017-08-21T22:02:20.791+0000 I STORAGE  [initandlisten] WiredTiger unexpected file sizeStorer.wt found, renamed to sizeStorer.wt.1
2017-08-21T22:02:20.795+0000 D STORAGE  [initandlisten] WiredTigerKVEngine::createRecordStore uri: table:_mdb_catalog config: type=file,memory_page_max=10m,split_pct=90,leaf_value_m                              ax=64MB,checksum=on,block_compressor=snappy,,key_format=q,value_format=u,app_metadata=(formatVersion=1)
2017-08-21T22:02:20.796+0000 D STORAGE  [initandlisten] WT begin_transaction for snapshot id 1
2017-08-21T22:02:20.797+0000 D STORAGE  [initandlisten] WiredTigerUtil::checkApplicationMetadataFormatVersion  uri: table:_mdb_catalog ok range 1 -> 1 current: 1
2017-08-21T22:02:20.797+0000 E STORAGE  [initandlisten] WiredTiger (-31802) [1503352940:797366][11623:0x7f9b8a1f9d80], file:_mdb_catalog.wt, WT_SESSION.open_cursor: unable to read r                              oot page from file:_mdb_catalog.wt: WT_ERROR: non-specific WiredTiger error
2017-08-21T22:02:20.797+0000 I -        [initandlisten] Invariant failure: ret resulted in status UnknownError: -31802: WT_ERROR: non-specific WiredTiger error at src/mongo/db/stora                              ge/wiredtiger/wiredtiger_session_cache.cpp 79
2017-08-21T22:02:20.805+0000 I CONTROL  [initandlisten]
 0x133b892 0x12d07c4 0x12b8a8d 0x1095b64 0x1094450 0x1090292 0x108ecbd 0x107fccb 0xfe46ab 0x107e4f8 0xfa1b2e 0x9551d0 0x9586cd 0x7f9b88ab4b35 0x950b89
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"F3B892","s":"_ZN5mongo15printStackTraceERSo"},{"b":"400000","o":"ED07C4","s":"_ZN5mongo10logContextEPKc"},{"b":"400000","o":"EB8A8D","s":"_ZN5mongo1                              7invariantOKFailedEPKcRKNS_6StatusES1_j"},{"b":"400000","o":"C95B64","s":"_ZN5mongo17WiredTigerSession9getCursorERKSsmb"},{"b":"400000","o":"C94450","s":"_ZN5mongo16WiredTigerCursor                              C1ERKSsmbPNS_16OperationContextE"},{"b":"400000","o":"C90292","s":"_ZN5mongo21WiredTigerRecordStore6CursorC1EPNS_16OperationContextERKS0_b"},{"b":"400000","o":"C8ECBD","s":"_ZN5mong                              o21WiredTigerRecordStoreC1EPNS_16OperationContextENS_10StringDataES3_SsbbllPNS_14CappedCallbackEPNS_20WiredTigerSizeStorerE"},{"b":"400000","o":"C7FCCB","s":"_ZN5mongo18WiredTigerKV                              Engine14getRecordStoreEPNS_16OperationContextENS_10StringDataES3_RKNS_17CollectionOptionsE"},{"b":"400000","o":"BE46AB","s":"_ZN5mongo15KVStorageEngineC1EPNS_8KVEngineERKNS_22KVStor                              ageEngineOptionsE"},{"b":"400000","o":"C7E4F8"},{"b":"400000","o":"BA1B2E","s":"_ZN5mongo20ServiceContextMongoD29initializeGlobalStorageEngineEv"},{"b":"400000","o":"5551D0"},{"b":"                              400000","o":"5586CD","s":"main"},{"b":"7F9B88A93000","o":"21B35","s":"__libc_start_main"},{"b":"400000","o":"550B89"}],"processInfo":{ "mongodbVersion" : "3.2.16", "gitVersion" : "0                              56bf45128114e44c5358c7a8776fb582363e094", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "4.4.14-24.50.amzn1.x86_64", "version" : "#1 SMP Fri Jun 24 19:56:04 U                              TC 2016", "machine" : "x86_64" }, "somap" : [ { "elfType" : 2, "b" : "400000", "buildId" : "6EB9BC70EFB54D5516DB9649058BF6EC79B56F75" }, { "b" : "7FFDA299B000", "elfType" : 3, "buil                              dId" : "2F083C109EF09C65F3DA78FCB79F4275581DFF1E" }, { "b" : "7F9B89D7D000", "path" : "/usr/lib64/libssl.so.10", "elfType" : 3, "buildId" : "FAF738A64DDEFBE202A700A23E88667132CFB755                              " }, { "b" : "7F9B89995000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "BA85B1BBD34D0502B5648916F409484083D688E7" }, { "b" : "7F9B8978D000", "path" : "/lib64/lib                              rt.so.1", "elfType" : 3, "buildId" : "F965E296DCBFAD601FA60EF6EC19AFEDE633C777" }, { "b" : "7F9B89589000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "EB575314825D0BB0                              D64D2251E8B779E52FA8D419" }, { "b" : "7F9B89287000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "8284A38DE969B169CBFAF326C5D63342797B8010" }, { "b" : "7F9B89071000", "p                              ath" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "3FD5F89DE59E124AB1419A0BD16775B4096E84FD" }, { "b" : "7F9B88E55000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "bu                              ildId" : "1F4696737495F92BDF68A7201E121A571F0FA762" }, { "b" : "7F9B88A93000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "4B81214AF3D685CD8B94A4F8C19D1C6459F2B630" },                               { "b" : "7F9B89FEB000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "24686735C8371E29ED42E282A7CCE6DE67CF345A" }, { "b" : "7F9B88845000", "path" : "/usr/lib64                              /libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "1BE9E6309ED365E35806E13FA9E23350D71F2513" }, { "b" : "7F9B8855E000", "path" : "/usr/lib64/libkrb5.so.3", "elfType" : 3, "buildId"                               : "9EE23694485D684651195C7B51766E47D0CB95E3" }, { "b" : "7F9B8835B000", "path" : "/usr/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "5C01209C5AE1B1714F19B07EB58F2A1274B69DC8"                               }, { "b" : "7F9B88129000", "path" : "/usr/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "FD5974E4861D56DFFFFC8BF5DB35E74B1C20ABD5" }, { "b" : "7F9B87F13000", "path" : "/lib64/                              libz.so.1", "elfType" : 3, "buildId" : "89C6AF118B6B4FB6A73AE1813E2C8BDD722956D1" }, { "b" : "7F9B87D04000", "path" : "/usr/lib64/libkrb5support.so.0", "elfType" : 3, "buildId" : "1                              B55330B231D45AF433F7D9DCA507C5FB0609780" }, { "b" : "7F9B87B01000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "37A58210FA50C91E09387765408A92909468D25B" }, { "b                              " : "7F9B878E7000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "13F0C4788711C7B6BC537231CA981ECD345CAE5E" }, { "b" : "7F9B876C6000", "path" : "/usr/lib64/libselinu                              x.so.1", "elfType" : 3, "buildId" : "F5054DC94443326819FBF3065CFDF5E4726F57EE" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x133b892]
 mongod(_ZN5mongo10logContextEPKc+0x144) [0x12d07c4]
 mongod(_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j+0xAD) [0x12b8a8d]
 mongod(_ZN5mongo17WiredTigerSession9getCursorERKSsmb+0xE4) [0x1095b64]
 mongod(_ZN5mongo16WiredTigerCursorC1ERKSsmbPNS_16OperationContextE+0x50) [0x1094450]
 mongod(_ZN5mongo21WiredTigerRecordStore6CursorC1EPNS_16OperationContextERKS0_b+0x92) [0x1090292]
 mongod(_ZN5mongo21WiredTigerRecordStoreC1EPNS_16OperationContextENS_10StringDataES3_SsbbllPNS_14CappedCallbackEPNS_20WiredTigerSizeStorerE+0x3ED) [0x108ecbd]
 mongod(_ZN5mongo18WiredTigerKVEngine14getRecordStoreEPNS_16OperationContextENS_10StringDataES3_RKNS_17CollectionOptionsE+0x1AB) [0x107fccb]
 mongod(_ZN5mongo15KVStorageEngineC1EPNS_8KVEngineERKNS_22KVStorageEngineOptionsE+0x46B) [0xfe46ab]
 mongod(+0xC7E4F8) [0x107e4f8]
 mongod(_ZN5mongo20ServiceContextMongoD29initializeGlobalStorageEngineEv+0x59E) [0xfa1b2e]
 mongod(+0x5551D0) [0x9551d0]
 mongod(main+0x15D) [0x9586cd]
 libc.so.6(__libc_start_main+0xF5) [0x7f9b88ab4b35]
 mongod(+0x550B89) [0x950b89]
-----  END BACKTRACE  -----
2017-08-21T22:02:20.805+0000 I -        [initandlisten]
 
***aborting after invariant() failure

In terms of your questions:

1. I am running it on an AWS EC2 instance. I believe it is is SSD.

2. I verified with AWS support, there is no issue with the integrity of the disk.

3. The database has always been running this version.

4. I did likely move the database files in trying to get the database back up. The mongod may have been running without me realizing it.

5. It has not been restored.

6. I had no backups (although i corrected that issue now)

7. It is marked clean. i had aws support check in the last 24 hours.

Comment by Mark Agarunov [ 21/Aug/17 ]

Hello anujgupta201121,

Thank you for the report. I've attached a repair attempt of the files you've provided. Would you please extract these files and replace them in your $dbpath and let us know if it resolves the issue? If you are still seeing errors after replacing these files, please provide the complete logs from mongod so that we can further investigate. Additionally, if this issue persists, please provide the following information:

  1. What kind of underlying storage mechanism are you using? Are the storage devices attached locally or over the network? Are the disks SSDs or HDDs? What kind of RAID and/or volume management system are you using?
  2. Would you please check the integrity of your disks?
  3. Has the database always been running this version of MongoDB? If not please describe the upgrade/downgrade cycles the database has been through.
  4. Have you manipulated (copied or moved) the underlying database files? If so, was mongod running?
  5. Have you ever restored this instance from backups?
  6. What method do you use to create backups?
  7. When was the underlying filesystem last checked and is it currently marked clean?

Thanks,
Mark

Comment by Anuj Gupta [ 21/Aug/17 ]

Really appreciate you looking into this. I have uploaded the WiredTiger.turtle

Comment by Mark Agarunov [ 21/Aug/17 ]

Hello anujgupta201121,

Thank you for the report. Would you please upload the WiredTiger.turtle file in addition to the WiredTiger.wt file that you've already uploaded so we can attempt a repair?

Thanks,
Mark

Generated at Thu Feb 08 04:24:53 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.