[SERVER-32039] Corrupt WiredTiger.wt file after crash Created: 20/Nov/17  Updated: 27/Jul/18  Resolved: 21/Nov/17

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.2.16
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Johannes du Plessis Assignee: Mark Agarunov
Resolution: Done Votes: 0
Labels: envm, rge, rpu, trcf, wtc
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Zip Archive mongodb.zip     File repair-SERVER-32039.tar.gz    
Operating System: Linux
Participants:

 Description   

We suffered a VM crash that housed a non replicated MongoDB server. I haven't been able to recover any of the data, using various steps found on JIRA (and StackOverflow). My best guess at this point is that the WiredTiger.wt got corrupted, so we can't start up the server.

It seems VERY similar to this issue: https://jira.mongodb.org/browse/SERVER-31124?focusedCommentId=1675509&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-1675509

I'm hoping you guys can help! I've attached the MongoDB folder & also the output of the startup attempt.

2017-11-20T08:28:20.180+0200 I CONTROL  [initandlisten] MongoDB starting : pid=25121 port=27017 dbpath=/var/lib/mongodb/ 64-bit host=hc-dbn-vps01_old
2017-11-20T08:28:20.180+0200 I CONTROL  [initandlisten] db version v3.2.16
2017-11-20T08:28:20.181+0200 I CONTROL  [initandlisten] git version: 056bf45128114e44c5358c7a8776fb582363e094
2017-11-20T08:28:20.181+0200 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.2g  1 Mar 2016
2017-11-20T08:28:20.181+0200 I CONTROL  [initandlisten] allocator: tcmalloc
2017-11-20T08:28:20.181+0200 I CONTROL  [initandlisten] modules: none
2017-11-20T08:28:20.181+0200 I CONTROL  [initandlisten] build environment:
2017-11-20T08:28:20.181+0200 I CONTROL  [initandlisten]     distmod: ubuntu1604
2017-11-20T08:28:20.181+0200 I CONTROL  [initandlisten]     distarch: x86_64
2017-11-20T08:28:20.182+0200 I CONTROL  [initandlisten]     target_arch: x86_64
2017-11-20T08:28:20.182+0200 I CONTROL  [initandlisten] options: { repair: true, storage: { dbPath: "/var/lib/mongodb/", engine: "wiredTiger" } }
2017-11-20T08:28:20.220+0200 I STORAGE  [initandlisten] Detected WT journal files.  Running recovery from last checkpoint.
2017-11-20T08:28:20.220+0200 I STORAGE  [initandlisten] journal to nojournal transition config: create,cache_size=1G,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2017-11-20T08:28:20.230+0200 E STORAGE  [initandlisten] WiredTiger (-31802) [1511159300:229985][25121:0x7f7ce23f0c80], file:WiredTiger.wt, connection: unable to read root page from file:WiredTiger.wt: WT_ERROR: non-specific WiredTiger error
2017-11-20T08:28:20.230+0200 E STORAGE  [initandlisten] WiredTiger (0) [1511159300:230288][25121:0x7f7ce23f0c80], file:WiredTiger.wt, connection: WiredTiger has failed to open its metadata
2017-11-20T08:28:20.230+0200 E STORAGE  [initandlisten] WiredTiger (0) [1511159300:230475][25121:0x7f7ce23f0c80], file:WiredTiger.wt, connection: This may be due to the database files being encrypted, being from an older version or due to corruption on disk
2017-11-20T08:28:20.230+0200 E STORAGE  [initandlisten] WiredTiger (0) [1511159300:230657][25121:0x7f7ce23f0c80], file:WiredTiger.wt, connection: You should confirm that you have opened the database with the correct options including all encryption and compression options
2017-11-20T08:28:20.231+0200 I -        [initandlisten] Assertion: 28718:-31802: WT_ERROR: non-specific WiredTiger error
2017-11-20T08:28:20.231+0200 I STORAGE  [initandlisten] exception in initAndListen: 28718 -31802: WT_ERROR: non-specific WiredTiger error, terminating
2017-11-20T08:28:20.232+0200 I CONTROL  [initandlisten] dbexit:  rc: 100



 Comments   
Comment by Johannes du Plessis [ 21/Nov/17 ]

I suspected might have been the case. Thanks so much again for you help Mark - you guys have been awesome at assisting!

Comment by Mark Agarunov [ 21/Nov/17 ]

Hello joduplessis,

Thank you for the information, I'm sorry to hear this didn't fix the issue. Unfortunately, this error indicates that there was corruption on the disk. In this situation, my best recommendation would be to resync the affected node if possible.

Thanks,
Mark

Comment by Johannes du Plessis [ 21/Nov/17 ]

Hi Mark,

Okay so replaced the files - thank you for that. Tried to restart the Mongo instance, but is was failing silently this time.

I checked the logs:

2017-11-21T08:09:01.218+0200 I CONTROL  [main] ***** SERVER RESTARTED *****
2017-11-21T08:09:01.225+0200 I CONTROL  [initandlisten] MongoDB starting : pid=29934 port=27017 dbpath=/var/lib/mongodb 64-bit host=hc-dbn-vps01_old
2017-11-21T08:09:01.225+0200 I CONTROL  [initandlisten] db version v3.2.16
2017-11-21T08:09:01.225+0200 I CONTROL  [initandlisten] git version: 056bf45128114e44c5358c7a8776fb582363e094
2017-11-21T08:09:01.225+0200 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.2g  1 Mar 2016
2017-11-21T08:09:01.225+0200 I CONTROL  [initandlisten] allocator: tcmalloc
2017-11-21T08:09:01.225+0200 I CONTROL  [initandlisten] modules: none
2017-11-21T08:09:01.225+0200 I CONTROL  [initandlisten] build environment:
2017-11-21T08:09:01.225+0200 I CONTROL  [initandlisten]     distmod: ubuntu1604
2017-11-21T08:09:01.225+0200 I CONTROL  [initandlisten]     distarch: x86_64
2017-11-21T08:09:01.225+0200 I CONTROL  [initandlisten]     target_arch: x86_64
2017-11-21T08:09:01.225+0200 I CONTROL  [initandlisten] options: { config: "/etc/mongod.conf", net: { bindIp: true, port: 27017 }, storage: { dbPath: "/var/lib/mongodb", journal: { enabled: true } }, systemLog: { destination: "file", logAppend: true, pa$
2017-11-21T08:09:01.263+0200 I -        [initandlisten] Detected data files in /var/lib/mongodb created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2017-11-21T08:09:01.263+0200 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=1G,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snap$
2017-11-21T08:09:01.480+0200 E STORAGE  [initandlisten] WiredTiger (-31802) [1511244541:480664][29934:0x7ff7adb92c80], file:sizeStorer.wt, WT_SESSION.open_cursor: unable to read root page from file:sizeStorer.wt: WT_ERROR: non-specific WiredTiger error
2017-11-21T08:09:01.480+0200 I -        [initandlisten] Invariant failure: ret resulted in status UnknownError: -31802: WT_ERROR: non-specific WiredTiger error at src/mongo/db/storage/wiredtiger/wiredtiger_size_storer.cpp 67
2017-11-21T08:09:01.487+0200 I CONTROL  [initandlisten]
 0x1556b32 0x14ee2a4 0x14d4d53 0x127099a 0x1259ce5 0x12523f5 0x114d61e 0x9caa21 0x97ec2a 0x7ff7ac3dd830 0x9c5d19
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"400000","o":"1156B32","s":"_ZN5mongo15printStackTraceERSo"},{"b":"400000","o":"10EE2A4","s":"_ZN5mongo10logContextEPKc"},{"b":"400000","o":"10D4D53","s":"_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j"},{"b":"400000","o":"E7099A",$
 mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x1556b32]
 mongod(_ZN5mongo10logContextEPKc+0x174) [0x14ee2a4]
 mongod(_ZN5mongo17invariantOKFailedEPKcRKNS_6StatusES1_j+0xF3) [0x14d4d53]
 mongod(_ZN5mongo20WiredTigerSizeStorerC1EP15__wt_connectionRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x1AA) [0x127099a]
 mongod(_ZN5mongo18WiredTigerKVEngineC1ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES8_S8_mbbb+0xC45) [0x1259ce5]
 mongod(+0xE523F5) [0x12523f5]
 mongod(_ZN5mongo20ServiceContextMongoD29initializeGlobalStorageEngineEv+0x3EE) [0x114d61e]
 mongod(+0x5CAA21) [0x9caa21]
 mongod(main+0x73A) [0x97ec2a]
 libc.so.6(__libc_start_main+0xF0) [0x7ff7ac3dd830]
 mongod(_start+0x29) [0x9c5d19]
-----  END BACKTRACE  -----
2017-11-21T08:09:01.487+0200 I -        [initandlisten]
 
***aborting after invariant() failure

Regarding your other queries, we've been strung in by management to use VMs in house - with no backup availability due to limited capacity. We've been hounding them for backups, this crash was a result of a VM replication happening when it wasn't supposed to. All of your points are questionable as we don't have direct access to anything other than an SSH login of the server.

Any idea what might be the cause here?

Comment by Johannes du Plessis [ 20/Nov/17 ]

Thanks so much Mark. Will give this a try in the morning!

Comment by Mark Agarunov [ 20/Nov/17 ]

Hello joduplessis,

Thank you for the report. I've attached a repair attempt of the files you've provided. Would you please extract these files and replace them in your $dbpath and let us know if it resolves the issue? If you are still seeing errors after replacing these files, please provide the complete logs from mongod so that we can further investigate. Additionally, if this issue persists, please provide the following information:

  1. What kind of underlying storage mechanism are you using? Are the storage devices attached locally or over the network? Are the disks SSDs or HDDs? What kind of RAID and/or volume management system are you using?
  2. Would you please check the integrity of your disks?
  3. Has the database always been running this version of MongoDB? If not please describe the upgrade/downgrade cycles the database has been through.
  4. Have you manipulated (copied or moved) the underlying database files? If so, was mongod running?
  5. Have you ever restored this instance from backups?
  6. What method do you use to create backups?
  7. When was the underlying filesystem last checked and is it currently marked clean?

Thanks,
Mark

Generated at Thu Feb 08 04:28:59 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.