[SERVER-24578] Corrupt WiredTiger.wt fails to recover Created: 14/Jun/16  Updated: 13/Aug/18  Resolved: 20/Jun/16

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.0.6, 3.0.10, 3.2.7
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Stefan Rogin Assignee: Ramon Fernandez Marina
Resolution: Done Votes: 0
Labels: envns, rge, rpu, trcf, wtc
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: HTML File WiredTiger     File WiredTiger.basecfg     File WiredTiger.turtle     File WiredTiger.wt     File _mdb_catalog.wt     File repair_attempt.tgz     File sizeStorer.wt     File storage.bson    
Issue Links:
Related
Operating System: Linux
Steps To Reproduce:

Don't know how to reproduce, here's how it happened to me:

  • run mongodb, add data & stop (April)
  • run system updates on ubuntu
  • try to start mongdb after 2 months.
Participants:

 Description   

Database with WT engine fails to recover.

I've tried --repair but with no success.

2016-06-15T01:44:12.459+0300 E STORAGE  [initandlisten] WiredTiger (0) [1465944252:459446][27643:0x7f3f5204cbc0], file:WiredTiger.wt, connection: read checksum error for 4096B block at offset 102400: block header checksum of 1099557987 doesn't match expected checksum of 106923698
2016-06-15T01:44:12.459+0300 E STORAGE  [initandlisten] WiredTiger (0) [1465944252:459493][27643:0x7f3f5204cbc0], file:WiredTiger.wt, connection: WiredTiger.wt: encountered an illegal file format or internal value
2016-06-15T01:44:12.459+0300 E STORAGE  [initandlisten] WiredTiger (-31804) [1465944252:459507][27643:0x7f3f5204cbc0], file:WiredTiger.wt, connection: the process must exit and restart: WT_PANIC: WiredTiger library panic
2016-06-15T01:44:12.459+0300 I -        [initandlisten] Fatal Assertion 28558
2016-06-15T01:44:12.469+0300 I CONTROL  [initandlisten] 
 0xfa8242 0xf454d9 0xf29706 0xdca081 0x13ff32c 0x13ff4dd 0x13ff954 0x134aa24 0x136697b 0x1364557 0x136544d 0x1390a5b 0x13fe6d9 0x13ca183 0x138dfc9 0xdb4153 0xdb1e70 0xab1dca 0x82f9b1 0x7fd249 0x7f3f5092dec5 0x82d6d7

This is an older instance of mongodb on my computer, the mongodb version might be at fault.

I've tried updating mongdb to the latest version (3.2.7), I get a different error but still no success

2016-06-15T02:14:28.129+0300 E STORAGE  [initandlisten] WiredTiger (-31802) [1465946068:129595][29342:0x7fa307a4acc0], file:WiredTiger.wt, connection: unable to read root page from file:WiredTiger.wt: WT_ERROR: non-specific WiredTiger error
2016-06-15T02:14:28.129+0300 E STORAGE  [initandlisten] WiredTiger (0) [1465946068:129657][29342:0x7fa307a4acc0], file:WiredTiger.wt, connection: WiredTiger has failed to open its metadata
2016-06-15T02:14:28.129+0300 E STORAGE  [initandlisten] WiredTiger (0) [1465946068:129673][29342:0x7fa307a4acc0], file:WiredTiger.wt, connection: This may be due to the database files being encrypted, being from an older version or due to corruption on disk
2016-06-15T02:14:28.129+0300 E STORAGE  [initandlisten] WiredTiger (0) [1465946068:129691][29342:0x7fa307a4acc0], file:WiredTiger.wt, connection: You should confirm that you have opened the database with the correct options including all encryption and compression options
2016-06-15T02:14:28.130+0300 I -        [initandlisten] Assertion: 28718:-31802: WT_ERROR: non-specific WiredTiger error
2016-06-15T02:14:28.130+0300 I STORAGE  [initandlisten] exception in initAndListen: 28718 -31802: WT_ERROR: non-specific WiredTiger error, terminating
2016-06-15T02:14:28.130+0300 I CONTROL  [initandlisten] dbexit:  rc: 100

Please help, no other backup available.



 Comments   
Comment by Ramon Fernandez Marina [ 21/Jun/16 ]

alexbevi@gmail.com, you may be able to use the wt command line utility to try to dump the data out of the .wt files and later reload them in a new mongod. You can reach out to the mongodb-user group if you need assistance with this approach.

Regards,
Ramón.

Comment by Kelsey Schubert [ 21/Jun/16 ]

Hi alexbevi,

There are planned improvements to repair with WiredTiger including the behavior you describe. For more details, please review the ticket tracking these improvements, SERVER-19815.

Please feel free to vote for SERVER-19815 and watch it for updates.

Thank you,
Thomas

Comment by Alex Bevilacqua [ 21/Jun/16 ]

@Ramon, just out of curiosity, would there be a way for the mongod repair to "skip" failed collections? Sometimes getting ANYTHING back online as quickly as possible would be extremely helpful.

Comment by Ramon Fernandez Marina [ 20/Jun/16 ]

If I understand correctly, the following line is indicating data corruption that's beyond repair:

2016-06-16T16:15:53.433+0300 I -        [initandlisten] Invariant failure rs.get() src/mongo/db/catalog/database.cpp 190

Unfortunately we're not able to provide further support here, since the SERVER project is for reporting bugs or feature suggestions for the MongoDB server. For MongoDB-related support discussion please post on the mongodb-user group or Stack Overflow with the mongodb tag, where your question will reach a larger audience.

Regards,
Ramón.

Comment by Stefan Rogin [ 16/Jun/16 ]

repair fails :

2016-06-16T16:15:21.625+0300 I STORAGE  [initandlisten] WiredTiger progress WT_SESSION.verify 56200
2016-06-16T16:15:21.630+0300 E STORAGE  [initandlisten] WiredTiger (0) [1466082921:630660][8347:0x7f006e3b6cc0], file:collection-128-6816245440313053.wt, WT_SESSION.verify: read checksum error for 12288B block at offset 288509952: block header checksum of 2294773803 doesn't match expected checksum of 1228719121
2016-06-16T16:15:21.680+0300 E STORAGE  [initandlisten] WiredTiger (0) [1466082921:680635][8347:0x7f006e3b6cc0], file:collection-128-6816245440313053.wt, WT_SESSION.verify: checkpoint ranges never verified: 3657
2016-06-16T16:15:21.697+0300 E STORAGE  [initandlisten] WiredTiger (0) [1466082921:697338][8347:0x7f006e3b6cc0], file:collection-128-6816245440313053.wt, WT_SESSION.verify: file ranges never verified: 3657
2016-06-16T16:15:21.697+0300 I STORAGE  [initandlisten] Verify failed on uri table:collection-128-6816245440313053. Running a salvage operation.
2016-06-16T16:15:21.701+0300 I STORAGE  [initandlisten] WiredTiger progress WT_SESSION.salvage 100
...
2016-06-16T16:15:27.541+0300 I STORAGE  [initandlisten] WiredTiger progress WT_SESSION.salvage 96200
2016-06-16T16:15:27.673+0300 I INDEX    [initandlisten] build index on: db.collection properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "db.collection" }
2016-06-16T16:15:27.673+0300 I INDEX    [initandlisten]          building index using bulk method
2016-06-16T16:15:53.433+0300 I -        [initandlisten] Invariant failure rs.get() src/mongo/db/catalog/database.cpp 190
2016-06-16T16:15:53.434+0300 I -        [initandlisten] 
2016-06-16T16:15:53.462+0300 F -        [initandlisten] Got signal: 6 (Aborted).

I've restarted and got the same failure (after the repair went through the data again)

2016-06-16T17:18:48.833+0300 I STORAGE  [initandlisten] Verify succeeded on uri table:collection-128-6816245440313053. Not salvaging.
2016-06-16T17:18:48.847+0300 I INDEX    [initandlisten] build index on: db.collection properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "db.collection" }
2016-06-16T17:18:48.847+0300 I INDEX    [initandlisten]          building index using bulk method
2016-06-16T17:19:15.146+0300 I -        [initandlisten] Invariant failure rs.get() src/mongo/db/catalog/database.cpp 190
2016-06-16T17:19:15.146+0300 I -        [initandlisten] 
 
***aborting after invariant() failure
 
 
2016-06-16T17:19:15.189+0300 F -        [initandlisten] Got signal: 6 (Aborted).

mongod used for repair:

$ mongod --version
db version v3.2.7
git version: 4249c1d2b5999ebbf1fdf3bc0e0e3b3ff5c0aaf2
OpenSSL version: OpenSSL 1.0.1f 6 Jan 2014
allocator: tcmalloc
modules: none
build environment:
    distmod: ubuntu1404
    distarch: x86_64
    target_arch: x86_64

Comment by Stefan Rogin [ 16/Jun/16 ]

Thanks, it works.
get new error

2016-06-16T15:22:34.714+0300 W -        [initandlisten] Detected unclean shutdown - /media/conrad/ssd2/mongodb_20160407/mongodb/mongod.lock is not empty.
2016-06-16T15:22:34.714+0300 W STORAGE  [initandlisten] Recovering data from the last clean checkpoint.
2016-06-16T15:22:34.714+0300 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=18G,session_max=20000,eviction=(threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),,log=(enabled=false),
2016-06-16T15:22:35.062+0300 E STORAGE  [initandlisten] WiredTiger (-31802) [1466079755:62667][8290:0x7fd1cb45acc0], file:collection-0--9014084984712492362.wt, WT_SESSION.open_cursor: collection-0--9014084984712492362.wt: handle-read: pread: failed to read 12288 bytes at offset 4954447872: WT_ERROR: non-specific WiredTiger error
2016-06-16T15:22:35.062+0300 I -        [initandlisten] Invariant failure: ret resulted in status UnknownError: -31802: WT_ERROR: non-specific WiredTiger error at src/mongo/db/storage/wiredtiger/wiredtiger_session_cache.cpp 74
2016-06-16T15:22:35.074+0300 I CONTROL  [initandlisten] 
2016-06-16T15:22:35.074+0300 I -        [initandlisten] 

But the --repair started working, yey!

Comment by Ramon Fernandez Marina [ 16/Jun/16 ]

Sorry you are running into these issues stefan.rogin. I've attached a repair_attempt.tgz file to this ticket; please extract it on your dbpath and try to start the affected node.

Comment by Stefan Rogin [ 16/Jun/16 ]

added other files that might be useful

Comment by Stefan Rogin [ 16/Jun/16 ]

Any advice on how to fix this, saw something about wt tools, could they help ? It's affecting a product release so any clues are welcomed, no matter how technical or complicated.

Generated at Thu Feb 08 04:06:45 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.