[SERVER-28820] Recovery failed: WT_NOTFOUND: item not found
Created: 17/Apr/17  Updated: 06/Aug/18  Resolved: 10/Jul/17

Status: Closed
Project: Core Server
Component/s: Storage, WiredTiger
Affects Version/s: 3.2.4
Fix Version/s: 3.5.9

Type: Bug
Priority: Major - P3
Reporter: Andrew
Assignee: Susan LoVerso
Resolution: Done
Votes: 0
Labels: wtc
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Linux, Ubuntu 14.04 64bit


Attachments: WiredTiger.turtle, WiredTiger.wt, debug_error1.png, error.jpeg, ls.jpeg, repair-SERVER-28820.tar.gz, sizeStorer.hd.txt
Backwards Compatibility: Fully Compatible
Operating System: Linux
Steps To Reproduce:

run mongod

Participants:

 Description   

My server broke down, and all I had was a copy of the database files in /data/db. There was no proper mongodump, only the files.
After restoring my VPS from a snapshot, I tried to run mongo and got an error (see the attachments).
Answering your questions:
1. Which version of MongoDB was the original mongod using?
The original was 3.2.4. However, I have already upgraded to 3.4.3 in the hope that this would fix the problem.
2. What method do you use to create backups?
That is the problem: I didn't have any mongodumps, only the *.wt files.
3. Have you ever manipulated (copied or moved) the underlying database files? If so, was the mongod running?
No, I didn't touch the files. After restoring from the snapshot, the server state was the same as before the server fault.
Please find attached all the screenshots and the WiredTiger.wt and WiredTiger.turtle files.
Is there any chance of recovering from an unclean shutdown with only the *.wt files?



 Comments   
Comment by Andrew [ 11/Jul/17 ]

Hi Ramon, I previously wrote in my message:
"Alexander, please find database files copy. Let me know when I can deactivate link. https://www.dropbox.com/s/lmoc9jihstnwzv2/database.tar.gz?dl=0"
I provided Alexander with a copy of the database files to investigate the issue.

So my question is: do you (or Alexander) still have these files? I need them urgently.

Comment by Ramon Fernandez Marina [ 10/Jul/17 ]

kasperpro, I believe we only have whatever files are attached to this ticket. Are those the files you're inquiring about?

Thanks,
Ramón.

Comment by Andrew [ 10/Jul/17 ]

Good afternoon,
Do you still have my database files with you?

Comment by Githook User [ 06/Jun/17 ]

Author: Alex Gorrod (agorrod) <alexander.gorrod@mongodb.com>

Message: Import wiredtiger: 7aaeaaa054d1ac27a95c79984f7ca69ba739caae from branch mongodb-3.6

ref: 78109ca3fe..7aaeaaa054
for: 3.5.9

SERVER-28820 Recovery failed: WT_NOTFOUND: item not found
SERVER-28835 Fix a memory leak in WiredTiger on error when creating thread group
WT-2972 Add interface allowing partial updates to existing values
WT-3041 Failure of test_perf01 on PPC
WT-3063 Reserve records for read-modify-write
WT-3076 Add a general-purpose epoch manager
WT-3123 Thread group holding lock across thread join
WT-3142 Add a workload generator application
WT-3158 Fix structure layout on Windows.
WT-3160 Improve eviction of internal pages from idle trees
WT-3197 aarch64 CRC32C support fails to compile on non-linux ARM platforms
WT-3219 Make the clang-analyzer job fail when lint is introduced
WT-3222 Review and enhance log statistics
WT-3245 Avoid hangs on shutdown when a utility thread encounters an error
WT-3247 Test should exit instead of abort to avoid a core dump
WT-3248 Performance degradation in workload with large overflow items
WT-3253 txn07 test problem
WT-3258 Improve visibility into thread wait time due to pages exceeding memory_page_max
WT-3261 add a checkpoint epoch to avoid draining the eviction queue
WT-3262 Schema operations shouldn't wait for cache
WT-3263 Allow archive on restart/recovery if clean shutdown
WT-3264 Permanent change to disable logging should eventually remove all logs
WT-3265 Verify hits assertion in eviction when transiting handle to exclusive mode
WT-3266 Thread group deadlock
WT-3267 Upgrade copyright notices from 2016 to 2017.
WT-3268 Failure to close cursor can get wiredtiger stuck in a cursor-close loop
WT-3269 Miscellaneous cleanup changes
WT-3271 Eviction tuning stuck in a loop
WT-3275 stress test sanitizer failure
WT-3278 log the row-store cursor key instead of page key
WT-3281 stress test sanitizer failure
WT-3282 Stuck in conn cache pool destroy join
WT-3284 tree-walk restart bug
WT-3287 review WiredTiger internal panic checks
WT-3288 fix error codes for event_handler to be consistent in file operations
WT-3292 review/cleanup full-barrier calls in WiredTiger
WT-3293 Make internal symbols externally visible
WT-3296 LAS table fixes/improvements
WT-3297 support the gcc/clang -fvisibility=hidden flag
WT-3300 Coverity 1374542: Dereference after null check
WT-3302 Failure to create cache pool manager thread results in crash when destroying cache pool
WT-3303 Deadlock during first access to lookaside table
WT-3307 FI testing: segfault in python test test_bug013 when fault introduced reading turtle file
WT-3312 encryption btree configuration test
WT-3313 Replace calls to the deprecated LZ4_compress function
WT-3314 clarify error handling
WT-3327 Checkpoints can hang if time runs backward
WT-3331 Test format aborted due to time rollback
WT-3333 Make it possible to store 0 bytes into a 'u' format via Python
WT-3334 static test suite's BaseDataSet class has 'u' value format bugs
WT-3339 The CURSOR_UPDATE_API_CALL macro will dump core on a NULL btree handle
WT-3342 Create a new WiredTiger 2.9.2 release
WT-3343 WiredTiger database close can attempt unlock of a lock that's not held.
WT-3345 Improve rwlock scaling
WT-3348 Lint, Windows warnings.
WT-3351 Recovery assertion failure: old_lognum < lognum
WT-3354 Coverity issues 1375904-1375907
WT-3356 rwlock assertion failure on PPC
Branch: master
https://github.com/mongodb/mongo/commit/60341ff5b540ed35c8378910d92fe6c128f398e6

Comment by Githook User [ 25/Apr/17 ]

Author: sueloverso <sue@mongodb.com>

Message: SERVER-28820 Add a few error path messages in logging. (#3402)
Branch: develop
https://github.com/wiredtiger/wiredtiger/commit/3f02e205906c487376a04cd936888398913161c4

Comment by Andrew [ 22/Apr/17 ]

Hello Sue LoVerso,
I followed the steps you mentioned and was able to get mongo running!
Regarding the journal files, unfortunately I have a copy of the April 4th backup (the one I sent you was from April 7). If it would help the investigation, I can send you the journal files from April 4th.

Anyway, thank you so much! Now I am able to run mongodump and recover my data.

Comment by Susan LoVerso [ 20/Apr/17 ]

Hello kasperpro. There is definitely a problem with the journal files. As my earlier comment implied, they're zeroed out or removed. Please investigate what may have happened there. I was able to get a mongod up and running on your data with the following steps:

  • Move aside the journal directory to journal.old or whatever name you want.
  • Run mongod with --repair --nojournal. Several of the tables need to be repaired.
  • After that completes you should be able to restart mongod with --nojournal to access your data.
  • Once your journal investigation is done, you can restart mongod with the journal for greater durability (and remove the old directory).
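
The steps above can be sketched as a small Python helper. This is a minimal sketch, not something from the ticket: the function names and the `.old` suffix default are hypothetical, the dbpath is whatever directory your mongod uses, and the `--repair` and `--nojournal` flags are taken from the steps above as they apply to the 3.4-era mongod discussed here (newer releases may not accept `--nojournal`). The helper only renames the journal directory and builds the command lines; it does not start mongod itself.

```python
import shutil
from pathlib import Path


def move_journal_aside(dbpath, suffix=".old"):
    """Rename <dbpath>/journal to <dbpath>/journal<suffix>.

    Returns the new path, or None if there is no journal directory.
    """
    journal = Path(dbpath) / "journal"
    if not journal.is_dir():
        return None
    target = journal.with_name(journal.name + suffix)
    if target.exists():
        # Refuse to overwrite an earlier copy of the journal.
        raise FileExistsError(f"{target} already exists; pick another suffix")
    shutil.move(str(journal), str(target))
    return target


def recovery_commands(dbpath):
    """Build the two mongod invocations described in the steps above."""
    base = ["mongod", "--dbpath", str(dbpath)]
    return [
        base + ["--repair", "--nojournal"],  # repair the damaged tables
        base + ["--nojournal"],              # restart without the journal
    ]
```

To use it, stop mongod, work on a copy of the data directory, call `move_journal_aside(dbpath)`, and then run each command from `recovery_commands(dbpath)` in order, for example with `subprocess.run(cmd, check=True)`. The second command keeps mongod running in the foreground so that mongodump can be run against it.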

Comment by Susan LoVerso [ 20/Apr/17 ]

The good news is that with your tarball, I can reproduce the error and investigate where exactly it is coming from. Thank you for uploading the information.

Comment by Susan LoVerso [ 20/Apr/17 ]

Hello kasperpro, I have downloaded the tarball, you can deactivate the link. I will remove all files when this ticket is complete. Can you tell me about your journal directory? The journal files in the tarball are effectively empty. The first log file contains a few log records that are for system, internal information. All the rest of the log files are zeroed (WiredTiger will create a new log file on each restart attempt). The two error message screen shots show one restart with journal enabled and one without it. Did you switch back and forth with journal on/off? Is there any possibility that the journal directory is on a different file system and that directory could be shared with another mongod process?
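
A rough way to check a copy of the journal directory for zeroed files like the ones described above is sketched below. This is only an illustration under stated assumptions: the function names are hypothetical, the directory path is supplied by the caller, and the check looks only at raw bytes. It does not parse the WiredTiger log format, so it cannot tell which records a non-zero file contains.

```python
from pathlib import Path


def is_zero_filled(path, chunk_size=1 << 20):
    """Return True if the file is empty or contains only zero bytes."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                # Reached end of file without seeing a non-zero byte.
                return True
            if chunk.count(0) != len(chunk):
                return False


def summarize_journal(journal_dir):
    """Map each regular file in journal_dir to (size_in_bytes, zero_filled)."""
    report = {}
    for entry in sorted(Path(journal_dir).iterdir()):
        if entry.is_file():
            report[entry.name] = (entry.stat().st_size, is_zero_filled(entry))
    return report
```

Calling `summarize_journal` on the journal directory of a stopped mongod (or on a copy of it) gives a quick per-file view of size and whether any non-zero data is present.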

Comment by Andrew [ 20/Apr/17 ]

Alexander, please find database files copy.
Let me know when I can deactivate link.

Comment by Alexander Gorrod [ 18/Apr/17 ]

In order to understand better what is happening we would need a tarball of your dbdir, including the journal files. Would you be willing to provide that? If so can you give us some indication about how much data is involved?

Comment by Andrew [ 18/Apr/17 ]

I am still getting an error; please see the attachment debug_error1.
The only difference is that the exception is now thrown at a different line: wiredtiger_kv_engine.cpp 26

Comment by Mark Agarunov [ 17/Apr/17 ]

Hello kasperpro

Thank you for the report. I've attached a repair attempt of the files you've provided. Would you please extract these files and replace them in your $dbpath and let us know if it resolves the issue?

Thanks,
Mark

Comment by Andrew [ 17/Apr/17 ]

BTW, this is related to ticket SERVER-24435.
