[SERVER-44271] [Data recovery] WiredTiger.turtle: fatal turtle file read error: WT_NOTFOUND: item not found Created: 26/Oct/19  Updated: 20/Dec/19  Resolved: 20/Dec/19

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 4.0.13
Fix Version/s: None

Type: Question Priority: Major - P3
Reporter: Marcin W Assignee: Carl Champain (Inactive)
Resolution: Incomplete Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File Files structure.txt     Text File fail_linux.txt     Text File fail_windows.txt    
Participants:

 Description   

Hi,

I had a hard drive failure, managed to recover files, but it seems like there's still an issue with them. I'm getting an error:

WiredTiger.turtle: fatal turtle file read error: WT_NOTFOUND: item not found

2019-10-26T23:20:28.477+0200 I CONTROL  [main] Trying to start Windows service 'MongoDB'
2019-10-26T23:20:28.478+0200 I CONTROL  [initandlisten] MongoDB starting : pid=14184 port=27017 dbpath=H:\www\TSMADb\Mongo\4.0\data 64-bit host=DESKTOP-G0V5LV5
2019-10-26T23:20:28.478+0200 I CONTROL  [initandlisten] targetMinOS: Windows 7/Windows Server 2008 R2
2019-10-26T23:20:28.478+0200 I CONTROL  [initandlisten] db version v4.0.13
2019-10-26T23:20:28.478+0200 I CONTROL  [initandlisten] git version: bda366f0b0e432ca143bc41da54d8732bd8d03c0
2019-10-26T23:20:28.478+0200 I CONTROL  [initandlisten] allocator: tcmalloc
2019-10-26T23:20:28.478+0200 I CONTROL  [initandlisten] modules: none
2019-10-26T23:20:28.478+0200 I CONTROL  [initandlisten] build environment:
2019-10-26T23:20:28.478+0200 I CONTROL  [initandlisten]     distmod: 2008plus-ssl
2019-10-26T23:20:28.478+0200 I CONTROL  [initandlisten]     distarch: x86_64
2019-10-26T23:20:28.478+0200 I CONTROL  [initandlisten]     target_arch: x86_64
2019-10-26T23:20:28.478+0200 I CONTROL  [initandlisten] options: { config: "C:\Program Files\MongoDB\Server\4.0\bin\mongod.cfg", net: { bindIp: "127.0.0.1", port: 27017 }, service: true, storage: { dbPath: "H:\www\TSMADb\Mongo\4.0\data", journal: { enabled: true } }, systemLog: { destination: "file", logAppend: true, path: "H:\www\TSMADb\Mongo\4.0\log\mongod.log" } }
2019-10-26T23:20:28.480+0200 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=15849M,cache_overflow=(file_max=0M),session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),statistics_log=(wait=0),verbose=(recovery_progress),
2019-10-26T23:20:28.521+0200 I STORAGE  [initandlisten] WiredTiger message [1572124828:521681][14184:140723733749328], txn-recover: Set global recovery timestamp: 0
2019-10-26T23:20:28.628+0200 I RECOVERY [initandlisten] WiredTiger recoveryTimestamp. Ts: Timestamp(0, 0)
2019-10-26T23:20:28.722+0200 I CONTROL  [initandlisten] 
2019-10-26T23:20:28.722+0200 I CONTROL  [initandlisten] ** WARNING: Access control is not enabled for the database.
2019-10-26T23:20:28.722+0200 I CONTROL  [initandlisten] **          Read and write access to data and configuration is unrestricted.
2019-10-26T23:20:28.722+0200 I CONTROL  [initandlisten] 
2019-10-26T23:20:28.722+0200 I STORAGE  [initandlisten] createCollection: admin.system.version with provided UUID: f2495c23-fd0f-4feb-b6ef-a93ae2ae5caa
2019-10-26T23:20:28.731+0200 I COMMAND  [initandlisten] setting featureCompatibilityVersion to 4.0
2019-10-26T23:20:28.733+0200 I STORAGE  [initandlisten] createCollection: local.startup_log with generated UUID: 009af9f6-e0a3-47d3-972c-b2b0f3b45c4e
2019-10-26T23:20:29.013+0200 W FTDC     [initandlisten] Failed to initialize Performance Counters for FTDC: WindowsPdhError: PdhExpandCounterPathW failed with 'The specified object was not found on the computer.' for counter '\Memory\Available Bytes'
2019-10-26T23:20:29.013+0200 I FTDC     [initandlisten] Initializing full-time diagnostic data capture with directory 'H:/www/TSMADb/Mongo/4.0/data/diagnostic.data'
2019-10-26T23:20:29.015+0200 I CONTROL  [LogicalSessionCacheReap] Sessions collection is not set up; waiting until next sessions reap interval: config.system.sessions does not exist
2019-10-26T23:20:29.015+0200 I STORAGE  [LogicalSessionCacheRefresh] createCollection: config.system.sessions with generated UUID: 363c117a-7de9-4a5a-a8f1-962cdb9b8691
2019-10-26T23:20:29.015+0200 I NETWORK  [initandlisten] waiting for connections on port 27017
2019-10-26T23:20:29.015+0200 I STORAGE  [initandlisten] Service running
2019-10-26T23:20:29.026+0200 I INDEX    [LogicalSessionCacheRefresh] build index on: config.system.sessions properties: { v: 2, key: { lastUse: 1 }, name: "lsidTTLIndex", ns: "config.system.sessions", expireAfterSeconds: 1800 }
2019-10-26T23:20:29.026+0200 I INDEX    [LogicalSessionCacheRefresh] 	 building index using bulk method; build may temporarily use up to 500 megabytes of RAM
2019-10-26T23:20:29.027+0200 I INDEX    [LogicalSessionCacheRefresh] build index done.  scanned 0 total records. 0 secs
2019-10-26T23:21:16.607+0200 I NETWORK  [listener] connection accepted from 127.0.0.1:56441 #1 (1 connection now open)
2019-10-26T23:21:16.609+0200 I NETWORK  [conn1] received client metadata from 127.0.0.1:56441 conn1: { driver: { name: "nodejs-core", version: "3.2.7" }, os: { type: "Windows_NT", name: "win32", architecture: "x64", version: "10.0.18362" }, platform: "Node.js v10.2.0, LE", application: { name: "MongoDB Compass Community" } }
2019-10-26T23:21:16.648+0200 I NETWORK  [listener] connection accepted from 127.0.0.1:56442 #2 (2 connections now open)
2019-10-26T23:21:16.658+0200 I NETWORK  [conn2] received client metadata from 127.0.0.1:56442 conn2: { driver: { name: "nodejs-core", version: "3.2.7" }, os: { type: "Windows_NT", name: "win32", architecture: "x64", version: "10.0.18362" }, platform: "Node.js v10.2.0, LE", application: { name: "MongoDB Compass Community" } }
2019-10-26T23:21:16.660+0200 I NETWORK  [listener] connection accepted from 127.0.0.1:56443 #3 (3 connections now open)
2019-10-26T23:21:16.662+0200 I NETWORK  [conn3] received client metadata from 127.0.0.1:56443 conn3: { driver: { name: "nodejs-core", version: "3.2.7" }, os: { type: "Windows_NT", name: "win32", architecture: "x64", version: "10.0.18362" }, platform: "Node.js v10.2.0, LE", application: { name: "MongoDB Compass Community" } }
2019-10-26T23:21:24.291+0200 I NETWORK  [listener] connection accepted from 127.0.0.1:56444 #4 (4 connections now open)
2019-10-26T23:21:24.293+0200 I NETWORK  [conn4] received client metadata from 127.0.0.1:56444 conn4: { driver: { name: "nodejs-core", version: "3.2.7" }, os: { type: "Windows_NT", name: "win32", architecture: "x64", version: "10.0.18362" }, platform: "Node.js v10.2.0, LE", application: { name: "MongoDB Compass Community" } }
2019-10-26T23:21:27.881+0200 I NETWORK  [conn2] end connection 127.0.0.1:56442 (3 connections now open)
2019-10-26T23:21:27.881+0200 I NETWORK  [conn1] end connection 127.0.0.1:56441 (2 connections now open)
2019-10-26T23:21:27.881+0200 I NETWORK  [conn3] end connection 127.0.0.1:56443 (1 connection now open)
2019-10-26T23:21:27.881+0200 I NETWORK  [conn4] end connection 127.0.0.1:56444 (0 connections now open)
2019-10-26T23:21:28.743+0200 E STORAGE  [WTCheckpointThread] WiredTiger error (-31803) [1572124888:743655][14184:140723733749328], file:WiredTiger.wt, WT_SESSION.checkpoint: __wt_turtle_read, 341: WiredTiger.turtle: fatal turtle file read error: WT_NOTFOUND: item not found Raw: [1572124888:743655][14184:140723733749328], file:WiredTiger.wt, WT_SESSION.checkpoint: __wt_turtle_read, 341: WiredTiger.turtle: fatal turtle file read error: WT_NOTFOUND: item not found
2019-10-26T23:21:28.743+0200 E STORAGE  [WTCheckpointThread] WiredTiger error (-31804) [1572124888:743655][14184:140723733749328], file:WiredTiger.wt, WT_SESSION.checkpoint: __wt_panic, 494: the process must exit and restart: WT_PANIC: WiredTiger library panic Raw: [1572124888:743655][14184:140723733749328], file:WiredTiger.wt, WT_SESSION.checkpoint: __wt_panic, 494: the process must exit and restart: WT_PANIC: WiredTiger library panic
2019-10-26T23:21:28.743+0200 F -        [WTCheckpointThread] Fatal Assertion 50853 at src\mongo\db\storage\wiredtiger\wiredtiger_util.cpp 409
2019-10-26T23:21:28.743+0200 F -        [WTCheckpointThread] 

In total my DB files are 40MB, so I can upload them if needed, but not sure if it'll help, just in case I've attached tree of files that R-Studio managed to recover.

Please, help me... how can I recover my database? At least some of it... it's impossible to recover this data by other means, so I would really appropriate help



 Comments   
Comment by Carl Champain (Inactive) [ 20/Dec/19 ]

Hi marcin_wo_wroc@o2.pl,

We haven’t heard back from you for some time, so I’m going to mark this ticket as resolved. If this is still an issue for you, please provide additional information and we will reopen the ticket.

Thank you,
Carl

Comment by Carl Champain (Inactive) [ 04/Dec/19 ]

Hi marcin_wo_wroc@o2.pl,

Any updates on this issue?

Comment by Louis Williams [ 08/Nov/19 ]

The --repair orphan recovery process attemtps to locate .wt data files on disk that are in the MongoDB collection catalog, but are missing from the WT metadata. This implies that corrupt WT metadata lost knowledge of a data file. We can look for the data file because a higher-level of MongoDB knows the data file might exist, hence the "orphan" terminology.

The error message "Orphan file recovery is not supported on Windows" was added because the procedure involves moving files while the directory is open by the process, and Windows filesystem permissions made this more effort than we were willing to put in  for a best-effort data recovery procedure.

Regarding the error on Linux, I found this blog post on Google that suggests the fdatasync issue is a result of mounting a Windows filesystem through your Ubuntu VM.

marcin_wo_wroc@o2.pl did you make a backup of your database before running --repair? The log message "Collection TSMA.GeoLocation dropped: Orphan file recovery is not supported on Windows" indicates that MongoDB deleted the metadata for that collection, but not the data file. However, running --repair again will still not be able to recover the orphan because the metadata has been lost. Please report back if you can re-run repair on the original, corrupt data files on a non-shared data directory on a Linux filesystem.

Comment by Marcin W [ 06/Nov/19 ]

Somehow I cannot make repair feature work on linux.

I have MongoDB version 4.0.13 on both: Windows and Linux (Ubuntu)

I've setup configs of both to run with data in the same directory. After that I cleaned up data (removed current, unpacked my damaged database into the directory), and run the repair.

See }}{{fail_windows.txt

After that I did the clean up, logged into Ubuntu and run repair on Mongo out there.

See }}{{fail_linux.txt{{}}

possibly the key part:

 

2019-11-06T22:51:00.747+0100 W STORAGE  [initandlisten] Failed to start up WiredTiger under any compatibility version.
2019-11-06T22:51:00.747+0100 F STORAGE  [initandlisten] Reason: -31804: WT_PANIC: WiredTiger library panic
2019-11-06T22:51:00.747+0100 W STORAGE  [initandlisten] Attempting to salvage WiredTiger metadata
2019-11-06T22:51:00.750+0100 E STORAGE  [initandlisten] WiredTiger error (22) [1573077060:750108][48:0x7fedfc940a40], connection: __posix_sync, 99: /mnt/h/www/TSMADb/Mongo/4.0/data/: directory-sync: fdatasync: Invalid argument Raw: [1573077060:750108][48:0x7fedfc940a40], connection: __posix_sync, 99: /mnt/h/www/TSMADb/Mongo/4.0/data/: directory-sync: fdatasync: Invalid argument
2019-11-06T22:51:00.750+0100 E STORAGE  [initandlisten] WiredTiger error (-31804) [1573077060:750144][48:0x7fedfc940a40], connection: __wt_panic, 494: the process must exit and restart: WT_PANIC: WiredTiger library panic Raw: [1573077060:750144][48:0x7fedfc940a40], connection: __wt_panic, 494: the process must exit and restart: WT_PANIC: WiredTiger library panic
2019-11-06T22:51:00.771+0100 E STORAGE  [initandlisten] WiredTiger error (-31804) [1573077060:771970][48:0x7fedfc940a40], connection: __posix_directory_sync, 151: /mnt/h/www/TSMADb/Mongo/4.0/data/WiredTiger.turtle.set: directory-sync: WT_PANIC: WiredTiger library panic Raw: [1573077060:771970][48:0x7fedfc940a40], connection: __posix_directory_sync, 151: /mnt/h/www/TSMADb/Mongo/4.0/data/WiredTiger.turtle.set: directory-sync: WT_PANIC: WiredTiger library panic
2019-11-06T22:51:00.772+0100 E STORAGE  [initandlisten] WiredTiger error (-31804) [1573077060:772006][48:0x7fedfc940a40], connection: __wt_turtle_update, 397: WiredTiger.turtle: fatal turtle file update error: WT_PANIC: WiredTiger library panic Raw: [1573077060:772006][48:0x7fedfc940a40], connection: __wt_turtle_update, 397: WiredTiger.turtle: fatal turtle file update error: WT_PANIC: WiredTiger library panic
2019-11-06T22:51:00.772+0100 F STORAGE  [initandlisten] Failed to salvage WiredTiger metadata: -31804: WT_PANIC: WiredTiger library panic

@Carl - "Are you using a replica set?" - no, I am not.

 

Comment by Marcin W [ 01/Nov/19 ]

Just a quick update: sadly the linux version I have is 4.2, while my files are from 4.0, so it doesn't even detect collections correctly. I won't be able to get back to the topic till after the weekend, but please, don't close the ticket yet. Thank you.

 

Comment by Marcin W [ 28/Oct/19 ]

Oh, thanks! I totally forgot about the repair option! It can see all my collections in the log, which is superb news, but it throws an "Orphan file recovery is not supported on Windows"

STORAGE [initandlisten] Storage engine is missing collection 'db.collection' from its metadata. Attempting to locate and recover the data for collection-12-9158706690432070880
STORAGE [initandlisten] Failed to recover orphaned data file for collection 'db.collection': CommandNotSupported: Orphan file recovery is not supported on Windows

I'll try to get ubuntu up and running and attempt to restore it from there. Will report back ASAP (likely tomorrow).

Comment by Carl Champain (Inactive) [ 28/Oct/19 ]

Hi marcin_wo_wroc@o2.pl,

Thanks for the report.
Please make a complete copy of the database's $dbpath directory and safeguard that copy.

A couple of questions:
1. Can you first try running --repair? If the repair attempt fails, please provide us the logs.
2. Are you using a replica set? If so, try to perform a clean resync from an unaffected node and it will resolve your issue.
 
Kind regards,
Carl

Generated at Thu Feb 08 05:05:30 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.