[SERVER-37688] unable to read root page from file:WiredTiger.wt: Created: 20/Oct/18  Updated: 02/Mar/20  Resolved: 22/Oct/18

Status: Closed
Project: Core Server
Component/s: WiredTiger
Affects Version/s: 3.6.8
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Axel "Elanis" Soupé [X] Assignee: Kelsey Schubert
Resolution: Done Votes: 0
Labels: wt-repair-success
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
is related to SERVER-39580 [4.0] Skip repairing the FCV document... Closed
Operating System: ALL
Participants:
Case:

 Description   

Hello,

For two days i'm trying to restore my broken database but even my backups are broken ... I tried to --repair my database without success, I saw some other problems were fixed by your team so can you take a look to my logs and files and see if it's recoverable ?

 

That would be amazing, because my latest working backup is from 1st august 2018.

 

Here are the logs from my most recent backup:

C:\Program Files\MongoDB\Server\3.6\bin>mongod.exe --auth
2018-10-20T00:38:23.311-0700 I CONTROL  [initandlisten] MongoDB starting : pid=4844 port=27017 dbpath=C:\data\db\ 64-bit host=DESKTOP-*******
2018-10-20T00:38:23.311-0700 I CONTROL  [initandlisten] targetMinOS: Windows 7/Windows Server 2008 R2
2018-10-20T00:38:23.311-0700 I CONTROL  [initandlisten] db version v3.6.8-85-g9739f4e597
2018-10-20T00:38:23.311-0700 I CONTROL  [initandlisten] git version: 9739f4e5974ef70daac2190acc8db90d6e8311d1
2018-10-20T00:38:23.311-0700 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.2o-fips  27 Mar 2018
2018-10-20T00:38:23.311-0700 I CONTROL  [initandlisten] allocator: tcmalloc
2018-10-20T00:38:23.311-0700 I CONTROL  [initandlisten] modules: none
2018-10-20T00:38:23.312-0700 I CONTROL  [initandlisten] build environment:
2018-10-20T00:38:23.312-0700 I CONTROL  [initandlisten]     distmod: 2008plus-ssl
2018-10-20T00:38:23.312-0700 I CONTROL  [initandlisten]     distarch: x86_64
2018-10-20T00:38:23.312-0700 I CONTROL  [initandlisten]     target_arch: x86_64
2018-10-20T00:38:23.312-0700 I CONTROL  [initandlisten] options: { security: { authorization: "enabled" } }
2018-10-20T00:38:23.312-0700 W -        [initandlisten] Detected unclean shutdown - C:\data\db\mongod.lock is not empty.
2018-10-20T00:38:23.312-0700 I -        [initandlisten] Detected data files in C:\data\db\ created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2018-10-20T00:38:23.313-0700 W STORAGE  [initandlisten] Recovering data from the last clean checkpoint.
2018-10-20T00:38:23.313-0700 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=7624M,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),cache_cursors=false,compatibility=(release="3.0",require_max="3.0"),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),statistics_log=(wait=0),verbose=(recovery_progress),
2018-10-20T00:38:23.318-0700 E STORAGE  [initandlisten] WiredTiger error (-31802) [1540021103:317632][4844:140726302030928], file:WiredTiger.wt, connection: unable to read root page from file:WiredTiger.wt: WT_ERROR: non-specific WiredTiger error
2018-10-20T00:38:23.318-0700 E STORAGE  [initandlisten] WiredTiger error (0) [1540021103:317632][4844:140726302030928], file:WiredTiger.wt, connection: WiredTiger has failed to open its metadata
2018-10-20T00:38:23.318-0700 E STORAGE  [initandlisten] WiredTiger error (0) [1540021103:317632][4844:140726302030928], file:WiredTiger.wt, connection: This may be due to the database files being encrypted, being from an older version or due to corruption on disk
2018-10-20T00:38:23.318-0700 E STORAGE  [initandlisten] WiredTiger error (0) [1540021103:317632][4844:140726302030928], file:WiredTiger.wt, connection: You should confirm that you have opened the database with the correct options including all encryption and compression options
2018-10-20T00:38:23.323-0700 E -        [initandlisten] Assertion: 28595:-31802: WT_ERROR: non-specific WiredTiger error src\mongo\db\storage\wiredtiger\wiredtiger_kv_engine.cpp 421
2018-10-20T00:38:23.323-0700 I STORAGE  [initandlisten] exception in initAndListen: Location28595: -31802: WT_ERROR: non-specific WiredTiger error, terminating
2018-10-20T00:38:23.323-0700 I NETWORK  [initandlisten] shutdown: going to close listening sockets...
2018-10-20T00:38:23.323-0700 I CONTROL  [initandlisten] now exiting
2018-10-20T00:38:23.323-0700 I CONTROL  [initandlisten] shutting down with code:100

C:\Program Files\MongoDB\Server\3.6\bin>mongod.exe --auth --repair
2018-10-20T00:38:41.349-0700 I CONTROL  [initandlisten] MongoDB starting : pid=12552 port=27017 dbpath=C:\data\db\ 64-bit host=DESKTOP-*******
2018-10-20T00:38:41.350-0700 I CONTROL  [initandlisten] targetMinOS: Windows 7/Windows Server 2008 R2
2018-10-20T00:38:41.350-0700 I CONTROL  [initandlisten] db version v3.6.8-85-g9739f4e597
2018-10-20T00:38:41.350-0700 I CONTROL  [initandlisten] git version: 9739f4e5974ef70daac2190acc8db90d6e8311d1
2018-10-20T00:38:41.350-0700 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.2o-fips  27 Mar 2018
2018-10-20T00:38:41.350-0700 I CONTROL  [initandlisten] allocator: tcmalloc
2018-10-20T00:38:41.350-0700 I CONTROL  [initandlisten] modules: none
2018-10-20T00:38:41.350-0700 I CONTROL  [initandlisten] build environment:
2018-10-20T00:38:41.350-0700 I CONTROL  [initandlisten]     distmod: 2008plus-ssl
2018-10-20T00:38:41.350-0700 I CONTROL  [initandlisten]     distarch: x86_64
2018-10-20T00:38:41.350-0700 I CONTROL  [initandlisten]     target_arch: x86_64
2018-10-20T00:38:41.351-0700 I CONTROL  [initandlisten] options: { repair: true, security: { authorization: "enabled" } }
2018-10-20T00:38:41.351-0700 W -        [initandlisten] Detected unclean shutdown - C:\data\db\mongod.lock is not empty.
2018-10-20T00:38:41.351-0700 I -        [initandlisten] Detected data files in C:\data\db\ created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2018-10-20T00:38:41.351-0700 W STORAGE  [initandlisten] Recovering data from the last clean checkpoint.
2018-10-20T00:38:41.351-0700 I STORAGE  [initandlisten] Detected WT journal files.  Running recovery from last checkpoint.
2018-10-20T00:38:41.351-0700 I STORAGE  [initandlisten] journal to nojournal transition config: create,cache_size=7624M,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),cache_cursors=false,compatibility=(release="3.0",require_max="3.0"),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),statistics_log=(wait=0),verbose=(recovery_progress),
2018-10-20T00:38:41.356-0700 E STORAGE  [initandlisten] WiredTiger error (-31802) [1540021121:356413][12552:140726302030928], file:WiredTiger.wt, connection: unable to read root page from file:WiredTiger.wt: WT_ERROR: non-specific WiredTiger error
2018-10-20T00:38:41.356-0700 E STORAGE  [initandlisten] WiredTiger error (0) [1540021121:356413][12552:140726302030928], file:WiredTiger.wt, connection: WiredTiger has failed to open its metadata
2018-10-20T00:38:41.356-0700 E STORAGE  [initandlisten] WiredTiger error (0) [1540021121:356413][12552:140726302030928], file:WiredTiger.wt, connection: This may be due to the database files being encrypted, being from an older version or due to corruption on disk
2018-10-20T00:38:41.356-0700 E STORAGE  [initandlisten] WiredTiger error (0) [1540021121:356413][12552:140726302030928], file:WiredTiger.wt, connection: You should confirm that you have opened the database with the correct options including all encryption and compression options
2018-10-20T00:38:41.361-0700 E -        [initandlisten] Assertion: 28718:-31802: WT_ERROR: non-specific WiredTiger error src\mongo\db\storage\wiredtiger\wiredtiger_kv_engine.cpp 397
2018-10-20T00:38:41.361-0700 I STORAGE  [initandlisten] exception in initAndListen: Location28718: -31802: WT_ERROR: non-specific WiredTiger error, terminating
2018-10-20T00:38:41.361-0700 I NETWORK  [initandlisten] shutdown: going to close listening sockets...
2018-10-20T00:38:41.362-0700 I CONTROL  [initandlisten] now exiting
2018-10-20T00:38:41.362-0700 I CONTROL  [initandlisten] shutting down with code:100

C:\Program Files\MongoDB\Server\3.6\bin>mongod.exe --repair
2018-10-20T00:39:05.005-0700 I CONTROL  [initandlisten] MongoDB starting : pid=18108 port=27017 dbpath=C:\data\db\ 64-bit host=DESKTOP-*******
2018-10-20T00:39:05.005-0700 I CONTROL  [initandlisten] targetMinOS: Windows 7/Windows Server 2008 R2
2018-10-20T00:39:05.006-0700 I CONTROL  [initandlisten] db version v3.6.8-85-g9739f4e597
2018-10-20T00:39:05.006-0700 I CONTROL  [initandlisten] git version: 9739f4e5974ef70daac2190acc8db90d6e8311d1
2018-10-20T00:39:05.007-0700 I CONTROL  [initandlisten] OpenSSL version: OpenSSL 1.0.2o-fips  27 Mar 2018
2018-10-20T00:39:05.007-0700 I CONTROL  [initandlisten] allocator: tcmalloc
2018-10-20T00:39:05.007-0700 I CONTROL  [initandlisten] modules: none
2018-10-20T00:39:05.008-0700 I CONTROL  [initandlisten] build environment:
2018-10-20T00:39:05.008-0700 I CONTROL  [initandlisten]     distmod: 2008plus-ssl
2018-10-20T00:39:05.009-0700 I CONTROL  [initandlisten]     distarch: x86_64
2018-10-20T00:39:05.009-0700 I CONTROL  [initandlisten]     target_arch: x86_64
2018-10-20T00:39:05.009-0700 I CONTROL  [initandlisten] options: { repair: true }
2018-10-20T00:39:05.010-0700 W -        [initandlisten] Detected unclean shutdown - C:\data\db\mongod.lock is not empty.
2018-10-20T00:39:05.010-0700 I -        [initandlisten] Detected data files in C:\data\db\ created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2018-10-20T00:39:05.010-0700 W STORAGE  [initandlisten] Recovering data from the last clean checkpoint.
2018-10-20T00:39:05.011-0700 I STORAGE  [initandlisten] Detected WT journal files.  Running recovery from last checkpoint.
2018-10-20T00:39:05.011-0700 I STORAGE  [initandlisten] journal to nojournal transition config: create,cache_size=7624M,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),cache_cursors=false,compatibility=(release="3.0",require_max="3.0"),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),statistics_log=(wait=0),verbose=(recovery_progress),
2018-10-20T00:39:05.018-0700 E STORAGE  [initandlisten] WiredTiger error (-31802) [1540021145:17966][18108:140726302030928], file:WiredTiger.wt, connection: unable to read root page from file:WiredTiger.wt: WT_ERROR: non-specific WiredTiger error
2018-10-20T00:39:05.018-0700 E STORAGE  [initandlisten] WiredTiger error (0) [1540021145:17966][18108:140726302030928], file:WiredTiger.wt, connection: WiredTiger has failed to open its metadata
2018-10-20T00:39:05.019-0700 E STORAGE  [initandlisten] WiredTiger error (0) [1540021145:18963][18108:140726302030928], file:WiredTiger.wt, connection: This may be due to the database files being encrypted, being from an older version or due to corruption on disk
2018-10-20T00:39:05.020-0700 E STORAGE  [initandlisten] WiredTiger error (0) [1540021145:19968][18108:140726302030928], file:WiredTiger.wt, connection: You should confirm that you have opened the database with the correct options including all encryption and compression options
2018-10-20T00:39:05.025-0700 E -        [initandlisten] Assertion: 28718:-31802: WT_ERROR: non-specific WiredTiger error src\mongo\db\storage\wiredtiger\wiredtiger_kv_engine.cpp 397
2018-10-20T00:39:05.025-0700 I STORAGE  [initandlisten] exception in initAndListen: Location28718: -31802: WT_ERROR: non-specific WiredTiger error, terminating
2018-10-20T00:39:05.025-0700 I NETWORK  [initandlisten] shutdown: going to close listening sockets...
2018-10-20T00:39:05.026-0700 I CONTROL  [initandlisten] now exiting
2018-10-20T00:39:05.026-0700 I CONTROL  [initandlisten] shutting down with code:100

I also join WiredTiger.wt & WiredTiger.turtle to this post.

 

Thanks in advance !



 Comments   
Comment by Axel "Elanis" Soupé [X] [ 24/Oct/18 ]

Yep, that was lucky, the best way to avoid that kind of problem is to follow @kelsey.schubert, I'll modify my last comment to say it's not a safe (or a good) way to restore data.

Comment by Ramon Fernandez Marina [ 24/Oct/18 ]

Elanis, this is to let you and anyone else reading this ticket that the procedure you described above is not a safe one, but it seems you "got lucky" and nothing broke.

Comment by Axel "Elanis" Soupé [X] [ 23/Oct/18 ]

Hello,

I launched an old (2 month) working copy of my database with mongodb 3.6, changed fcv to 3.6 (was 3.4), then keep only the last modified files: one collection (the admin database I think) and WiredTiger.wt and two other files I don't remember the name.

Then I pasted my corrupted database into this folder except existing files.

The last step was to launch a 4.0.3 mongod with --repair, and then i was able to launch this backup. I didn't recover full data, maybe because my backups were dirty (I used to compress mongodb /data/db folder, now i'm using mongodump) so some data weren't saved on disk.

I hope it helps to understand how I recovered my data, and I hope this is understandable because English isn't my native language.

Warning: this method isn't safe, and work only if you have luck, if you try to follow it, use only copies of the backup and not the backup itself.

Thank you,

Elanis

Comment by Kelsey Schubert [ 23/Oct/18 ]

Hi Elanis,

When you have some time, would you please walk us through the detailed steps you took to recover/merge the data from these two sources? From your last comment we aren't completely sure what your process was, and we'd like to confirm that there aren't any latent issues around compatibility versioning that may only be exposed after some time.

Thank you,
Kelsey

Comment by Kelsey Schubert [ 22/Oct/18 ]

Hi Elanis,

Thanks for your response. I'm glad to hear that you were able to recover your data. To prevent this type of problem in the future please take note of the following guidelines to help mitigate any issues related to unreliable storage layers or server failures.

Thank you,
Kelsey

Comment by Axel "Elanis" Soupé [X] [ 21/Oct/18 ]

Hello,

I merged a working old backup with recent corrupted backup and I get all the data I needed. I made a dump and I can restore it in my prod database.

 

Thanks for your help and your work on --repair with mongodb 4.0, it saved two month of work.

Comment by Axel "Elanis" Soupé [X] [ 20/Oct/18 ]

Thanks for your help.

I tried to repair with mongodb 4.0.3 but it's not working.

 

** IMPORTANT: UPGRADE PROBLEM: The data files need to be fully upgraded to version 3.6 before attempting an upgrade to 4.0; see http://dochub.mongodb.org/core/4.0-upgrade-fcv for more details.

 

And when I try to re-run on 3.6:

 

2018-10-20T08:15:57.340-0700 E STORAGE  [initandlisten] WiredTiger error (-31802) [1540048557:340396][11764:140726302030928], connection: Version incompatibility detected: unsupported WiredTiger file version: this build requires a maximum version of 2, and the file is version 3: WT_ERROR: non-specific WiredTiger error

Is there a way to pass through fcv verification ? Or use wiredTiger 2 with mongo 4 ? Or another way to fix it ?

 

Thank you in advance

Comment by Kelsey Schubert [ 20/Oct/18 ]

Hi Elanis,

Thanks for the report, we have improved our repair functionality in MongoDB 4.0.3 (SERVER-19815), which should provide a resolve a number of issues that may occur due to a faulty disk layer and increase the chances of success since it acts on more files.

To proceed with the repair attempt, please follow these steps

  1. take a complete copy of your complete dbpath and set it aside
  2. download MongoDB 4.0.3
  3. start MongoDB 4.0.3 with --repair pointing at your data files
  4. after the process completes, you may optionally restart with a MongoDB 3.6.8 binary.

Please let me know if you have any questions about this process or run into any issues.

Thank you,
Kelsey

Generated at Thu Feb 08 04:46:43 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.