[SERVER-40088] WiredTiger has failed to open its metadata Created: 12/Mar/19  Updated: 12/Mar/19  Resolved: 12/Mar/19

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 3.2.10
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Christian Cremer Assignee: Danny Hatcher (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File WiredTiger.turtle     File WiredTiger.wt     File WiredTigerLAS.wt     File repair-attempt.tar    
Operating System: ALL
Steps To Reproduce:

2019-03-12T11:58:57.249+0000 I CONTROL [initandlisten] MongoDB starting : pid=27 port=27017 dbpath=/var/lib/mongodb/data 64-bit host=frame-4-bcfcl
2019-03-12T11:58:57.249+0000 I CONTROL [initandlisten] db version v3.2.10
2019-03-12T11:58:57.249+0000 I CONTROL [initandlisten] git version: 79d9b3ab5ce20f51c272b4411202710a082d0317
2019-03-12T11:58:57.249+0000 I CONTROL [initandlisten] OpenSSL version: OpenSSL 1.0.1e-fips 11 Feb 2013
2019-03-12T11:58:57.249+0000 I CONTROL [initandlisten] allocator: tcmalloc
2019-03-12T11:58:57.249+0000 I CONTROL [initandlisten] modules: none
2019-03-12T11:58:57.249+0000 I CONTROL [initandlisten] build environment:
2019-03-12T11:58:57.249+0000 I CONTROL [initandlisten] distarch: x86_64
2019-03-12T11:58:57.249+0000 I CONTROL [initandlisten] target_arch: x86_64
2019-03-12T11:58:57.249+0000 I CONTROL [initandlisten] options: { config: "/etc/mongod.conf", net: { bindIp: "127.0.0.1", port: 27017 }, replication: { oplogSizeMB: 64 }, security: { authorization: "disabled" }, storage: { dbPath: "/var/lib/mongodb/data", wiredTiger: { engineConfig: { cacheSizeGB: 1 } } }, systemLog: { quiet: true } }
2019-03-12T11:58:57.262+0000 I - [initandlisten] Detected data files in /var/lib/mongodb/data created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2019-03-12T11:58:57.271+0000 W - [initandlisten] Detected unclean shutdown - /var/lib/mongodb/data/mongod.lock is not empty.
2019-03-12T11:58:57.280+0000 W STORAGE [initandlisten] Recovering data from the last clean checkpoint.
2019-03-12T11:58:57.283+0000 I STORAGE [initandlisten] wiredtiger_open config: create,cache_size=1G,session_max=20000,eviction=(threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),checkpoint=(wait=60,log_size=2GB),statistics_log=(wait=0),
2019-03-12T11:58:57.493+0000 E STORAGE [initandlisten] WiredTiger (-31802) [1552391937:493089][27:0x7fc04b147e80], file:WiredTiger.wt, connection: unable to read root page from file:WiredTiger.wt: WT_ERROR: non-specific WiredTiger error
2019-03-12T11:58:57.493+0000 E STORAGE [initandlisten] WiredTiger (0) [1552391937:493208][27:0x7fc04b147e80], file:WiredTiger.wt, connection: WiredTiger has failed to open its metadata
2019-03-12T11:58:57.493+0000 E STORAGE [initandlisten] WiredTiger (0) [1552391937:493245][27:0x7fc04b147e80], file:WiredTiger.wt, connection: This may be due to the database files being encrypted, being from an older version or due to corruption on disk
2019-03-12T11:58:57.493+0000 E STORAGE [initandlisten] WiredTiger (0) [1552391937:493262][27:0x7fc04b147e80], file:WiredTiger.wt, connection: You should confirm that you have opened the database with the correct options including all encryption and compression options
2019-03-12T11:58:57.513+0000 I - [initandlisten] Assertion: 28595:-31802: WT_ERROR: non-specific WiredTiger error
2019-03-12T11:58:57.513+0000 I STORAGE [initandlisten] exception in initAndListen: 28595 -31802: WT_ERROR: non-specific WiredTiger error, terminating
2019-03-12T11:58:57.514+0000 I CONTROL [initandlisten] dbexit: rc: 100

Participants:

 Description   

We run MongoDB 3.2 in an OpenShift cluster using the official Red Hat pod (hence the somewhat outdated version). For reasons not yet known, the pod failed to stop cleanly, corrupting the WiredTiger data files.

The error is basically the same as described in SERVER-23346, SERVER-27777, SERVER-25770, SERVER-28242 and possibly others. However, none of the tickets we found includes instructions on how to fix the problem ourselves.

We have tried running a 3.6 image, which did not help. A 4.0 image produces more output and attempts a repair, but fails with "** IMPORTANT: UPGRADE PROBLEM: The data files need to be fully upgraded to version 3.6 before attempting an upgrade to 4.0".
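(For anyone hitting the same "UPGRADE PROBLEM" message later: a 4.0 binary cannot open 3.2 data files directly. The supported path steps through each intermediate release, raising featureCompatibilityVersion at every stop. A minimal sketch, assuming the data files are healthy enough to start at all — which was not the case in this ticket; paths and log locations are assumptions:)

```shell
# Hypothetical stepwise upgrade of 3.2 data files (sketch only).
# Each step must start and shut down cleanly before moving on; this will
# not work against corrupt files like the ones attached to this ticket.

# 1. Start a 3.4 binary on the existing dbpath, then raise the FCV.
mongod --dbpath /var/lib/mongodb/data --fork --logpath /var/log/mongod.log
mongo --eval 'db.adminCommand({ setFeatureCompatibilityVersion: "3.4" })'
mongod --dbpath /var/lib/mongodb/data --shutdown

# 2. Repeat with a 3.6 binary.
mongod --dbpath /var/lib/mongodb/data --fork --logpath /var/log/mongod.log
mongo --eval 'db.adminCommand({ setFeatureCompatibilityVersion: "3.6" })'
mongod --dbpath /var/lib/mongodb/data --shutdown

# 3. Only now can a 4.0 binary open the files.
```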

A file-based restore does not help either; mongod fails to start with checksum errors. Our customer did not create any dumps.

Can you help us? The database is currently down...

Following the other tickets, I have attached some files so that repair files can be generated. Please indicate whether those are enough.
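(For later readers: when no healthy replica set member exists to resync from, the usual first resort is mongod's built-in repair. A sketch, run against a copy of the data files since repair is destructive; the paths are assumptions. Note that on a 3.2 binary, --repair cannot proceed when WiredTiger.wt itself is unreadable, as in this ticket — recent versions attempt to salvage the metadata instead:)

```shell
# Last-resort repair sketch -- always work on a COPY of the dbpath,
# since a failed repair can make things worse.
cp -a /var/lib/mongodb/data /var/lib/mongodb/data-backup

# Attempt the built-in repair against the working copy.
mongod --repair --dbpath /var/lib/mongodb/data
```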

Best Regards

Christian Cremer



 Comments   
Comment by Danny Hatcher (Inactive) [ 12/Mar/19 ]

Chris,

Thanks for letting us know the repair worked!

Have a great day,

Danny

Comment by Christian Cremer [ 12/Mar/19 ]

Hello Daniel

Thank you very much for the superfast response and repair files! The repair worked and the server is now up again. We're waiting for the customer to verify the data, but I don't expect any more troubles in this regard.

Yes, that or running databases outside of OpenShift altogether. We have had database corruption with PostgreSQL and other systems on GlusterFS in the past. However, since we run OpenShift as a public cluster, we don't always know what our customers are running.

Nevertheless, thanks again!

Cheers, Chris

Comment by Danny Hatcher (Inactive) [ 12/Mar/19 ]

Hello Christian,

We recommend using replica sets for situations like this. If one node goes down due to corrupt data, you can perform a clean sync from a healthy node without a service interruption.
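(The mongod.conf shown in the log above sets an oplog size but no replica set name, i.e. the node ran standalone. A minimal sketch of the recommended setup — three members, same --replSet name, initiated once; host names here are hypothetical:)

```shell
# Start each member with the same replica set name (sketch; paths assumed).
mongod --dbpath /var/lib/mongodb/data --replSet rs0 \
       --fork --logpath /var/log/mongod.log

# Initiate once, from any one member; the others then sync from the primary.
mongo --eval 'rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongo-0.example:27017" },
    { _id: 1, host: "mongo-1.example:27017" },
    { _id: 2, host: "mongo-2.example:27017" }
  ]
})'

# A member with corrupt files can then be recovered by wiping its dbpath
# and restarting it: it performs an initial sync from a healthy member.
```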

I've attached repair-attempt.tar to this ticket. Can you replace the files under the $dbpath with these and let me know if the node is able to start up?

Thank you,

Danny

Generated at Thu Feb 08 04:53:59 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.