[SERVER-61936] WiredTiger metadata corruption detected - unable to repair
Created: 07/Dec/21  Updated: 14/Feb/22  Resolved: 14/Feb/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 3.6.18
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Sarojini Jillalla Assignee: Edwin Zhou
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File WiredTiger.turtle     File WiredTiger.wt     File WiredTigerLAS.wt     PNG File image003.png    
Issue Links:
Duplicate
is duplicated by SERVER-61937 WiredTiger fails to recover data when... Closed
Operating System: ALL
Steps To Reproduce:

Improper shutdown of the service causes data corruption.


 Description   

We are using Graylog, Elasticsearch and MongoDB for logging and archiving. These apps run as Docker containers, with 3 replicas on 3 RHEL servers. We are using MongoDB version 3.6.18.
Generally, when the Docker containers go down, the Docker daemon brings them back up automatically. But sometimes the shutdown is not clean and the MongoDB data gets corrupted. Until now, running a repair has always recovered the data successfully.

Today, the repair fails to recover the data. As a result, the MongoDB node cannot start, which also takes the Graylog server down.

This is the output from the repair operation.


[root@dcvsl125 mongodb]# docker run -it -v /docker/services/mongodb/db01:/data/db mongo:3.6.18 mongod --repair
2021-12-07T01:49:59.777+0000 I CONTROL [initandlisten] MongoDB starting : pid=1 port=27017 dbpath=/data/db 64-bit host=c4feb442a438
2021-12-07T01:49:59.777+0000 I CONTROL [initandlisten] db version v3.6.18
2021-12-07T01:49:59.777+0000 I CONTROL [initandlisten] git version: 2005f25eed7ed88fa698d9b800fe536bb0410ba4
2021-12-07T01:49:59.777+0000 I CONTROL [initandlisten] OpenSSL version: OpenSSL 1.0.2g 1 Mar 2016
2021-12-07T01:49:59.777+0000 I CONTROL [initandlisten] allocator: tcmalloc
2021-12-07T01:49:59.777+0000 I CONTROL [initandlisten] modules: none
2021-12-07T01:49:59.777+0000 I CONTROL [initandlisten] build environment:
2021-12-07T01:49:59.777+0000 I CONTROL [initandlisten] distmod: ubuntu1604
2021-12-07T01:49:59.777+0000 I CONTROL [initandlisten] distarch: x86_64
2021-12-07T01:49:59.777+0000 I CONTROL [initandlisten] target_arch: x86_64
2021-12-07T01:49:59.777+0000 I CONTROL [initandlisten] options: { net: { bindIpAll: true }, repair: true }
2021-12-07T01:49:59.778+0000 W - [initandlisten] Detected unclean shutdown - /data/db/mongod.lock is not empty.
2021-12-07T01:49:59.778+0000 I - [initandlisten] Detected data files in /data/db created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2021-12-07T01:49:59.779+0000 W STORAGE [initandlisten] Recovering data from the last clean checkpoint.
2021-12-07T01:49:59.779+0000 I STORAGE [initandlisten] Detected WT journal files. Running recovery from last checkpoint.
2021-12-07T01:49:59.779+0000 I STORAGE [initandlisten] journal to nojournal transition config: create,cache_size=63873M,cache_overflow=(file_max=0M),session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),compatibility=(release="3.0",require_max="3.0"),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),statistics_log=(wait=0),verbose=(recovery_progress),
2021-12-07T01:50:00.410+0000 E STORAGE [initandlisten] WiredTiger error (-31802) [1638841800:410928][1:0x7f27e7e59a40], file:WiredTiger.wt, connection: __wt_btree_tree_open, 604: unable to read root page from file:WiredTiger.wt: WT_ERROR: non-specific WiredTiger error Raw: [1638841800:410928][1:0x7f27e7e59a40], file:WiredTiger.wt, connection: __wt_btree_tree_open, 604: unable to read root page from file:WiredTiger.wt: WT_ERROR: non-specific WiredTiger error
2021-12-07T01:50:00.411+0000 E STORAGE [initandlisten] WiredTiger error (0) [1638841800:411036][1:0x7f27e7e59a40], file:WiredTiger.wt, connection: __wt_btree_tree_open, 611: WiredTiger has failed to open its metadata Raw: [1638841800:411036][1:0x7f27e7e59a40], file:WiredTiger.wt, connection: __wt_btree_tree_open, 611: WiredTiger has failed to open its metadata
2021-12-07T01:50:00.411+0000 E STORAGE [initandlisten] WiredTiger error (0) [1638841800:411063][1:0x7f27e7e59a40], file:WiredTiger.wt, connection: __wt_btree_tree_open, 614: This may be due to the database files being encrypted, being from an older version or due to corruption on disk Raw: [1638841800:411063][1:0x7f27e7e59a40], file:WiredTiger.wt, connection: __wt_btree_tree_open, 614: This may be due to the database files being encrypted, being from an older version or due to corruption on disk
2021-12-07T01:50:00.411+0000 E STORAGE [initandlisten] WiredTiger error (0) [1638841800:411082][1:0x7f27e7e59a40], file:WiredTiger.wt, connection: __wt_btree_tree_open, 617: You should confirm that you have opened the database with the correct options including all encryption and compression options Raw: [1638841800:411082][1:0x7f27e7e59a40], file:WiredTiger.wt, connection: __wt_btree_tree_open, 617: You should confirm that you have opened the database with the correct options including all encryption and compression options
2021-12-07T01:50:00.414+0000 F STORAGE [initandlisten] WiredTiger metadata corruption detected
2021-12-07T01:50:00.414+0000 F STORAGE [initandlisten] This version of MongoDB is unable to repair this kind of corruption, but version 4.0.3+ may be able to repair it. See http://dochub.mongodb.org/core/repair for more information.
2021-12-07T01:50:00.414+0000 F - [initandlisten] Fatal Assertion 50944 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 71
2021-12-07T01:50:00.414+0000 F - [initandlisten]

***aborting after fassert() failure

[root@dcvsl125 mongodb]#

 

Attaching the files from this corrupted instance.
WiredTiger.turtle WiredTiger.wt WiredTigerLAS.wt

 

I searched here and found https://jira.mongodb.org/browse/SERVER-40088, which is very similar to the issue I am currently facing.

Can you please help me recover from this failure?

 



 Comments   
Comment by Edwin Zhou [ 14/Feb/22 ]

We haven’t heard back from you for some time, so I’m going to close this ticket. If this is still an issue for you, please provide additional information and we will reopen the ticket.

Comment by Edwin Zhou [ 07/Feb/22 ]

Hi saroj.jillalla@cgi.com,

We still need additional information to diagnose the problem. If the issue persists, could you please provide:

  • The logs of the repair operation.
  • The logs of any attempt to start mongod after the repair operation completed.

Best,
Edwin

Comment by Edwin Zhou [ 28/Jan/22 ]

Hi saroj.jillalla@cgi.com,

If the issue persists, could you please provide:

  • The logs of the repair operation.
  • The logs of any attempt to start mongod after the repair operation completed.

Best,
Edwin

Comment by Sarojini Jillalla [ 07/Jan/22 ]

Hi Edwin,

The issue is still there and we are not able to repair the MongoDB database.
Please let me know what additional information is required for troubleshooting.

Thanks & Regards
Saroj

Sarojini Jillalla| Sr.Consultant - DevOps|CGI Group Inc.


Comment by Edwin Zhou [ 05/Jan/22 ]

Hi saroj.jillalla@cgi.com,

After backing up your $dbPath, were you able to run --repair using the latest version of MongoDB? MongoDB v4.0.3 and later have improved repair functionality that may be able to resolve your corruption.

If the repair ran but you still cannot start MongoDB, please provide:

  • The logs of the repair operation.
  • The logs of any attempt to start mongod after the repair operation completed.

Best,
Edwin

Comment by Sarojini Jillalla [ 04/Jan/22 ]

Hi Edwin,

The issue is still there and we are not able to repair the MongoDB.

Please let me know what additional information is required for troubleshooting.

Thanks,
Saroj.

Comment by Edwin Zhou [ 31/Dec/21 ]

Hi saroj.jillalla@cgi.com,

We still need additional information to diagnose the problem. If this is still an issue for you, could you please let us know whether running --repair with the latest version of MongoDB succeeded?

Best,
Edwin

Comment by Edwin Zhou [ 13/Dec/21 ]

Hi saroj.jillalla@cgi.com,

MongoDB 3.6 reached end of life in April 2021, but we can provide limited guidance on this issue. The ideal resolution is to perform a clean resync from an unaffected node.
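A clean resync typically means stopping the affected member, moving its data files aside, and restarting it with an empty data directory so that it performs an initial sync from a healthy member of the replica set. The following is a minimal shell sketch of those steps. It assumes the Docker layout from the description, where a host directory such as /docker/services/mongodb/db01 is bind-mounted at /data/db. It also assumes the other replica set members are healthy. The container name `mongodb01` and the function name `resync_member` are hypothetical.

```shell
#!/bin/sh
# Minimal resync sketch for one Docker-hosted replica set member.
# Hypothetical: the container name and the function name; the dbpath is the
# host directory that the container bind-mounts at /data/db.

resync_member() {
    local container="$1"   # hypothetical, e.g. mongodb01
    local dbpath="$2"      # e.g. /docker/services/mongodb/db01
    local aside

    # Stop the damaged member so nothing writes to its data directory.
    docker stop "$container" || return 1

    # Move the damaged files aside instead of deleting them; they can still
    # be used for a --repair attempt or for diagnosis.
    aside="$dbpath.corrupt.$(date +%Y%m%d%H%M%S)"
    mv "$dbpath" "$aside" || return 1

    # Recreate an empty data directory. Depending on how the image runs,
    # it may need the same owner as the original directory.
    mkdir -p "$dbpath" || return 1

    # With an empty dbpath, the restarted member performs an initial sync
    # from a healthy member of the replica set.
    docker start "$container"
}
```

For example: `resync_member mongodb01 /docker/services/mongodb/db01`. Moving the old directory aside rather than deleting it keeps the corrupted files available for a later --repair attempt.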

First, make a complete copy of the database's $dbpath directory and keep it somewhere safe, so that you can work on the current $dbpath without risking the only copy of the data. Then try mongod --repair using the latest version of MongoDB.
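A minimal shell sketch of that sequence is shown below. It copies the dbPath, verifies the copy, and then reruns the repair command from the description with a newer image. It assumes mongod is not running against the dbPath while it is copied. The default image tag `mongo:4.0` is an assumption, chosen because the fatal log message points to 4.0.3+; pass another tag to use a different version. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: back up a MongoDB dbPath, verify the copy, then run --repair on
# the original with a newer mongod image.
# Hypothetical: function names, paths, and the default "mongo:4.0" tag.

backup_dbpath() {
    local src="$1" dest="$2"
    # -a preserves ownership, permissions and timestamps of the data files.
    cp -a "$src" "$dest" || return 1
    # Fail unless the copy is identical to the source tree.
    diff -r "$src" "$dest" > /dev/null
}

repair_dbpath() {
    local dbpath="$1" image="${2:-mongo:4.0}"
    # Same invocation as in the description, with a newer image. -it is
    # omitted so the function also works without a terminal.
    docker run --rm -v "$dbpath":/data/db "$image" mongod --repair
}
```

For example: `backup_dbpath /docker/services/mongodb/db01 /docker/services/mongodb/db01.bak && repair_dbpath /docker/services/mongodb/db01`. Chaining with `&&` means the repair only runs once a verified copy exists. If the destination already exists, `cp -a` copies into it, the comparison fails, and the repair is skipped.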

If the --repair operation is unsuccessful, please also provide:

  • The logs leading up to the first occurrence of any issue
  • The logs of the repair operation.
  • The logs of any attempt to start mongod after the repair operation completed.
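The three requested log sets can be captured into separate files with a shell sketch like the following. It makes several assumptions:

- It uses the official mongo image, whose mongod logs to stdout, so `docker logs` and output redirection capture it.
- The container has not been recreated since the failure and uses a log driver that supports `docker logs`.
- The container is stopped before the repair runs.
- The container is already configured with the MongoDB version you intend to run after the repair.

The container name, output directory, wait time, and the `collect_logs` function are hypothetical.

```shell
#!/bin/sh
# Sketch: capture the three requested log sets into separate files.
# Hypothetical: function name, container name, paths, image tag, wait time.
# Assumes the container is stopped so nothing else has the dbPath open.

collect_logs() {
    local container="$1" dbpath="$2" image="$3" outdir="$4"
    local wait_secs="${5:-30}" started

    mkdir -p "$outdir" || return 1

    # 1. Logs leading up to the first failure, with Docker timestamps.
    docker logs --timestamps "$container" > "$outdir/before-failure.log" 2>&1

    # 2. Output of the repair run; captured even if --repair fails.
    docker run --rm -v "$dbpath":/data/db "$image" \
        mongod --repair > "$outdir/repair.log" 2>&1

    # 3. The first start attempt after the repair completed: only log lines
    #    emitted since this restart are kept.
    started="$(date -u +%Y-%m-%dT%H:%M:%SZ)"
    docker start "$container" > /dev/null 2>&1
    sleep "$wait_secs"
    docker logs --since "$started" "$container" > "$outdir/after-repair.log" 2>&1
}
```

For example: `collect_logs mongodb01 /docker/services/mongodb/db01 mongo:4.0 ./mongodb-logs`, then attach the three files in the output directory to the ticket.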

Best,
Edwin

Generated at Thu Feb 08 05:53:43 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.