[SERVER-41046] Cannot repair mongodb Created: 08/May/19  Updated: 14/May/19  Resolved: 10/May/19

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Charlie Chang Assignee: Danny Hatcher (Inactive)
Resolution: Done Votes: 0
Labels: wt-repair-success
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File mongodb error.txt    
Operating System: ALL
Participants:

 Description   

It seems that the db is corrupted (core dump).  I ran mongod with the "--repair" option, but the repair failed.  The detailed error message is attached.  Please help!



 Comments   
Comment by Danny Hatcher (Inactive) [ 10/May/19 ]

Glad to hear it!

Comment by Charlie Chang [ 10/May/19 ]

The trick seems to be working.  Thanks Daniel!

Comment by Danny Hatcher (Inactive) [ 09/May/19 ]

If the repair works on 4.0.9 then you will be able to restart the node as 3.6.0 again. If the repair does not work for whatever reason, unfortunately there is nothing else we can do.

Comment by Charlie Chang [ 09/May/19 ]

I see.  Thanks Daniel.

We are not using a cluster for this MongoDB instance.  It's standalone at this point.  It appears that I have to do some homework before migrating to 4.0.9.

For now, I just wanted to confirm that I can restart this db using 3.6.0 IF it can be repaired by 4.0.9, right?  Are there any potential issues with this approach?

Comment by Danny Hatcher (Inactive) [ 09/May/19 ]

Our general recommendation would be to go to 4.0.9 for your cluster. However, there may be parts of your application or database schema that are not compatible with the newest version of MongoDB. We have a comprehensive list of compatibility changes between MongoDB 3.6 and 4.0 in our documentation that you would need to read through before permanently upgrading. You would also need to make sure that the drivers you are using are compatible with 4.0.

Comment by Charlie Chang [ 09/May/19 ]

Thanks Daniel!

Regarding your last statement, you suggested I use version 4.0.9 to repair the corrupted db.  If it works, I should restart the db using version 3.6.0??  My question is, if it works, why can't I just use version 4.0.9 in production since it's more stable?  Is there any db structure change between 3.6.0 and 4.0.9 that prevents me from using 4.0.9?

Comment by Danny Hatcher (Inactive) [ 08/May/19 ]

This error message leads us to suspect some form of physical corruption. Please make a complete copy of the database's $dbpath directory to work off of and safeguard the current $dbpath.
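As a rough illustration of the copy step, here is a minimal Python sketch. It assumes mongod is stopped while the files are copied, and the function name `backup_dbpath`, the `-copy` suffix, and the example paths are hypothetical and not part of any MongoDB tooling.

```python
import shutil
from pathlib import Path


def backup_dbpath(dbpath: str, backup_root: str) -> Path:
    """Copy the whole dbpath directory under backup_root and return the copy's path.

    Assumption: mongod is not running against dbpath while this copy is made.
    """
    src = Path(dbpath)
    if not src.is_dir():
        raise NotADirectoryError(f"dbpath not found: {src}")
    dest = Path(backup_root) / f"{src.name}-copy"
    # copytree raises FileExistsError if dest already exists,
    # so an earlier copy is never silently overwritten.
    shutil.copytree(src, dest)
    return dest
```

For example, `backup_dbpath("/data/db", "/backup")` (hypothetical paths) would create `/backup/db-copy`. Any repair attempt can then run against the copy while the original directory stays untouched.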

Our ability to determine the source of this corruption depends greatly on your ability to provide:

  1. The logs for the affected node, including before, leading up to, and after the first sign of corruption.
  2. A description of the underlying storage mechanism in use, including details like:
    1. What file system and/or volume management system is in use?
    2. Is data storage locally attached or network-attached?
    3. Are disks RAIDed and if so how?
    4. Are disks SSDs or HDDs?
  3. A description of your backup method, if any.
  4. Whether your disks have been recently checked for integrity.
  5. A history of the deployment, including:
    1. a timeline of version changes
    2. a timeline of hardware upgrade/downgrade cycles or configuration changes
    3. a timeline of disaster recovery or backup restoration activities
    4. a timeline of any manipulations of the underlying database files, including copies or moves, and information about whether mongod was running during each manipulation.

The ideal resolution is to perform a clean resync from an unaffected node. You tried the repair with 3.6.0; could you please download the 4.0.9 binaries and try the repair with that version? Once the repair is done, you can restart with the original binaries.
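The repair attempt with the newer binaries could be scripted roughly as follows. This is a sketch only: it assumes the 4.0.9 `mongod` binary has been unpacked somewhere on the host and that the repair is pointed at a copy of the data directory. The helper names and example paths are hypothetical. `--dbpath` and `--repair` are the standard mongod options.

```python
import subprocess
from pathlib import Path
from typing import List


def repair_command(mongod_binary: str, dbpath: str) -> List[str]:
    """Build the argument list for a standalone mongod repair run."""
    return [mongod_binary, "--dbpath", dbpath, "--repair"]


def run_repair(mongod_binary: str, dbpath: str) -> int:
    """Run the repair and return mongod's exit code."""
    if not Path(dbpath).is_dir():
        raise NotADirectoryError(f"dbpath not found: {dbpath}")
    # Passing a list (no shell) avoids quoting problems with paths.
    completed = subprocess.run(repair_command(mongod_binary, dbpath))
    return completed.returncode
```

A call such as `run_repair("/opt/mongodb-4.0.9/bin/mongod", "/backup/db-copy")` (both paths hypothetical) would run the 4.0.9 repair against the working copy. Once it finishes, the repaired directory can be started with the original binaries.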

Generated at Thu Feb 08 04:56:40 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.