  1. Core Server
  2. SERVER-19815

Improved mongod --repair option for WiredTiger

    • Type: Improvement
    • Resolution: Done
    • Priority: Major - P3
    • Fix Version/s: 4.0.3, 4.1.4
    • Affects Version/s: 3.0.5
    • Component/s: WiredTiger
    • Backwards Compatibility: Minor Change
    • Backport Requested: v4.0
    • Sprint: Storage NYC 2018-06-18, Storage NYC 2018-09-10, Storage NYC 2018-09-24

      Issue Status as of October 1st, 2018

      ISSUE DESCRIPTION AND IMPACT
      The mongod --repair option was originally introduced for use with the MMAPv1 storage engine. When it is used with WiredTiger, attempts to recover a corrupted dbpath via mongod --repair may fail in a number of specific scenarios.
      Enhanced repair functionality allows mongod --repair to successfully recover from a wider variety of faulty conditions that previously would have resulted in a repair failure. It’s important to note that these changes do not allow the mongod to recover otherwise unretrievable data; instead, they ensure that the data set is returned to a working state with as much data as the process was able to salvage.
      In addition to a more robust repair mechanism, this change adds the following new behavior:

      • If the repair operation modifies data for a node in a replica set, it will not be able to rejoin the replica set until it has been fully resynced. This behavior is designed to prevent an instance where a node with only partial data recovered via mongod --repair could potentially become a replica set primary, as this would result in data effectively going missing.
      • If a repair operation fails for any reason, the node will not be able to start up again without the mongod --repair option. This precaution is included to prevent instances where the mongod is repeatedly restarted with a broken data set, potentially resulting in additional data corruption.
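      The second behavior above implies that a repair run should be checked for success before the node is started normally. The following is a minimal sketch of such a check, not part of the ticket: the helper names build_repair_command and run_repair are hypothetical, the dbpath is a placeholder, and it assumes a failed repair causes the mongod process to exit with a non-zero status.

```python
import subprocess


def build_repair_command(dbpath, mongod_binary="mongod"):
    """Return the argv list for a repair run against the given dbpath.

    Hypothetical helper: only the documented --dbpath and --repair
    options are used.
    """
    return [mongod_binary, "--dbpath", dbpath, "--repair"]


def run_repair(dbpath, mongod_binary="mongod"):
    """Run the repair and report whether the process exited with status 0.

    Assumption: a failed repair makes mongod exit with a non-zero status.
    """
    result = subprocess.run(build_repair_command(dbpath, mongod_binary))
    return result.returncode == 0
```

      If run_repair returns False, the node should be started again with --repair (or restored by one of the workarounds below) rather than started normally.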

      DIAGNOSIS AND AFFECTED VERSIONS
      This issue is exhibited whenever a mongod --repair command fails to start the mongod and instead returns an error message. Several error messages can be returned; two of the most common are:

      Fatal Assertion 28558 at src\mongo\db\storage\wiredtiger\wiredtiger_util.cpp 
      
      WiredTiger.wt: encountered an illegal file format or internal value
      

      These are only some of the most common messages; most mongod --repair operations that fail to start the mongod exhibit this issue.
      This issue affects MongoDB versions 3.0 through 4.0.2 that use the WiredTiger storage engine.
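      The diagnosis above can be roughly automated. The sketch below is illustrative only: the function names are hypothetical, the signature list contains just the two messages quoted in this ticket (other failure messages exist), and the version check encodes only the stated 3.0 through 4.0.2 range, so it says nothing about 4.1.x development releases earlier than 4.1.4.

```python
import re

# Error signatures quoted in this ticket; this list is not exhaustive.
KNOWN_SIGNATURES = (
    "Fatal Assertion 28558",
    "WiredTiger.wt: encountered an illegal file format or internal value",
)


def find_repair_failure_signatures(log_text):
    """Return the known signatures that appear in the given log text."""
    return [sig for sig in KNOWN_SIGNATURES if sig in log_text]


def in_stated_affected_range(version):
    """Return True if version falls within 3.0.0 through 4.0.2 inclusive.

    Expects a string beginning with major.minor.patch, such as "3.0.5".
    """
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    parsed = tuple(int(part) for part in match.groups())
    return (3, 0, 0) <= parsed <= (4, 0, 2)
```

      A match from find_repair_failure_signatures on a version inside the stated range suggests, but does not prove, that the node is hitting this issue.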

      REMEDIATION AND WORKAROUNDS
      Currently, the only workarounds available are to resync from a healthy node in a replica set, restore the dbpath from an earlier backup, or open a SERVER project ticket to request a manual repair attempt of the WiredTiger metadata files.

      FIX VERSIONS
      This issue is fixed in MongoDB 4.0.3 as well as in 4.1.4, and will be available in the 4.2 production release.

      Original description

      The repair loop should be more forgiving about failures such as missing files and deal with collections or indexes missing from the catalog with a big warning message.

            Assignee:
            louis.williams@mongodb.com Louis Williams
            Reporter:
            michael.cahill@mongodb.com Michael Cahill (Inactive)
            Votes:
            37
            Watchers:
            48

              Created:
              Updated:
              Resolved: