ISSUE DESCRIPTION AND IMPACT
The mongod --repair option was originally introduced for use with the MMAPv1 storage engine. When it is used with WiredTiger, attempts to recover a corrupted dbpath via mongod --repair may fail in a number of specific scenarios.
Enhanced repair functionality allows mongod --repair to successfully recover from a wider variety of faulty conditions that previously would have resulted in a repair failure. It’s important to note that these changes do not allow the mongod to recover otherwise unretrievable data; instead, they ensure that the data set is returned to a working state with as much data as the process was able to salvage.
In addition to a more robust repair mechanism, this change adds the following new behavior:
- If the repair operation modifies data for a node in a replica set, that node will not be able to rejoin the replica set until it has been fully resynced. This behavior is designed to prevent a node with only partial data recovered via mongod --repair from becoming a replica set primary, as this would result in data effectively going missing.
- If a repair operation fails for any reason, the node will not be able to start up again without the mongod --repair option. This precaution is included to prevent instances where the mongod is repeatedly restarted with a broken data set, potentially resulting in additional data corruption.
DIAGNOSIS AND AFFECTED VERSIONS
This issue is exhibited whenever a mongod --repair command fails to start the mongod and instead returns an error message. Several error messages can be returned; two of the most common are:
Fatal Assertion 28558 at src\mongo\db\storage\wiredtiger\wiredtiger_util.cpp
WiredTiger.wt: encountered an illegal file format or internal value
These are only two of the most common messages; most mongod --repair operations that fail to start the mongod exhibit this issue.
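As a rough aid to diagnosis, the two messages quoted above can be searched for in captured mongod output. The following is a minimal sketch, not an official tool: the function name find_repair_failures and the constant REPAIR_FAILURE_SIGNATURES are hypothetical, and the signature list covers only the two messages listed in this section, so an empty result does not rule out the issue.

```python
# Substrings taken from the two error messages quoted in this advisory.
# Assumption: matching on these substrings is sufficient; other failure
# messages exist and are not covered here.
REPAIR_FAILURE_SIGNATURES = (
    "Fatal Assertion 28558",
    "encountered an illegal file format or internal value",
)


def find_repair_failures(log_lines):
    """Return (line_number, line) pairs that contain a known signature."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        if any(signature in line for signature in REPAIR_FAILURE_SIGNATURES):
            hits.append((number, line.rstrip("\n")))
    return hits
```

The function accepts any iterable of strings, so an open log file can be passed in directly.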
This issue affects MongoDB versions 3.0 - 4.0.2 that use the WiredTiger storage engine.
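The affected range above can be expressed as a simple version comparison. This is a sketch under stated assumptions: the names parse_version and is_affected are hypothetical, version strings are assumed to be plain dotted numbers (suffixes such as release-candidate tags are not handled), and only the range stated in this section is checked, so development-series builds are not classified.

```python
# Inclusive bounds from the advisory: 3.0 through 4.0.2.
FIRST_AFFECTED = (3, 0, 0)
LAST_AFFECTED = (4, 0, 2)


def parse_version(text):
    """Parse 'X.Y.Z' (or 'X.Y') into a tuple of three integers."""
    parts = [int(piece) for piece in text.strip().split(".")]
    while len(parts) < 3:
        parts.append(0)  # treat a missing patch number as 0
    return tuple(parts[:3])


def is_affected(version, storage_engine="wiredTiger"):
    """Return True if the version and engine fall in the stated range."""
    if storage_engine != "wiredTiger":
        return False  # the advisory applies only to WiredTiger
    return FIRST_AFFECTED <= parse_version(version) <= LAST_AFFECTED
```

For example, is_affected("4.0.2") is true while is_affected("4.0.3") is false, because tuples compare element by element.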
REMEDIATION AND WORKAROUNDS
Currently, the only workarounds available are to resync from a healthy node in a replica set, restore the dbpath from an earlier backup, or open a SERVER project ticket to request a manual repair attempt of the WiredTiger metadata files.
This issue is fixed in MongoDB 4.0.3 as well as in 4.1.4, and will be available in the 4.2 production release.