[SERVER-39580] [4.0] Skip repairing the FCV document if the major version is too low Created: 14/Feb/19  Updated: 29/Oct/23  Resolved: 22/Mar/19

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 4.0.8

Type: Improvement Priority: Major - P3
Reporter: Louis Williams Assignee: Gregory Wlodarek
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Related
related to SERVER-37688 unable to read root page from file:Wi... Closed
Backwards Compatibility: Fully Compatible
Backport Requested:
v4.0
Sprint: Storage NYC 2019-03-25
Participants:
Story Points: 3

 Description   

When repair with 4.0.3+ runs against data files from older versions of MongoDB, we first repair all databases, and then check whether we can restore the FCV document by verifying that all collections have UUIDs. If they do not, repair exits cleanly but without completing, downgrading the WiredTiger log files on the way out.

** IMPORTANT: UPGRADE PROBLEM: The data files need to be fully upgraded to version 3.6 before attempting an upgrade to 4.0; see http://dochub.mongodb.org/core/4.0-upgrade-fcv for more details.
Downgrading WiredTiger datafiles.

This causes a few problems. The first is that it looks like an error, even though all databases have been repaired.

The other is that it does not remove the _repair_incomplete file. This causes MongoDB 4.0 to fassert on the following startup:

An incomplete repair has been detected! This is likely because a repair operation unexpectedly failed before completing. MongoDB will not start up again without --repair.
Fatal Assertion 50922 at src/mongo/db/storage/storage_engine_init.cpp 86

As a result of the fassert, the WiredTiger log file version is not downgraded, since we only do that on clean shutdown. This has the following effects:

  • log file versions are unintentionally upgraded, preventing the original binary from ever starting again
  • if the FCV is old enough (more than one major release behind), the new binary also fails to start

The only options for recovering then become:

  • Delete the journal files, which can lead to data inconsistencies
  • Retry repair from a backup without introducing an unclean shutdown
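
The sequence described above can be modeled as a small state walk-through. This is an illustrative sketch only, not server code: the `DataDir` class and both function names are hypothetical, and the point at which the marker is created is an assumption. The only details taken from this ticket are the `_repair_incomplete` marker, the UUID check, the fassert number, and the fact that the log file version is downgraded only on clean shutdown.

```python
from dataclasses import dataclass


@dataclass
class DataDir:
    """Hypothetical model of the on-disk state relevant to this ticket."""
    all_collections_have_uuids: bool
    repair_incomplete_marker: bool = False
    log_version_downgraded: bool = True


def run_repair_before_fix(d: DataDir) -> str:
    """Model of the repair flow described in this ticket (before the fix)."""
    # Assumption: the _repair_incomplete marker exists while repair runs.
    d.repair_incomplete_marker = True
    # ... all databases are repaired here ...
    if not d.all_collections_have_uuids:
        # Repair exits early but cleanly: the log files are downgraded,
        # yet the marker is left behind.
        d.log_version_downgraded = True
        return "exited early, marker left behind"
    d.repair_incomplete_marker = False
    return "repair complete"


def start_without_repair(d: DataDir) -> str:
    """Model of the following 4.0 startup without --repair."""
    # The newer binary upgrades the log file version while it runs.
    d.log_version_downgraded = False
    if d.repair_incomplete_marker:
        # The fassert is an unclean exit, so the downgrade never happens.
        return "fassert 50922"
    return "started"
```

Running `run_repair_before_fix` and then `start_without_repair` on a `DataDir` whose collections lack UUIDs ends in the fassert state with `log_version_downgraded` set to `False`, which is the wedged situation the two recovery options above try to escape.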


 Comments   
Comment by Githook User [ 22/Mar/19 ]

Author: Gregory Wlodarek (GWlodarek) <gregory.wlodarek@mongodb.com>

Message: SERVER-39580 Skip checking collection UUIDs and repairing the FCV document if the data files version is too low
Branch: v4.0
https://github.com/mongodb/mongo/commit/91c069aaf7057d31a751840c1fe0da2928487afb
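
The change named in the commit message can be pictured as a guard in front of the FCV handling. The sketch below is a hypothetical illustration, not the committed code: the function name, the boolean inputs, and the step labels are invented here, and how the server decides that the data files version is too low is not stated in this ticket.

```python
from typing import List


def fcv_repair_steps(data_files_version_too_low: bool,
                     fcv_document_missing: bool) -> List[str]:
    """Return the FCV-related steps taken after all databases are repaired.

    Hypothetical model of the behavior described in the commit message.
    """
    if data_files_version_too_low:
        # Per the commit message: skip both the collection UUID check
        # and the FCV document repair.
        return []
    if fcv_document_missing:
        # The FCV document is restored only if every collection has a UUID.
        return ["check_collection_uuids",
                "restore_fcv_document_if_all_have_uuids"]
    # Nothing to restore when the FCV document is already present.
    return []
```

With the guard in place, data files that are too old never reach the UUID check, so the early exit described in this ticket is not triggered by that check.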

Comment by Maria van Keulen [ 14/Feb/19 ]

We could also consider removing the handling that attempts to restore a missing featureCompatibilityVersion document with --repair. That handling was added as part of SERVER-29452 to avoid wedging users into a situation where they could remove their fCV document and no longer be able to start up. However, as of SERVER-29453 and SERVER-32205, it is difficult for a user to accidentally remove their fCV document. Given that the restoration handling in SERVER-29452 only accounts for the case where the fCV document is missing (as opposed to corrupted), I think the risks and costs of keeping and maintaining this handling should be re-evaluated relative to the benefits of addressing a very unlikely fCV document removal.
