- Type: Bug
- Resolution: Unresolved
- Priority: Major - P3
- Affects Version/s: None
- Component/s: None
Replication
In the 8.0 release, there were two separate cases where we used FCV to gate the upgrade and force customers to eliminate data formats that we no longer support. (One was a time-series collection format that we thought could not exist, and the other was a collection format from the queryable encryption beta feature; both are linked as HELP tickets below.)
When we originally architected FCV, we included an "upgrading" state that was intended to be both quickly transient and unable to fail; thus we did not implement a "rollback to downgraded" transition for this state.
In the 8.0 release, we then introduced new ways for the upgrade to fail: the setFCV command can now return an error and leave the FCV in the "upgrading" state. This state breaks backup snapshots and backup restores. I believe it would take a lot of tricky work to get the backup code to support this state, so I think we should mitigate this within the server instead.
One option might be to implement "upgrading-to-downgraded" FCV state transitions and to engage this transition automatically whenever an error occurs. This would not fully fix the backup issues, however. I think we should instead adopt a policy that no FCV upgrade may fail due to the retirement of data formats. Any such retirement should automatically transmute the retired formats into acceptable ones (by changing catalog entries, renaming collections, dropping indexes, moving data into a "lost+found" area like fsck does for disks, et cetera). This would solve our backup/restore problems with FCV upgrades, as the restore process depends on always being able to upgrade FCV on a snapshot successfully, without manual intervention.
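To make the proposed policy concrete, here is a minimal Python sketch of an upgrade that cannot fail on retired formats, because it quarantines them instead of returning an error. This is not server code: `Catalog`, `Collection`, `RETIRED_FORMATS`, the format labels, and the `lost_and_found.` prefix are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class FCVState(Enum):
    DOWNGRADED = "downgraded"
    UPGRADING = "upgrading"
    UPGRADED = "upgraded"


@dataclass
class Collection:
    name: str
    data_format: str  # hypothetical format label, not a real catalog field


@dataclass
class Catalog:
    collections: dict = field(default_factory=dict)
    fcv: FCVState = FCVState.DOWNGRADED

    def add(self, coll: Collection) -> None:
        self.collections[coll.name] = coll


# Hypothetical labels standing in for the two retired formats.
RETIRED_FORMATS = {"legacy-timeseries", "qe-beta"}


def quarantine_retired(catalog: Catalog) -> list:
    """Rename every collection in a retired format into a lost-and-found
    namespace and return the original names that were moved."""
    moved = []
    # Iterate over a snapshot of the keys because the dict is mutated below.
    for name in list(catalog.collections):
        coll = catalog.collections[name]
        if coll.data_format in RETIRED_FORMATS:
            del catalog.collections[name]
            coll.name = f"lost_and_found.{name}"
            coll.data_format = "quarantined"
            catalog.collections[coll.name] = coll
            moved.append(name)
    return moved


def upgrade_fcv(catalog: Catalog) -> list:
    """Upgrade FCV without a failure path for retired formats:
    offending collections are quarantined rather than reported as errors."""
    catalog.fcv = FCVState.UPGRADING
    moved = quarantine_retired(catalog)
    catalog.fcv = FCVState.UPGRADED
    return moved
```

In this sketch the "upgrading" state stays transient, since nothing between entering and leaving it raises on account of a retired format. A restore process could therefore call `upgrade_fcv` on a snapshot unattended and inspect the returned list afterwards to report what was quarantined.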