To avoid breaking the system during a binary upgrade/downgrade, we make
{getParameter: 1, featureCompatibilityVersion: 1} wait for the FCV change to make it into the stable checkpoint. It does this by using the waitForMajority mechanism to wait for the currentCommittedSnapshot, which is usually the same as the stable checkpoint.
The two diverge if we do a config change which either changes writeConcernMajorityJournalDefault or is a force config which changes the contents of the set. At those times the currentCommittedSnapshot is cleared. This would be inconsequential if it weren't for another bug: configs with split horizons are erroneously determined to be different when they are not. This means that a config change brought about by an election, which is a force config on 4.4, can clear the currentCommittedSnapshot. If we never get a majority write after that point (e.g. because the other nodes were shut down), the getParameter wait never completes and we will never be able to read the FCV. Unfortunately, Cloud Backup has a procedure which commonly triggers this.
We can fix this by clearing the lastFCVUpdateSnapshot when we call dropAllSnapshots (4.4) or clearCommittedSnapshot (5.0) in ReplicationCoordinatorExternalStateImpl.
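The hang and the proposed fix can be illustrated with a small toy model. This is only a sketch, not server code: ToyNode, its fields, and simulate are hypothetical names that loosely mirror currentCommittedSnapshot and lastFCVUpdateSnapshot, snapshots are modeled as plain integers, and the "wait" is reduced to a check of whether a reader would block.

```python
from typing import Optional


class ToyNode:
    """Hypothetical stand-in for a replica set member (not real server code)."""

    def __init__(self, apply_fix: bool) -> None:
        self.apply_fix = apply_fix
        # Loosely models currentCommittedSnapshot; None means "cleared".
        self.current_committed_snapshot: Optional[int] = None
        # Loosely models lastFCVUpdateSnapshot; None means "nothing to wait for".
        self.last_fcv_update_snapshot: Optional[int] = None

    def set_fcv(self, ts: int) -> None:
        # An FCV change records the snapshot that FCV readers must wait for.
        self.last_fcv_update_snapshot = ts

    def majority_commit(self, ts: int) -> None:
        # A majority write advances the committed snapshot.
        self.current_committed_snapshot = ts

    def clear_committed_snapshot(self) -> None:
        # Models the clearing done on a qualifying config change.
        self.current_committed_snapshot = None
        if self.apply_fix:
            # Proposed fix: forget the FCV snapshot at the same time.
            self.last_fcv_update_snapshot = None

    def fcv_read_would_block(self) -> bool:
        # True if an FCV read would still be waiting for the committed snapshot.
        if self.last_fcv_update_snapshot is None:
            return False
        if self.current_committed_snapshot is None:
            return True
        return self.current_committed_snapshot < self.last_fcv_update_snapshot


def simulate(apply_fix: bool) -> bool:
    """Returns True if the FCV read is stuck after the snapshot is cleared."""
    node = ToyNode(apply_fix)
    node.set_fcv(10)
    node.majority_commit(10)
    assert not node.fcv_read_would_block()  # healthy steady state
    # Election-triggered config change; no majority write follows it.
    node.clear_committed_snapshot()
    return node.fcv_read_would_block()
```

In this model, simulate(False) reports a stuck read because nothing advances the committed snapshot after it is cleared, while simulate(True) does not block because the FCV snapshot is cleared together with it.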
Related to: SERVER-59867 Split horizon mappings in ReplSetConfig/MemberConfig should be serialized deterministically (Closed)