Type: Bug
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 4.0.20, 4.2.8, 4.4.0-rc4, 4.7.0
Affects Version/s: None
Component/s: Upgrade/Downgrade
Labels:
None

Backwards Compatibility:
Fully Compatible
Operating System:
ALL
Backport Requested:

v4.4, v4.2, v4.0
Sprint:
Repl 2020-04-06, Repl 2020-04-20, Repl 2020-05-04
Linked BF Score:
30
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

I believe this bug goes all the way back to the beginning of the setFCV framework, so the fix will need to be backported.

A setFCV command changes the FCV value twice: first to put FCV into upgrading/downgrading, then to put FCV into fully upgraded/fully downgraded. After each of these FCV writes, we wait for majority confirmation before proceeding.
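The two-write sequence can be sketched as a toy model. This is not the server's implementation: the `FCV` enum, `set_fcv`, the list-backed `store`, and the `wait_for_majority` callback are all hypothetical names chosen for illustration.

```python
from enum import Enum


class FCV(Enum):
    FULLY_DOWNGRADED = "fully downgraded"
    UPGRADING = "upgrading"
    FULLY_UPGRADED = "fully upgraded"
    DOWNGRADING = "downgrading"


class Interrupted(Exception):
    """Hypothetical stand-in for an interruption such as
    InterruptedDueToReplStateChange."""


def set_fcv(store, upgrade, wait_for_majority):
    """Toy setFCV: write the transitional value, then the final value.

    `store` is a plain list standing in for the history of FCV writes.
    `wait_for_majority` is called after each write and may raise.
    """
    transitional = FCV.UPGRADING if upgrade else FCV.DOWNGRADING
    final = FCV.FULLY_UPGRADED if upgrade else FCV.FULLY_DOWNGRADED
    for value in (transitional, final):
        store.append(value)   # the FCV write itself
        wait_for_majority()   # an interruption here leaves the write unconfirmed
```

If `wait_for_majority` raises after the first write, the transitional value has been written but never majority-confirmed, which is the window in which a later rollback can undo it.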

However, setFCV can be interrupted while waiting for majority write concern (by InterruptedDueToReplStateChange, for example), and the FCV value can then be rolled back a step. This manifested in test failures where the in-memory FCV value did not match the persisted FCV value: the persisted value had been rolled back, but rollback left the in-memory value unchanged. Recover to a stable timestamp wipes out writes back to the checkpoint and then replays writes from the oplog up to the desired point, so the FCV value change never even goes through the OpObserver.
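The divergence can be reproduced in a toy model. Everything here is an assumption made for illustration: `ToyNode`, its fields, and its methods are hypothetical and only mimic the behavior described above, where normal writes notify an observer that updates the in-memory value, while recovery rewrites the persisted value directly.

```python
class ToyNode:
    """Toy node with a persisted FCV and a cached in-memory FCV."""

    def __init__(self, fcv):
        self.checkpoint = fcv   # persisted value as of the stable checkpoint
        self.oplog = []         # (timestamp, fcv) writes made after the checkpoint
        self.persisted = fcv
        self.in_memory = fcv

    def _on_fcv_change(self, value):
        # Stand-in for the observer hook that keeps the cache in sync.
        self.in_memory = value

    def write_fcv(self, ts, value):
        # Normal write path: persist, log, and notify the observer.
        self.oplog.append((ts, value))
        self.persisted = value
        self._on_fcv_change(value)

    def recover_to_stable_timestamp(self, stable_ts):
        # Drop writes newer than stable_ts, reset to the checkpoint, and
        # replay the surviving oplog entries. The observer is never called.
        self.oplog = [(ts, v) for ts, v in self.oplog if ts <= stable_ts]
        self.persisted = self.checkpoint
        for _, value in self.oplog:
            self.persisted = value


node = ToyNode("fully downgraded")
node.write_fcv(1, "upgrading")          # first setFCV write, not yet majority-confirmed
node.recover_to_stable_timestamp(0)     # rollback to before that write
mismatch = node.persisted != node.in_memory
```

After the rollback in this sketch, `persisted` is back to "fully downgraded" while `in_memory` still reads "upgrading", matching the mismatch seen in the test failures.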

I think it's okay if rollback moves FCV from fully upgraded/downgraded to upgrading/downgrading, because the user can simply rerun setFCV in the right direction and the logic is idempotent. This scenario is the same as the server failing at any point in setFCV and setFCV then being retried, which we know works.

However, rolling back from upgrading/downgrading to fully downgraded/upgraded requires running the setFCV logic to make sure the rest of the server settings match the new FCV. I also believe we must finish an in-progress upgrade/downgrade before we can move to downgrading/upgrading. Config servers will be their own special problem because their setFCV logic involves setting the shard servers first or last (I forget which).

is duplicated by

SERVER-44607 Rollback of an interrupted setFCV cmd can result in the in-memory serverGlobalParams.featureCompatibility diverging from what's written on disk

Closed

related to

SERVER-48541 Fix log output on rollback of fcv document

Closed

Assignee: Jason Chan
Reporter: Dianna Hohensee (Inactive)
Participants: Dianna Hohensee, Githook User, Jason Chan
Votes: 0
Watchers: 11

Created: Mar 10 2020 04:24:14 PM UTC
Updated: Oct 29 2023 10:11:01 PM UTC
Resolved: Apr 22 2020 10:00:37 PM UTC
Confidence Status Last Update: 15/Apr/20 3:27 PM
