[SERVER-46758] setFCV can be interrupted before an FCV change is majority committed and rollback the FCV without running the setFCV server logic Created: 10/Mar/20  Updated: 29/Oct/23  Resolved: 22/Apr/20

Status: Closed
Project: Core Server
Component/s: Upgrade/Downgrade
Affects Version/s: None
Fix Version/s: 4.0.20, 4.2.8, 4.4.0-rc4, 4.7.0

Type: Bug Priority: Major - P3
Reporter: Dianna Hohensee (Inactive) Assignee: Jason Chan
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-44607 Rollback of an interrupted setFCV cmd... Closed
Related
related to SERVER-48541 Fix log output on rollback of fcv doc... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.4, v4.2, v4.0
Sprint: Repl 2020-04-06, Repl 2020-04-20, Repl 2020-05-04
Participants:
Linked BF Score: 30

 Description   

I believe this bug goes all the way back to the beginning of the setFCV framework, so it will need to be backported.

A setFCV cmd will change the FCV value twice: first to put FCV into upgrading / downgrading; then to put FCV into fully upgraded / fully downgraded. For each of these FCV writes, we wait for majority confirmation before proceeding.
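
The two-step sequence above can be sketched as a toy model. This is a hypothetical Python illustration, not the server's implementation: `Node`, `write_fcv`, `set_fcv_upgrade`, and the string values used for FCV states are invented for the example, and `wait_for_majority` stands in for waiting on majority write concern (it may raise if the wait is interrupted).

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """Toy node holding the persisted FCV value (hypothetical model)."""
    persisted_fcv: str = "4.2"
    history: List[str] = field(default_factory=list)

    def write_fcv(self, value: str, wait_for_majority: Callable[[], None]) -> None:
        # The local write happens first...
        self.persisted_fcv = value
        self.history.append(value)
        # ...then setFCV blocks until a majority acknowledges it.
        # If this raises (e.g. on a replication state change), the write
        # above is still local-only and may later be rolled back.
        wait_for_majority()


def set_fcv_upgrade(node: Node, target: str,
                    wait_for_majority: Callable[[], None]) -> None:
    # Step 1: move into the transitional "upgrading" state.
    node.write_fcv(f"upgrading to {target}", wait_for_majority)
    # (Server-side upgrade work would run here in the real command.)
    # Step 2: move into the fully upgraded state.
    node.write_fcv(target, wait_for_majority)
```

With a wait that never fails, `history` ends as `['upgrading to 4.4', '4.4']`. If the second wait raises, the node has already written `'4.4'` locally even though that write was never majority-committed, which is the window this ticket describes.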

However, setFCV can be interrupted while waiting for majority write concern (with InterruptedDueToReplStateChange, for example), and the unconfirmed FCV write can then be rolled back, moving the FCV value back one step. This manifested in test failures where the in-memory FCV value did not match the persisted FCV value: the persisted value had been rolled back, but rollback left the in-memory value unchanged. Recover to a stable timestamp wipes out writes back to the checkpoint and then replays writes forward from the oplog up to the desired point, so the FCV value change never even goes through the OpObserver.
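
The mismatch can be reproduced in a toy model where the in-memory FCV is refreshed only by an observer hook on ordinary writes, while recovery to a stable timestamp rewrites the persisted value directly. Everything here (`ToyNode`, `apply_write`, `recover_to_stable_timestamp`, the single-value checkpoint) is a hypothetical simplification for illustration, not MongoDB's storage or rollback code.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ToyNode:
    """Hypothetical node: persisted FCV, cached in-memory FCV, and an oplog."""
    on_disk: str
    in_memory: str
    checkpoint: Tuple[int, str] = (0, "")
    oplog: List[Tuple[int, str]] = field(default_factory=list)

    def apply_write(self, ts: int, fcv: str) -> None:
        # Ordinary write path: persist, log, and let the observer refresh
        # the in-memory copy.
        self.on_disk = fcv
        self.oplog.append((ts, fcv))
        self.in_memory = fcv  # stand-in for the OpObserver hook

    def recover_to_stable_timestamp(self, common_point: int) -> None:
        ckpt_ts, ckpt_value = self.checkpoint
        # Wipe writes back to the checkpoint...
        self.on_disk = ckpt_value
        # ...then replay the oplog forward up to the common point. Replay
        # writes storage directly, so the observer hook never runs.
        for ts, fcv in self.oplog:
            if ckpt_ts < ts <= common_point:
                self.on_disk = fcv
        # Entries after the common point are discarded by the rollback.
        self.oplog = [(ts, fcv) for ts, fcv in self.oplog if ts <= common_point]
        # Note: self.in_memory is deliberately left untouched.
```

Starting from `ToyNode(on_disk='4.2', in_memory='4.2', checkpoint=(0, '4.2'))`, applying `'upgrading to 4.4'` at timestamp 1 and `'4.4'` at timestamp 2, then recovering to common point 1, leaves `on_disk == 'upgrading to 4.4'` while `in_memory` is still `'4.4'`.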

I think it’s okay if rollback moves FCV from fully upgraded/downgraded to upgrading/downgrading, because the user can simply rerun setFCV in the right direction and the logic is idempotent. This is the same scenario as the server failing at any point during setFCV and setFCV being retried, which we know works.

However, rolling back from upgrading/downgrading to fully downgraded/upgraded requires running the setFCV logic to make sure the rest of the server settings match the new FCV. I also believe we must finish an in-progress upgrade/downgrade before we can start a downgrade/upgrade. Config servers will be their own special problem because their setFCV logic involves setting the shard servers first or last (I forget).
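
The distinction between benign and harmful rollbacks can be written as a small lookup. This is a hypothetical sketch: the four state names, `PREVIOUS_STEP`, and `rollback_requires_setfcv_logic` are invented for illustration, and the table assumes rollback undoes only the most recent, non-majority-committed FCV write, moving the value back exactly one step.

```python
from enum import Enum


class FCVState(Enum):
    FULLY_DOWNGRADED = "fully downgraded"
    UPGRADING = "upgrading"
    FULLY_UPGRADED = "fully upgraded"
    DOWNGRADING = "downgrading"


TRANSITIONAL = {FCVState.UPGRADING, FCVState.DOWNGRADING}

# Assumed: rolling back the latest FCV write reverts exactly one step.
PREVIOUS_STEP = {
    FCVState.UPGRADING: FCVState.FULLY_DOWNGRADED,
    FCVState.FULLY_UPGRADED: FCVState.UPGRADING,
    FCVState.DOWNGRADING: FCVState.FULLY_UPGRADED,
    FCVState.FULLY_DOWNGRADED: FCVState.DOWNGRADING,
}


def rollback_requires_setfcv_logic(rolled_back: FCVState) -> bool:
    """True if undoing the write that produced `rolled_back` lands on a
    fully upgraded/downgraded state whose server settings were never
    (re)applied, so retrying setFCV alone is not enough."""
    landed_on = PREVIOUS_STEP[rolled_back]
    # Landing on a transitional state is benign: rerunning setFCV is
    # idempotent, just like retrying after a crash mid-setFCV.
    return landed_on not in TRANSITIONAL
```

For example, `rollback_requires_setfcv_logic(FCVState.UPGRADING)` is `True` (the node lands back on fully downgraded), while `FCVState.FULLY_UPGRADED` gives `False` (the node lands on upgrading and the user can rerun setFCV).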



 Comments   
Comment by Githook User [ 02/Jun/20 ]

Author:

{'name': 'Jason Chan', 'email': 'jason.chan@10gen.com', 'username': 'jasonjhchan'}

Message: SERVER-46758 In-memory FCV value should properly reflect the on-disk FCV after a rollback

(cherry picked from commit aa527109a28bec0b6fe2763fce8a447ead0c02dd)
Branch: v4.0
https://github.com/mongodb/mongo/commit/2148b273b0bd531d6a89410298349556ccedfdd9

Comment by Githook User [ 01/Jun/20 ]

Author:

{'name': 'Jason Chan', 'email': 'jason.chan@10gen.com', 'username': 'jasonjhchan'}

Message: SERVER-46758 In-memory FCV value should properly reflect the on-disk FCV after a rollback

(cherry picked from commit aa527109a28bec0b6fe2763fce8a447ead0c02dd)
Branch: v4.2
https://github.com/mongodb/mongo/commit/5f8ebd1f27e3dbfa27e75bac39ac6730a0b6719b

Comment by Githook User [ 04/May/20 ]

Author:

{'name': 'Jason Chan', 'email': 'jason.chan@10gen.com', 'username': 'jasonjhchan'}

Message: SERVER-46758 In-memory FCV value should properly reflect the on-disk FCV after a rollback

(cherry picked from commit aa527109a28bec0b6fe2763fce8a447ead0c02dd)
Branch: v4.4
https://github.com/mongodb/mongo/commit/5d8474a6a9f376553d462ba8b9ba9df98024fe24

Comment by Githook User [ 22/Apr/20 ]

Author:

{'name': 'Jason Chan', 'email': 'jason.chan@10gen.com', 'username': 'jasonjhchan'}

Message: SERVER-46758 In-memory FCV value should properly reflect the on-disk FCV after a rollback
Branch: master
https://github.com/mongodb/mongo/commit/aa527109a28bec0b6fe2763fce8a447ead0c02dd

Comment by Jason Chan [ 17/Apr/20 ]

I don't believe this bug exists in 3.6, because this behaviour appears to be specific to rollback via recover to timestamp (RTT). In rollback via refetch, we refetch the FCV document and apply it as an update, which triggers onInsertOrUpdate and updates the in-memory FCV on storage commit.

I think this needs to be backported to v4.0, since that is where we start using RTT.
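
The difference between the two rollback algorithms, and the general shape of the fix named in the commit message (keeping the in-memory FCV in line with the on-disk FCV after rollback), can be sketched as follows. The class and method names are hypothetical, and `reload_after` only illustrates re-reading the persisted document once recovery finishes; it is not a description of the actual patch.

```python
from dataclasses import dataclass


@dataclass
class FcvCacheNode:
    """Hypothetical node with a persisted FCV and a cached in-memory copy."""
    on_disk: str
    in_memory: str

    def _on_insert_or_update(self, fcv: str) -> None:
        # Stand-in for the observer hook that refreshes the cached FCV.
        self.in_memory = fcv

    def rollback_via_refetch(self, sync_source_fcv: str) -> None:
        # The refetched FCV document is applied as an ordinary update,
        # so the observer hook fires and the cache stays in sync.
        self.on_disk = sync_source_fcv
        self._on_insert_or_update(sync_source_fcv)

    def rollback_via_rtt(self, recovered_fcv: str, reload_after: bool) -> None:
        # Recover-to-timestamp rewrites storage directly; no hook fires.
        self.on_disk = recovered_fcv
        if reload_after:
            # Hypothetical fix shape: re-read the persisted value after
            # recovery so the in-memory FCV matches what is on disk.
            self.in_memory = self.on_disk
```

Without the reload, `rollback_via_rtt` leaves the cache at its pre-rollback value, which matches the symptom in the description; refetch-based rollback never needs it because the hook already runs.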

Generated at Thu Feb 08 05:12:21 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.