[SERVER-58294] setFeatureCompatibilityVersion can cause crash after stepping down Created: 06/Jul/21  Updated: 29/Oct/23  Resolved: 19/Aug/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 5.1.0-rc0

Type: Bug Priority: Major - P3
Reporter: Jordi Serra Torrens Assignee: Vesselina Ratcheva (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File 0001-repro-bf-21787.patch    
Issue Links:
Depends
Backwards Compatibility: Fully Compatible
Operating System: ALL
Steps To Reproduce:

Repro test:
0001-repro-bf-21787.patch

./buildscripts/resmoke.py run --storageEngine=wiredTiger --storageEngineCacheSizeGB=.50 --suite=sharding --log=file jstests/sharding/bf-21787

Sprint: Repl 2021-08-09, Repl 2021-08-23
Participants:
Linked BF Score: 110

 Description   

setFeatureCompatibilityVersion reads the in-memory FCV value and then calls FeatureCompatibilityVersion::updateFeatureCompatibilityVersionDocument, which uses that value to look up the transitional FCV state and fasserts that the lookup succeeds.

The problem is that the node could step down at any point in this sequence without setFCV noticing yet, because:
a. When setFCV acquires the FCV lock in exclusive mode, it uses the non-interruptible variant of the ExclusiveLock constructor.
b. The node could also step down after acquiring the lock but before reading the in-memory FCV value.

If the new primary then runs setFCV, and the former primary (still running its own setFCV) replicates that FCV change, the former primary could look up an invalid FCV transition and fassert.

To address this, FeatureCompatibilityVersion::updateFeatureCompatibilityVersionDocument should check that the opCtx is not interrupted.
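The sketch below illustrates the idea of checking for interruption before trusting an FCV value read earlier. It is a simplified model, not the server's code: `OpCtx`, `Result`, `lookupTransition`, `updateFcvDocument`, and the string FCV values are all hypothetical stand-ins. A step-down is modeled as setting `OpCtx::killed`, and the invalid lookup that fasserts in the server is reported here as `Result::kInvalidTransition` so it can be observed.

```cpp
#include <atomic>
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for an operation context; a step-down sets `killed`.
struct OpCtx {
    std::atomic<bool> killed{false};
    bool isInterrupted() const { return killed.load(); }
};

enum class Result { kOk, kInterrupted, kInvalidTransition };

// Hypothetical FCV representation (plain strings, not the server's enum).
using Fcv = std::string;

// Hypothetical transition table: (current FCV, requested FCV) -> transitional state.
std::optional<Fcv> lookupTransition(const Fcv& from, const Fcv& to) {
    static const std::map<std::pair<Fcv, Fcv>, Fcv> table = {
        {{"4.4", "5.0"}, "upgrading 4.4->5.0"},
        {{"5.0", "4.4"}, "downgrading 5.0->4.4"},
    };
    auto it = table.find({from, to});
    if (it == table.end()) return std::nullopt;
    return it->second;
}

// Checks for interruption before using the FCV value read earlier.
// On success, writes the transitional state to *out.
Result updateFcvDocument(OpCtx& opCtx, const Fcv& fcvReadEarlier,
                         const Fcv& requested, Fcv* out) {
    assert(out != nullptr && "caller must supply an output slot");
    // If the node stepped down, the earlier read may be stale: bail out
    // instead of looking up a transition that may no longer be valid.
    if (opCtx.isInterrupted()) return Result::kInterrupted;
    auto transitional = lookupTransition(fcvReadEarlier, requested);
    // The server fasserts here; the sketch reports it instead.
    if (!transitional) return Result::kInvalidTransition;
    *out = *transitional;
    return Result::kOk;
}
```

In this model, a single check only narrows the window: a step-down that lands after the check is not caught. The sketch is meant to show where the check belongs relative to the lookup, not how completely the committed fix closes the race.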



 Comments   
Comment by Vivian Ge (Inactive) [ 06/Oct/21 ]

Updating the fixversion since branching activities occurred yesterday. This ticket will be in rc0 when it’s been triggered. For more active release information, please keep an eye on #server-release. Thank you!

Comment by Githook User [ 19/Aug/21 ]

Author:

Vesselina Ratcheva <vesselina.ratcheva@10gen.com> (username: vessy-mongodb)

Message: SERVER-58294 Improve interrupt detection when doing update for setFCV command
Branch: master
https://github.com/mongodb/mongo/commit/83485c98f17890f29e008d81b5d9cd34c9893182

Comment by Connie Chen [ 30/Jul/21 ]

vesselina.ratcheva, I think this ticket's fixversion should be 5.1 required since it's a fix for a Hot BF. 

CC: elizabeth.roytburd
