[SERVER-79317] Provide more documentation and helper functions for case where feature flag checks could be run when FCV is uninitialized during initial sync Created: 25/Jul/23  Updated: 30/Oct/23  Resolved: 30/Oct/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.2.0-rc0, 7.0.4

Type: Task Priority: Major - P3
Reporter: Huayu Ouyang Assignee: Huayu Ouyang
Resolution: Fixed Votes: 0
Labels: repl-shortlist
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
is depended on by SERVER-82064 Create passthrough suite that pauses ... Closed
is depended on by SERVER-82246 Change isEnabled to invariant when FC... Closed
is depended on by SERVER-82270 Replace isEnabledAndIgnoreFCVUnsafeAt... Closed
Related
related to SERVER-79269 Invariant that we don't check FCV in ... Open
related to SERVER-79331 [Sharding 1] featureFlag audit v7.0 b... Open
related to SERVER-79330 Audit featureFlag usage in v7.0 branc... Backlog
related to SERVER-79337 [Serverless] Audit v7.0 featureFlag Backlog
related to SERVER-81225 Audit callsites of FCV getVersion and... Backlog
related to SERVER-79332 [Repl] Audit featureFlag usage in v7.0 Closed
related to SERVER-79333 [Storage Execution NAMER] Audit v7.0 ... Closed
related to SERVER-79335 Audit gFeatureFlagColumnstoreIndexes Closed
related to SERVER-79336 [Security] Audit v7.0 feature flag Closed
related to SERVER-80898 Audit gFeatureFlagChangeStreamsFurthe... Closed
related to SERVER-80899 Audit gFeatureFlagCommonQueryFramework Closed
related to SERVER-80900 Audit gFeatureFlagCompoundWildcardInd... Closed
related to SERVER-80901 Audit gFeatureFlagServerlessChangeStr... Closed
related to SERVER-80903 Audit gFeatureFlagBucketUnpackWithSort Closed
related to SERVER-80904 Audit gFeatureFlagBucketUnpackWithSort Closed
related to SERVER-80905 Audit gFeatureFlagShardedTimeSeries Closed
related to SERVER-80906 Audit gFeatureFlagTelemetry Closed
is related to SERVER-79274 FCV checks can be racy if FCV is unin... Closed
is related to SERVER-82087 FCVServerStatusMetrics::generateSecti... Closed
is related to SERVER-80902 Audit gFeatureFlagTelemetry Closed
Assigned Teams:
Replication
Backwards Compatibility: Fully Compatible
Backport Requested:
v7.0
Sprint: Repl 2023-10-16, Repl 2023-10-30, Repl 2023-11-13
Participants:

 Description   

As part of PM-3236 it would be helpful to do an audit of the FCV.getVersion and the FeatureFlag::isEnabled, isEnabledAndIgnoreFCVUnsafe, isEnabledAndIgnoreFCVUnsafeAtStartup, isEnabledOnVersion and any other relevant FCV/feature flag helper functions just to make sure they're being used properly, and also to get a better sense of how engineers are using them and what improvements to the FCV/feature flag infrastructure should be made.
^ Moved to separate ticket SERVER-81225 for now since these two tasks are separate/have different urgencies

Specifically we should also look into providing more guidance and helper functions around the situation described in https://jira.mongodb.org/browse/SERVER-79330 (for example, adding a helper function for if the use case needs to wait until FCV is initialized before checking the feature flag, or otherwise providing more documentation)



 Comments   
Comment by Githook User [ 27/Oct/23 ]

Author:

{'name': 'Huayu Ouyang', 'email': 'huayu.ouyang@mongodb.com', 'username': 'huayu-ouyang'}

Message: SERVER-79317 Provide more documentation and helper functions for case where feature flag checks could be run when FCV is uninitialized during initial sync

(cherry picked from commit 167a5ba9e3b500dd974b8a797cc44838ae2bcf9c)
Branch: v7.0
https://github.com/mongodb/mongo/commit/98af21cfd6d712b09304e92c42a21f5f9f8561a9

Comment by Githook User [ 17/Oct/23 ]

Author:

{'name': 'Huayu Ouyang', 'email': 'huayu.ouyang@mongodb.com', 'username': 'huayu-ouyang'}

Message: SERVER-79317 Provide more documentation and helper functions for case where feature flag checks could be run when FCV is uninitialized during initial sync
Branch: master
https://github.com/mongodb/mongo/commit/167a5ba9e3b500dd974b8a797cc44838ae2bcf9c

Comment by Huayu Ouyang [ 05/Oct/23 ]

charlie.swanson@mongodb.com That commit is on 7.1 and the audit should be for 7.0. We are also considering reverting the change since we might not want to make all feature flags use the lastLTS FCV behaviro when the FCV is uninitialized.

Comment by Charlie Swanson [ 05/Oct/23 ]

It looks like in commit 27a8e30e8aa02d3f3be35630b970c7b903d1d186 we made a change to have the default behavior of 'isEnabled' match what is documented in 'isEnabledUseDefaultFCVWhenUninitialized()'? The header/definition for that method still exists, but the implementation was deleted in that commit. I think I can just close out my linked ticket SERVER-80902 since I was going to switch to that based on the comments. Just wanted to provide the feedback here that the text of the ticket and the state of the codebase are misleading. 

Comment by Huayu Ouyang [ 19/Sep/23 ]

charlie.swanson@mongodb.com To clarify, the issue in SERVER-79330 isn't a bug to be fixed in initial sync or FCV code, but just that each feature flag needs to figure out if it's possible for that any feature flag checks to be run during initial sync when the FCV could potentially be uninitialized, and then if it is, figure out what the correct behavior for that feature flag is: either behave as if the feature flag is enabled, regardless of what the real FCV is (isEnabledAndIgnoreFCVUnsafe), behave according to if the feature flag is enabled on the defaultLastLTS (isEnabledUseDefaultFCVWhenUninitialized), or it might need to do some other special logic. For example, we might want to wait until the FCV is initialized, and then check if the feature flag is enabled afterwards, or in SERVER-79183, we uasserted instead of invariant-ing.

We can definitely look into scheduling this ticket to add more guidance around this situation and add helper functions (such as for the case where we want to wait until the FCV is initialized, and then check if the feature flag is enabled). However, we don't think it would be feasible to create a test template or repro script to see if the initial sync race is possible since it's specific to each feature, and it would still be up to each individual feature afterwards to determine whether it could be run during initial sync and what the correct behavior for that feature would be.

Comment by Charlie Swanson [ 19/Sep/23 ]

huayu.ouyang@mongodb.com so I'm gathering that this edge case around initial sync in the mentioned linked ticket is not due to some new feature we developed.

Is there some sort of recommended test template you could provide? Can we maybe schedule this ticket more urgently? It looks like this epic is not in-progress and instead we've asked a bunch of different engineers across the organization to go gather the necessary context to fix this issue without first publishing some recommendations or any sort of reproduction script..

Comment by Huayu Ouyang [ 19/Sep/23 ]

brenda.rodriguez@mongodb.com Sorry I'm a little unclear about the question, this ticket isn't really describing a specific issue but more that we should audit our use cases of these functions to make sure they're being used properly and make usability improvements.

Comment by Charlie Swanson [ 19/Sep/23 ]

huayu.ouyang@mongodb.com can you weigh in?
>is this an issue specifically for 7.0 or does this need to be tested on all future versions for all feature flags?

Comment by Brenda Rodriguez [ 14/Sep/23 ]

randolph@mongodb.com is this an issue specifically for 7.0 or does this need to be tested on all future versions for all feature flags?

Generated at Thu Feb 08 06:40:40 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.