[SERVER-75577] Allow specifying FCV on new mongod Created: 03/Apr/23  Updated: 29/Oct/23  Resolved: 12/May/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 7.1.0-rc0, 7.0.0-rc1

Type: Improvement Priority: Blocker - P1
Reporter: Siyuan Zhou Assignee: Huayu Ouyang
Resolution: Fixed Votes: 0
Labels: former-quick-wins
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Related
Assigned Teams:
Replication
Backwards Compatibility: Fully Compatible
Backport Requested:
v7.0
Sprint: Execution Team 2023-05-15, Execution Team 2023-05-29
Participants:

 Description   

During version upgrades, Atlas upgrades the binary version first while keeping the old FCV for a few days, then upgrades the FCV if everything works well. During this period of mixed binary version and FCV, new Atlas clusters (replica sets and sharded clusters) should run with the old FCV too.

Because a new mongod always starts with an FCV equal to its binary version, the Automation Agent currently implements a workaround: it starts a new binary (e.g. 6.2) and then downgrades it to the old FCV (e.g. 6.1). This downgrade isn't safe because an FCV downgrade may involve manual data manipulation, such as resetting a capped collection's size due to SERVER-67036. It caused the issues in the linked tickets.

This happens both when creating new clusters for customers and during backup restores, which need to create an intermediate restore VM to prepare the restore collection data.

It's possible to work around this by starting new clusters on the old binary version and then upgrading the binary to the latest version, so that the old FCV remains. However, this is not only inefficient but also complex, since it's not trivial for the Agent to know whether a cluster is new. The MMS control plane knows whether a cluster is new, but it is supposed to tell the Agent only the end goal and let it plan accordingly.

FCV is state managed by mongod, so it is not ideal for its consumers to work around the mechanism in order to set it to a lower version. Instead, we should allow specifying the default FCV as a startup parameter. If no existing FCV is detected, the specified default FCV should be used instead of the binary's own version. The Agent then just needs to provide the desired default FCV to new clusters.

This should work for standalones, replica sets, and sharded clusters.

We will need to make changes to https://github.com/10gen/mongo/blob/c6e5701933a98b4fe91c2409c212fcce2d3d34f0/src/mongo/db/commands/feature_compatibility_version.cpp#L432-L468 and take extra care that it works for sharding as well.

 



 Comments   
Comment by Githook User [ 12/May/23 ]

Author: Huayu Ouyang (huayu-ouyang) <huayu.ouyang@mongodb.com>

Message: SERVER-75577 Allow specifying FCV on new mongod

(cherry picked from commit 44d9b193ab7a2e86b7f63a276f0215d24c3c35d6)
Branch: v7.0
https://github.com/mongodb/mongo/commit/aaab48df085284185de7788de5989f5e257ff777

Comment by Githook User [ 12/May/23 ]

Author: Huayu Ouyang (huayu-ouyang) <huayu.ouyang@mongodb.com>

Message: SERVER-75577 Allow specifying FCV on new mongod
Branch: master
https://github.com/mongodb/mongo/commit/44d9b193ab7a2e86b7f63a276f0215d24c3c35d6

Comment by Eric Milkie [ 11/May/23 ]

Design facets for this:
The server (mongod and mongos) will support specifying a "default" FCV in the YAML config as well as on the command line.
When a default FCV is provided and the server notices it is starting up with empty data files (admin.system.version is missing or empty), it will use that FCV if the binary supports it; otherwise it will ignore the default FCV and log an informational message. If admin.system.version already exists on startup, the server will ignore the default FCV and log an informational message. If the default FCV provided is not valid (e.g. a version the server does not know), the server will only log an informational message; it won't abort startup or return any errors.
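
The startup rules above can be sketched as a small decision function. This is only an illustration in Python, not the server's C++ implementation: the names choose_startup_fcv and StartupFcvDecision, the argument names, and the log wording are all hypothetical. It also assumes that "ignoring" the default means falling back to the pre-existing behavior (FCV equal to the binary's own version), and it treats an unknown version the same way as an unsupported one.

```python
from typing import NamedTuple, Optional, Sequence


class StartupFcvDecision(NamedTuple):
    fcv: str      # FCV the node ends up running with
    message: str  # informational log line, "" when there is nothing to report


def choose_startup_fcv(
    binary_fcv: str,
    supported_fcvs: Sequence[str],
    existing_fcv: Optional[str],
    default_fcv: Optional[str],
) -> StartupFcvDecision:
    """Hypothetical model of the startup FCV choice described in the ticket.

    existing_fcv is None when admin.system.version is missing or empty,
    i.e. the node is starting with empty data files.
    """
    # Data files already carry an FCV: the default is never applied.
    if existing_fcv is not None:
        message = ""
        if default_fcv is not None:
            message = (
                f"Ignoring default FCV {default_fcv}: "
                "admin.system.version already exists"
            )
        return StartupFcvDecision(existing_fcv, message)

    # Empty data files and no default: assumed to keep the old behavior.
    if default_fcv is None:
        return StartupFcvDecision(binary_fcv, "")

    # Unknown or unsupported default: log only, never abort startup.
    if default_fcv not in supported_fcvs:
        return StartupFcvDecision(
            binary_fcv,
            f"Ignoring default FCV {default_fcv}: not supported by this binary",
        )

    # Empty data files and a supported default: use the default.
    return StartupFcvDecision(default_fcv, "")
```

With the version numbers from the description, a 6.2 binary that supports FCV 6.1 and starts on empty data files with a default of 6.1 would come up on FCV 6.1, while the same node restarted on existing data would keep whatever FCV is already recorded.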

Comment by Siyuan Zhou [ 03/Apr/23 ]

This seems like a small change to me. Without it, every rapid release will introduce extra risk, and fixing that would be hard and time-consuming for the Atlas team. I know the 7.0 code freeze is coming. Is it possible to add this in 7.1?

Generated at Thu Feb 08 06:30:32 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.