[SERVER-36119] addShard should fail if the added shard's FCV is higher than that of the cluster Created: 13/Jul/18  Updated: 26/Oct/23

Status: Backlog
Project: Core Server
Component/s: Sharding, Upgrade/Downgrade
Affects Version/s: None
Fix Version/s: None

Type: Bug
Priority: Major - P3
Reporter: Tess Avitabile (Inactive)
Assignee: Backlog - Catalog and Routing
Resolution: Unresolved
Votes: 0
Labels: ShardingRoughEdges, oldshardingemea
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
Assigned Teams:
Catalog and Routing
Operating System: ALL
Participants:
Linked BF Score: 26

 Description   

In the addShard command, we run setFeatureCompatibilityVersion on the replica set to ensure it has the same featureCompatibilityVersion as the config server. Once this succeeds, we add the shard to config.shards. However, setFeatureCompatibilityVersion only requires that the update to admin.system.version reach a majority of nodes in order to return success. If there are any lower-version mongoses in the cluster, then when they observe the existence of a new shard, they will connect to it and crash if they encounter a node with a higher-version feature compatibility version. We should make the setFeatureCompatibilityVersion command use a w:all writeConcern, so that it waits for the update to reach all members of the new shard (in addition to the w:majority wait that ensures the update is committed).
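The gap between a majority acknowledgement and full propagation can be illustrated with a toy model. This is only a sketch: `fcvPropagation` and its return fields are hypothetical names introduced here, the member list is made-up illustration data, and nothing in it talks to a real server.

```javascript
// Toy model (hypothetical helper, not server code): each array entry is the FCV
// currently visible on one member of the shard being added.
function fcvPropagation(memberFcvs, targetFcv) {
    const reached = memberFcvs.filter((fcv) => fcv === targetFcv).length;
    const majority = Math.floor(memberFcvs.length / 2) + 1;
    return {
        // What a w:majority wait would be satisfied by.
        majorityReached: reached >= majority,
        // What the proposed "w:all" wait would additionally require.
        allReached: reached === memberFcvs.length,
        // Members a lower-version mongos could still connect to and crash on.
        laggingMembers: memberFcvs
            .map((fcv, index) => ({index, fcv}))
            .filter((member) => member.fcv !== targetFcv),
    };
}

// Assumed example: a three-member set where one secondary has not yet applied
// the FCV update to "3.6".
const status = fcvPropagation(["3.6", "3.6", "4.0"], "3.6");
console.log(status.majorityReached, status.allReached, status.laggingMembers);
```

In this example the majority condition already holds while one member is still at the higher FCV, which is the window in which addShard can succeed and a lower-version mongos can still crash.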



 Comments   
Comment by Githook User [ 02/Aug/18 ]

Author: Kaloian Manassiev (kaloianm) <kaloian.manassiev@mongodb.com>

Message: SERVER-36119 Explicitly downgrade new shard's FCV in the mixed version convert_to_and_from_sharded.js
Branch: master
https://github.com/mongodb/mongo/commit/aee1a8d71d0b7c6c806e9790cb1310c94f36090a

Comment by Tess Avitabile (Inactive) [ 01/Aug/18 ]

Yes

Comment by Kaloian Manassiev [ 01/Aug/18 ]

I agree that we can at least fix the test for now by explicitly setting the FCV on the replica set being added to be the 'last-stable' FCV.

tess.avitabile, this is what you had in mind, right?

Comment by Tess Avitabile (Inactive) [ 01/Aug/18 ]

We could fix the test by setting FCV on the replica set to the downgrade version before adding it to the cluster. If we implement the solution schwerin suggests, we would need to make that change to the test anyway.

Comment by Ian Whalen (Inactive) [ 01/Aug/18 ]

greg.mckeon kaloian.manassiev: can you please consider pulling this forward? convert_to_and_from_sharded.js is just a total mess right now:

https://evergreen.mongodb.com/task_history/mongodb-mongo-master/sharding_last_stable_mongos_and_mixed_shards?revision=e8379141cd2fd3f841c87a2817cc04c4830ed72e#/convert_to_and_from_sharded.js=fail

Comment by Andy Schwerin [ 24/Jul/18 ]

Ah. OK. To summarize an offline conversation, if a replica set started with --shardsvr w/ 4.0 binaries wasn't previously used as a standalone replica set, it will report fcv 3.6. As such, this problem only occurs when trying to add a shard that contains some user data already. I think in that case, we should not downgrade the fcv on the target shard automatically, but instead refuse to add the shard if its fcv is higher than the cluster's fcv. If a user wants to add an fcv 4.0 shard to an fcv 3.6 cluster, they should first need to lower the fcv on that shard to 3.6 and remove any 4.0-specific data and indexes.
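The "refuse instead of downgrade" rule proposed above could be sketched roughly as follows. This is a minimal illustration, not the server's implementation: `parseFcv`, `compareFcv`, and `checkAddShard` are hypothetical names, and the sketch assumes FCV strings always have the form "major.minor".

```javascript
// Hypothetical helper: parses an FCV string such as "3.6" into [major, minor].
function parseFcv(fcv) {
    const match = /^(\d+)\.(\d+)$/.exec(fcv);
    if (!match) {
        throw new Error("unrecognized FCV: " + fcv);
    }
    return [Number(match[1]), Number(match[2])];
}

// Returns a negative number, zero, or a positive number when a is lower than,
// equal to, or higher than b. Components are compared numerically, not as strings.
function compareFcv(a, b) {
    const [aMajor, aMinor] = parseFcv(a);
    const [bMajor, bMinor] = parseFcv(b);
    return aMajor !== bMajor ? aMajor - bMajor : aMinor - bMinor;
}

// Hypothetical pre-check: reject the shard when its FCV is higher than the cluster's,
// rather than downgrading it automatically.
function checkAddShard(shardFcv, clusterFcv) {
    if (compareFcv(shardFcv, clusterFcv) > 0) {
        return {
            ok: false,
            reason: `shard FCV ${shardFcv} is higher than cluster FCV ${clusterFcv}; ` +
                "lower the shard's FCV first",
        };
    }
    return {ok: true};
}

console.log(checkAddShard("4.0", "3.6"));
console.log(checkAddShard("3.6", "3.6"));
```

With such a check, the decision to lower the FCV (and to remove any 4.0-specific data and indexes) stays with the user, and addShard never has to wait on a write reaching every member.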

Comment by Tess Avitabile (Inactive) [ 23/Jul/18 ]

Yes, we have always let you do this.

Comment by Andy Schwerin [ 23/Jul/18 ]

Oh, I'm surprised we let you add shards in fCV 4.0 to a fCV 3.6 cluster. I'm still hesitant to require a successful "writeConcern: all" write to addShard, though if we have to allow fCV 4.0 shards to be added to fCV 3.6 replica sets, we may have no choice.

Comment by Tess Avitabile (Inactive) [ 23/Jul/18 ]

This is to address the case where the cluster has lower-version FCV and a lower binary version mongos. To be concrete, let's say the mongods all have binary version 4.0 and FCV 3.6, and the mongoses have binary version 3.6. If we add a shard that has FCV 4.0, the config server sends {setFeatureCompatibilityVersion: "3.6"} as part of the addShard command. This will succeed as soon as it reaches a majority of the set. But if there is still a node in the set with FCV 4.0, it will cause the 3.6 mongoses in the cluster to crash. This seems like poor behavior: the addShard succeeds, but the mongoses in the cluster can crash. I think it would be better to wait for the FCV to reach all nodes in the set before successfully adding the shard.

Comment by Kaloian Manassiev [ 16/Jul/18 ]

If the config server has the newer FCV this means that all the existing shards should be at the newer FCV already, doesn't it? In which case it would have been just a matter of time before the old mongos instances crash anyways.

Comment by Andy Schwerin [ 14/Jul/18 ]

But if one of those nodes is a low version, it's still going to crash? Why is addShard special in this regard?

Generated at Thu Feb 08 04:42:05 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.