[SERVER-30459] shardCollection should fail if running in a mixed-FCV cluster Created: 01/Aug/17  Updated: 23/Aug/17  Resolved: 23/Aug/17

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.5.10
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Esha Maharishi (Inactive) Assignee: Esha Maharishi (Inactive)
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-30358 shardCollection should ask primary sh... Closed
Related
is related to SERVER-29760 propagate UUID from primary shard to ... Closed
Sprint: Sharding 2017-08-21, Sharding 2017-09-11
Participants:

 Description   

A mixed-FCV cluster can occur if setFeatureCompatibilityVersion (setFCV) fails partway.

setFCV updates the FCVs in this order:
1) shards' FCVs
2) config server's FCV

So, if setFCV for 3.4 -> 3.6 fails partway, some shards may have FCV=3.6 while the config server has FCV=3.4.
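
The partial-failure case above can be illustrated with a small simulation. This is only a sketch: the `Node` class, the `set_fcv` function, and the `fail_after` parameter are hypothetical names invented for illustration and do not correspond to server code. It only models the update order described above (shards first, config server last).

```python
class Node:
    """Hypothetical stand-in for a shard or the config server."""

    def __init__(self, name, fcv="3.4"):
        self.name = name
        self.fcv = fcv


def set_fcv(shards, config_server, target, fail_after=None):
    """Update the shards' FCVs first, then the config server's FCV.

    fail_after=N simulates a failure after N shards have been updated
    (an assumption used only to model "setFCV fails partway").
    """
    for index, shard in enumerate(shards):
        if fail_after is not None and index >= fail_after:
            raise RuntimeError("setFCV failed partway")
        shard.fcv = target
    # Reached only if every shard was updated successfully.
    config_server.fcv = target


shards = [Node("shard0"), Node("shard1")]
config_server = Node("config")

try:
    set_fcv(shards, config_server, "3.6", fail_after=1)
except RuntimeError:
    pass  # The cluster is now left in a mixed-FCV state.

state = {node.name: node.fcv for node in shards + [config_server]}
```

After the simulated failure, `shard0` has been upgraded while `shard1` and the config server have not, which is the mixed-FCV state this ticket is concerned with.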

In this case, since the config server is in FCV=3.4, shardCollection will not ask the primary shard for a UUID. However, if the primary shard is in FCV=3.6, it will already have a UUID. If setFCV is later called again to resume the upgrade, the config server will generate a (different) UUID for the collection that was just sharded.
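
A toy model of the UUID divergence described above, under stated assumptions: the `PrimaryShard` and `ConfigServer` classes and their methods are hypothetical and only mirror the behaviour this description attributes to each FCV. They are not the server's actual code paths.

```python
import uuid


class PrimaryShard:
    """Hypothetical model of the primary shard's local collection catalog."""

    def __init__(self, fcv):
        self.fcv = fcv
        self.collection_uuids = {}

    def create_collection(self, ns):
        # Per the description: a shard in FCV=3.6 already has a UUID.
        self.collection_uuids[ns] = uuid.uuid4() if self.fcv == "3.6" else None


class ConfigServer:
    """Hypothetical model of the config server's view of sharded collections."""

    def __init__(self, fcv):
        self.fcv = fcv
        self.collection_uuids = {}

    def shard_collection(self, ns, primary):
        if self.fcv == "3.6":
            # In FCV=3.6 the config server asks the primary shard for the UUID.
            self.collection_uuids[ns] = primary.collection_uuids[ns]
        else:
            # In FCV=3.4 it does not ask, so no UUID is recorded.
            self.collection_uuids[ns] = None

    def resume_set_fcv(self, target):
        self.fcv = target
        # On upgrade, generate a UUID for every collection that lacks one.
        for ns in list(self.collection_uuids):
            if self.collection_uuids[ns] is None:
                self.collection_uuids[ns] = uuid.uuid4()


primary = PrimaryShard(fcv="3.6")       # upgraded by the failed setFCV
config = ConfigServer(fcv="3.4")        # not yet upgraded
primary.create_collection("db.coll")
config.shard_collection("db.coll", primary)
config.resume_set_fcv("3.6")

uuids_diverged = (
    config.collection_uuids["db.coll"] != primary.collection_uuids["db.coll"]
)
```

In this model `uuids_diverged` ends up true: the config server and the primary shard each hold a UUID for the same collection, and the two were generated independently.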

The config server has no way to know whether the primary shard was upgraded as part of the previous failed setFCV: the setFCV call from that attempt may still be in flight, and may race with, say, a currentOp that shardCollection sends to check whether setFCV is currently running on the shard. We should therefore prevent shardCollection from running in a mixed-FCV cluster.
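
The proposed guard could look roughly like the following. This is a minimal sketch, not a design: `MixedFCVError`, `is_mixed_fcv`, and `shard_collection` are hypothetical names, and the sketch simply assumes the FCV values of the config server and all shards are available as inputs. How to obtain them reliably is not addressed here, and is exactly where the in-flight setFCV race described above applies.

```python
class MixedFCVError(Exception):
    """Hypothetical error raised when the cluster is in a mixed-FCV state."""


def is_mixed_fcv(config_fcv, shard_fcvs):
    """Return True if any shard's FCV differs from the config server's FCV."""
    return any(fcv != config_fcv for fcv in shard_fcvs)


def shard_collection(ns, config_fcv, shard_fcvs):
    """Refuse to shard a collection while the cluster is mixed-FCV."""
    if is_mixed_fcv(config_fcv, shard_fcvs):
        raise MixedFCVError(
            "cannot shard %s: cluster is in a mixed-FCV state" % ns
        )
    # Placeholder for the real work; only the guard is of interest here.
    return {"ns": ns, "sharded": True}
```

Note that the guard rejects the command whenever the FCVs disagree, regardless of whether a setFCV is currently running.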

Note: this also needs to prevent a shardCollection issued by a 3.4 mongos from running in a mixed-version cluster, maybe by preventing writes to config.collections through an OpObserver?



 Comments   
Comment by Esha Maharishi (Inactive) [ 01/Aug/17 ]

Marked as a "minor" backwards-breaking change because it prevents shardCollection from running while a cluster is in a mixed-FCV state, even if setFCV is not currently running.
