[SERVER-26087] _configsvrSetFeatureCompatibilityVersion should only set its own state if setFeatureCompatibilityVersion succeeded on all shards Created: 13/Sep/16 Updated: 19/Nov/16 Resolved: 14/Sep/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Internal Code |
| Affects Version/s: | None |
| Fix Version/s: | 3.3.14 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | David Storch | Assignee: | David Storch |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible |
| Sprint: | Query 2016-09-19 |
| Participants: |
| Description |
|
The mongos implementation of setFeatureCompatibilityVersion works simply by calling the internal _configsvrSetFeatureCompatibilityVersion command on the primary of the config server replica set. In the implementation of this internal command, the config server first sets its own state and then forwards the sFCV() command to all shards, failing if any of the shards fail. This leads to the following problem scenario: a user runs sFCV("3.4"), the config server updates its own state, and the forwarded command then fails on one of the shards.
Now the config server primary will report "3.4" as its feature compatibility version, even though sFCV("3.4") did not succeed cluster-wide. In order to allow the config server primary to act as the cluster's source of truth for the current feature compatibility version, it should set its own state only after all shards have returned successfully from sFCV(). |
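A minimal sketch of the intended ordering, using hypothetical helper names (sendSetFCVToShard, persistLocalFCV) that stand in for the real internal calls; this is only an illustration of updating the config server's own state last, not the actual server implementation:

```cpp
#include <iostream>
#include <string>
#include <vector>

// Hypothetical stand-in for the server's status/error type.
struct Status {
    bool ok;
    std::string reason;
};

// Hypothetical stub: forwards setFeatureCompatibilityVersion to a single shard.
Status sendSetFCVToShard(const std::string& shardId, const std::string& version) {
    std::cout << "forwarding sFCV(\"" << version << "\") to " << shardId << "\n";
    return {true, ""};
}

// Hypothetical stub: persists the config server's own feature compatibility version.
void persistLocalFCV(const std::string& version) {
    std::cout << "config server now reports FCV " << version << "\n";
}

// Intended ordering: the config server updates its own state only after every
// shard has acknowledged the new feature compatibility version.
Status configsvrSetFCV(const std::vector<std::string>& shardIds, const std::string& version) {
    for (const auto& shardId : shardIds) {
        Status status = sendSetFCVToShard(shardId, version);
        if (!status.ok) {
            // A shard failed, so leave the config server's own state untouched;
            // it keeps reporting the last version known to be consistent cluster-wide.
            return status;
        }
    }
    persistLocalFCV(version);  // All shards succeeded; safe to record the new version.
    return {true, ""};
}

int main() {
    configsvrSetFCV({"shard0000", "shard0001"}, "3.4");
    return 0;
}
```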
| Comments |
| Comment by Githook User [ 14/Sep/16 ] |
|
Author: David Storch (dstorch) <david.storch@10gen.com>
Message: |
| Comment by David Storch [ 13/Sep/16 ] |
|
milkie, this would be nice, but I'm not sure if it would be worthwhile to invest the engineering effort. The user can re-run sFCV() against the mongos whenever it fails in order to obtain a consistent state across the cluster. |
| Comment by Eric Milkie [ 13/Sep/16 ] |
|
If setting the feature version fails on one of the shards, should the code attempt to undo the setting on the rest of the cluster? Otherwise you might be left with an unclean state across shards. |
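A rough sketch of the best-effort undo being asked about here, reusing the hypothetical helpers from the sketch in the description; as noted in the reply above this rollback was not pursued, so this is purely illustrative:

```cpp
// Hypothetical sketch only: roll already-updated shards back to the previous
// version if any shard rejects the new one. Reuses the illustrative helpers
// sendSetFCVToShard/persistLocalFCV defined in the earlier sketch.
Status setFCVWithBestEffortUndo(const std::vector<std::string>& shardIds,
                                const std::string& newVersion,
                                const std::string& previousVersion) {
    std::vector<std::string> updated;
    for (const auto& shardId : shardIds) {
        Status status = sendSetFCVToShard(shardId, newVersion);
        if (!status.ok) {
            // Best-effort undo: try to restore the previous version on the shards
            // that already accepted the new one. The undo itself can also fail,
            // which is part of why simply re-running sFCV() was preferred.
            for (const auto& done : updated) {
                sendSetFCVToShard(done, previousVersion);
            }
            return status;
        }
        updated.push_back(shardId);
    }
    persistLocalFCV(newVersion);
    return {true, ""};
}
```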