[SERVER-32569] Introduce uniform way to allow config servers and shard replica sets to start in non-cluster mode Created: 05/Jan/18 Updated: 30/Oct/23 Resolved: 21/Jan/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | 3.2.0, 3.4.0, 3.6.0, 3.7.1 |
| Fix Version/s: | 3.2.19, 3.4.11, 3.6.3, 3.7.2 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Kaloian Manassiev | Assignee: | Misha Tyulenev |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
| Backwards Compatibility: | Fully Compatible |
| Backport Requested: | v3.6, v3.4, v3.2 |
| Sprint: | Sharding 2018-01-29 |
| Participants: |
| Description |
|
Starting with MongoDB version 3.2, all sharding database components (config server and shard replica sets) persist the fact that they belong to a sharded cluster. This information is stored in two places: the cluster identity document and the replica set configuration (config servers only). Once this information is persisted, it is not possible to restart a config server or shard as an independent replica set, because startup will fail if --configsvr or --shardsvr is missing. This protects customers against inadvertently omitting startup parameters and misconfiguring their systems, but it also prevents a shard from being started up for maintenance (e.g., restore). In order to unify the non-cluster behaviour across all versions and unblock the Cloud team, on all versions starting from 3.2 we will introduce a new startup-only parameter on mongod, set with --setParameter skipShardingConfigurationChecks=true, which is incompatible with --configsvr and --shardsvr. The meaning of this flag is "I am planning to restore directly into the node, I know what I am doing and I don't want any sharding validations or background threads to run". The two startup checks linked from the original ticket will be made conditional on this flag, so that with it enabled replica set nodes will not fail to start or start as REMOVED. |
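As a rough illustration of how a maintenance script might use the parameter described above, the sketch below builds a mongod command line that includes --setParameter skipShardingConfigurationChecks=true and refuses to combine it with --configsvr or --shardsvr. The helper name build_maintenance_command, the data path, and the replica set name are hypothetical and do not come from the ticket. The real incompatibility check is done by mongod itself at startup.

```python
import shlex

# Parameter name taken from the ticket description.
SKIP_PARAM = "skipShardingConfigurationChecks=true"
# Cluster-role flags that the ticket says are incompatible with the parameter.
CLUSTER_ROLE_FLAGS = {"--configsvr", "--shardsvr"}


def build_maintenance_command(dbpath, repl_set, port=27017, extra_args=()):
    """Return an argv list that starts mongod in non-cluster mode.

    Hypothetical helper: it only mirrors the rule stated in the ticket,
    it does not replace the validation performed by mongod.
    """
    conflicting = CLUSTER_ROLE_FLAGS.intersection(extra_args)
    if conflicting:
        raise ValueError(
            "skipShardingConfigurationChecks is incompatible with "
            + ", ".join(sorted(conflicting))
        )
    return [
        "mongod",
        "--dbpath", dbpath,
        "--replSet", repl_set,
        "--port", str(port),
        "--setParameter", SKIP_PARAM,
        *extra_args,
    ]


if __name__ == "__main__":
    # Example values only; adjust the path and set name for a real node.
    print(shlex.join(build_maintenance_command("/data/db", "shard01")))
```

Once maintenance is finished, the node would be restarted without the parameter and with its usual --configsvr or --shardsvr flag, so that the normal sharding validations apply again.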
| Comments |
| Comment by Githook User [ 29/Jan/18 ] |
|
Author: {'email': 'misha@mongodb.com', 'name': 'Misha Tyulenev', 'username': 'mikety'}Message: |
| Comment by Githook User [ 22/Jan/18 ] |
|
Author: {'name': 'Misha Tyulenev', 'email': 'misha@mongodb.com', 'username': 'mikety'}Message: (cherry picked from commit b251fd633d7572c0b221df3b316534596e981041) |
| Comment by Githook User [ 22/Jan/18 ] |
|
Author: {'name': 'Misha Tyulenev', 'email': 'misha@mongodb.com', 'username': 'mikety'}Message: (cherry picked from commit b251fd633d7572c0b221df3b316534596e981041) |
| Comment by Githook User [ 21/Jan/18 ] |
|
Author: {'name': 'Misha Tyulenev', 'email': 'misha@mongodb.com', 'username': 'mikety'}Message: |
| Comment by Kaloian Manassiev [ 10/Jan/18 ] |
|
Yes, spencer and I discussed part 2 yesterday and he suggested the same thing. I have it on my TODO list to file a ticket that defines how this would work, because there are a couple of "races" that need to be thought through. |
| Comment by Andy Schwerin [ 10/Jan/18 ] |
|
Part 1 of our proposal seems fine, kaloian.manassiev, with one caveat. The "transition to REMOVED" behavior you cite is related to the 3.0->3.2 rolling upgrade process, so you'll want to be careful. Part 2 I'm less certain of. I'd like to try to catch these misconfigurations closer to startup, if possible. We can already do that for config servers, because the replica set configuration document contains the configsvr: true flag. For shards, it's certainly trickier. We can consider that under the separate ticket, when you file it. Also, please link that ticket to this one. |
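The config server case mentioned in this comment can be sketched as a simple consistency check between the replica set configuration document and the startup flags. This is an illustrative model only, not the server's implementation. The function name and its parameters are hypothetical, and the configuration is represented as a plain dict with the configsvr field named in the comment.

```python
def check_config_server_startup(repl_config, started_with_configsvr, skip_checks=False):
    """Return an error message if the node looks misconfigured, else None.

    Hypothetical model of the early check discussed above:
    - repl_config: the replica set configuration document as a dict
    - started_with_configsvr: True if --configsvr was passed at startup
    - skip_checks: True if skipShardingConfigurationChecks=true was set
    """
    if skip_checks:
        # The operator explicitly asked for non-cluster mode.
        return None
    is_config_replset = bool(repl_config.get("configsvr", False))
    if is_config_replset and not started_with_configsvr:
        return (
            "replica set configuration has configsvr: true, "
            "but the node was started without --configsvr"
        )
    return None


if __name__ == "__main__":
    config = {"_id": "csrs", "configsvr": True, "members": []}
    print(check_config_server_startup(config, started_with_configsvr=False))
```

Shards have no equivalent field in the replica set configuration, which is why the comment describes that case as trickier.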
| Comment by Kaloian Manassiev [ 05/Jan/18 ] |
|
Part 1: In order to unify the non-cluster behaviour across all versions and unblock the Cloud team, I propose the following:
Part 2 (not part of this ticket): Since internally we rely heavily on the --configsvr/--shardsvr flags being set, in order to tighten these checks I propose that we also add this extra logic:
|