[SERVER-32569] Introduce uniform way to allow config servers and shard replica sets to start in non-cluster mode
Created: 05/Jan/18  Updated: 30/Oct/23  Resolved: 21/Jan/18

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.2.0, 3.4.0, 3.6.0, 3.7.1
Fix Version/s: 3.2.19, 3.4.11, 3.6.3, 3.7.2

Type: Task
Priority: Major - P3
Reporter: Kaloian Manassiev
Assignee: Misha Tyulenev
Resolution: Fixed
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Documented
is documented by DOCS-11323 Docs for SERVER-32569: Introduce unif... Closed
Backwards Compatibility: Fully Compatible
Backport Requested:
v3.6, v3.4, v3.2
Sprint: Sharding 2018-01-29
Participants:

 Description   

Starting with MongoDB version 3.2, all sharding database components (config server and shard replica sets) persist the fact that they belong to a sharded cluster. This information is stored in two places: the cluster identity document and the replica set configuration (the latter for config servers only).

Once this information is persisted, it is not possible to restart a config server or shard as an independent replica set, because startup will fail if --configsvr or --shardsvr is missing. This serves as protection against customers inadvertently omitting startup parameters and misconfiguring their systems, but it also prevents the shard from being started up for maintenance (e.g., a restore).

In order to unify the non-cluster behaviour across all versions and unblock the Cloud team, on all versions starting from 3.2 we will introduce a new startup-only mongod parameter, skipShardingConfigurationChecks, set with --setParameter skipShardingConfigurationChecks=true. It is incompatible with --configsvr and --shardsvr. The meaning of this flag is "I am planning to restore directly into the node, I know what I am doing and I don't want any sharding validations or background threads to run".

When the flag is enabled, this and this check will be skipped, so that replica set nodes will not fail to start or start as REMOVED.
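
The incompatibility rule above can be illustrated with a minimal Python sketch. This is not the server's implementation; the function names and the error text are hypothetical, and only the two-token form "--setParameter skipShardingConfigurationChecks=true" is recognised.

```python
# Hypothetical illustration of the option rule described in this ticket.
SKIP_PARAM = "skipShardingConfigurationChecks=true"
CLUSTER_ROLE_FLAGS = ("--configsvr", "--shardsvr")


def skip_checks_requested(argv):
    """Return True if argv contains --setParameter followed by SKIP_PARAM."""
    for i, arg in enumerate(argv):
        if arg == "--setParameter" and i + 1 < len(argv) and argv[i + 1] == SKIP_PARAM:
            return True
    return False


def validate_startup_options(argv):
    """Return None if the options are acceptable, else an error message."""
    roles = [flag for flag in CLUSTER_ROLE_FLAGS if flag in argv]
    if skip_checks_requested(argv) and roles:
        # The ticket states that the parameter cannot be combined with a cluster role.
        return "skipShardingConfigurationChecks is incompatible with " + ", ".join(roles)
    return None
```

Under these assumptions, a maintenance start such as ["mongod", "--replSet", "rs0", "--setParameter", "skipShardingConfigurationChecks=true"] passes validation, while adding --shardsvr to the same command line yields an error message.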



 Comments   
Comment by Githook User [ 29/Jan/18 ]

Author:

{'email': 'misha@mongodb.com', 'name': 'Misha Tyulenev', 'username': 'mikety'}

Message: SERVER-32569 allow config servers and shard replica sets to start in non-cluster mode
Branch: v3.2
https://github.com/mongodb/mongo/commit/1b8cce46314746e106445896d70ca1611ab97ca3

Comment by Githook User [ 22/Jan/18 ]

Author:

{'name': 'Misha Tyulenev', 'email': 'misha@mongodb.com', 'username': 'mikety'}

Message: SERVER-32569 allow config servers and shard replica sets to start in non-cluster mode

(cherry picked from commit b251fd633d7572c0b221df3b316534596e981041)
Branch: v3.4
https://github.com/mongodb/mongo/commit/34f5bec2c9d827d71828fe858167f89a28b29a2a

Comment by Githook User [ 22/Jan/18 ]

Author:

{'name': 'Misha Tyulenev', 'email': 'misha@mongodb.com', 'username': 'mikety'}

Message: SERVER-32569 allow config servers and shard replica sets to start in non-cluster mode

(cherry picked from commit b251fd633d7572c0b221df3b316534596e981041)
Branch: v3.6
https://github.com/mongodb/mongo/commit/924198af93bd2792f1e8bd86fc9806504826ca2f

Comment by Githook User [ 21/Jan/18 ]

Author:

{'name': 'Misha Tyulenev', 'email': 'misha@mongodb.com', 'username': 'mikety'}

Message: SERVER-32569 allow config servers and shard replica sets to start in non-cluster mode
Branch: master
https://github.com/mongodb/mongo/commit/b251fd633d7572c0b221df3b316534596e981041

Comment by Kaloian Manassiev [ 10/Jan/18 ]

Yes, spencer and I discussed part 2 yesterday and he suggested the same thing. I have it on my TODO list to file a ticket that defines how this would work, because there are a couple of "races" that need to be thought through.

Comment by Andy Schwerin [ 10/Jan/18 ]

Part 1 of your proposal seems fine, kaloian.manassiev, with one caveat. The "transition to REMOVED" behavior you cite is related to the 3.0->3.2 rolling upgrade process, so you'll want to be careful.

Part 2 I'm less certain of. I'd like to try to catch these misconfigurations closer to startup, if possible. We can already do that for config servers, because the replica set configuration document contains the configsvr: true flag. For shards, it's certainly trickier. We can consider that under the separate ticket, when you file it. Also, please link that ticket to this one.
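
The early config server check mentioned above could look roughly like the following Python sketch. It assumes the replica set configuration is available as a dictionary with an optional "configsvr" field; the function name and the skip_checks argument are hypothetical and do not reflect the server's code.

```python
def configsvr_mismatch(replset_config, started_with_configsvr, skip_checks=False):
    """Return True if the stored config marks a config server but --configsvr was omitted.

    replset_config: dict form of the replica set configuration document (assumed shape).
    started_with_configsvr: whether the node was started with --configsvr.
    skip_checks: stands in for skipShardingConfigurationChecks=true.
    """
    if skip_checks:
        # The operator explicitly asked for sharding validations to be skipped.
        return False
    return bool(replset_config.get("configsvr", False)) and not started_with_configsvr
```

No comparable sketch is given for shards, since the comment above notes that their case is trickier.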

Comment by Kaloian Manassiev [ 05/Jan/18 ]

Part 1: In order to unify the non-cluster behaviour across all versions and unblock the Cloud team, I propose the following:

  • (On all versions starting from 3.2) Introduce a new startup-only parameter on mongod called --setParameter skipShardingConfigurationChecks=true, which is incompatible with --configsvr or --shardsvr. The meaning of this flag is "I am planning to restore directly into the node, I know what I am doing and I don't want any sharding validations or background threads to run".
  • Skip this and this check when the flag is enabled, so that replica set nodes will not fail to start or start as REMOVED.

Part 2 (not part of this ticket): Since internally we rely heavily on the --configsvr/--shardsvr flags being set, in order to tighten these checks I propose that we also add this extra logic:

  • (On all versions starting from 3.6) Add a check in the replica set's transition to primary, such that if the node contains a cluster id, but is missing the configsvr/shardsvr flags, the transition will fail, unless forClusterRestore is enabled. Ideally this check should also happen when the node is a secondary or is transitioning to secondary, so that there is no way for customers to misconfigure their nodes.
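
The Part 2 rule can be written down as a small decision function. This is only a sketch of the proposed logic in Python: the function and argument names are hypothetical, and forClusterRestore is used as named in the bullet above.

```python
def may_transition_to_primary(has_cluster_id, has_role_flag, for_cluster_restore):
    """Return True if the proposed check would allow the transition to primary.

    has_cluster_id: the node has a persisted cluster identity.
    has_role_flag: the node was started with --configsvr or --shardsvr.
    for_cluster_restore: the restore override is enabled.
    """
    if for_cluster_restore:
        # The override bypasses the check entirely.
        return True
    # Fail only when the node belongs to a cluster but lacks its role flag.
    return not (has_cluster_id and not has_role_flag)
```

Only one combination is rejected: a node that has a cluster id, lacks the role flags, and does not have the override enabled.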

spencer, schwerin - do you see any issues with this?
