[SERVER-31083] Allow passing primary shard to "enableSharding" command for a new database Created: 13/Sep/17  Updated: 30/Oct/23  Resolved: 06/Nov/19

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 4.3.1, 4.2.2, 4.0.14

Type: Improvement Priority: Major - P3
Reporter: Asya Kamsky Assignee: Marcos José Grillo Ramirez
Resolution: Fixed Votes: 4
Labels: ShardingRoughEdges, high-value, neweng
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Documented
is documented by DOCS-13276 Investigate changes in SERVER-31083: ... Closed
Related
is related to SERVER-31020 Sharding database creation is slow be... Open
Backwards Compatibility: Fully Compatible
Backport Requested:
v4.2, v4.0
Sprint: Sharding 2019-11-04, Sharding 2019-11-18
Participants:
Case:

 Description   

Background and motivation

Both replica set and sharded cluster MongoDB installations support implicit database and collection creation. In a sharded cluster, implicitly created databases do not, by default, permit sharded collections. For this reason, sharding provides the enableSharding command, which explicitly creates the database and marks it as permitting sharded collections.

Currently, both implicitly created databases and those created through enableSharding (partially) use the balancer's statistics gathering logic to find the shard with the smallest data size and place the database's primary on it.

We have seen pathological cases where multiple concurrent implicit database creations end up placing all database primaries on the same shard. In addition, because the implicit database placement doesn't use the complete balancer placement logic, it does not take zones into account, which may lead to database primaries violating location requirements such as those imposed by GDPR.

Proposed solution

Expose an optional string parameter called primaryShard on the enableSharding command.

If this parameter is present, it must contain the id of a valid shard, and the new database's primary should be placed on that shard. If the database already exists, the command succeeds only when its current primary is the shard specified through primaryShard; if the current primary is a different shard, the command should fail with error code NamespaceExists = 48.

If the parameter is omitted, the command should behave as it does currently and place the database's primary on the shard that currently has the smallest data size.
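
The rules above can be sketched in plain Python, modeled against in-memory dictionaries rather than a real cluster. This is only an illustration under stated assumptions: the function name enable_sharding, the CommandError class, and the two dictionaries are hypothetical stand-ins, and only the primaryShard parameter and the NamespaceExists = 48 code come from this ticket. The ticket does not say which error an unknown shard id produces, so the sketch raises a generic ValueError. It also assumes that calling the command without primaryShard on an existing database leaves its primary untouched.

```python
from typing import Dict, Optional

NAMESPACE_EXISTS = 48  # error code named in this ticket


class CommandError(Exception):
    """Hypothetical stand-in for a failed command response."""

    def __init__(self, code: int, message: str) -> None:
        super().__init__(message)
        self.code = code


def enable_sharding(
    db_primaries: Dict[str, str],  # database name -> primary shard id
    shard_sizes: Dict[str, int],   # shard id -> data size in bytes
    db_name: str,
    primary_shard: Optional[str] = None,
) -> str:
    """Simulate the proposed command; return the database's primary shard."""
    if primary_shard is not None and primary_shard not in shard_sizes:
        # Assumption: the ticket does not name the error for an unknown shard.
        raise ValueError(f"unknown shard: {primary_shard}")

    existing = db_primaries.get(db_name)
    if existing is not None:
        # Existing database: succeed only if the requested primary matches.
        if primary_shard is not None and primary_shard != existing:
            raise CommandError(
                NAMESPACE_EXISTS,
                f"{db_name} already exists with primary {existing}",
            )
        # Assumption: with primaryShard omitted, the primary stays where it is.
        return existing

    if primary_shard is None:
        # Parameter omitted: fall back to the shard with the smallest data size.
        primary_shard = min(shard_sizes, key=lambda shard: shard_sizes[shard])
    db_primaries[db_name] = primary_shard
    return primary_shard
```

Calling enable_sharding(primaries, sizes, "db2", "shardA") twice in a row succeeds both times, while a later call naming a different shard for "db2" raises CommandError with code 48.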



 Comments   
Comment by Githook User [ 07/Nov/19 ]

Author: Marcos José Grillo Ramírez <marcos.grillo@10gen.com>

Message: SERVER-31083 Allow passing primary shard to "enableSharding" command for a new database

(cherry picked from commit 6ef06c9093462bec22c2219c341b0219f1864cca)
(cherry picked from commit 89a83b368a3d71fcf47ced4245235ccbb7b6961c)
Branch: v4.0
https://github.com/mongodb/mongo/commit/253e674104d298e2f6882b5c8207a2058e648b42

Comment by Githook User [ 07/Nov/19 ]

Author: Marcos José Grillo Ramírez <marcos.grillo@10gen.com>

Message: SERVER-31083 Allow passing primary shard to "enableSharding" command for a new database

(cherry picked from commit 6ef06c9093462bec22c2219c341b0219f1864cca)
Branch: v4.2
https://github.com/mongodb/mongo/commit/89a83b368a3d71fcf47ced4245235ccbb7b6961c

Comment by Githook User [ 06/Nov/19 ]

Author: Marcos José Grillo Ramírez <marcos.grillo@10gen.com>

Message: SERVER-31083 Allow passing primary shard to "enableSharding" command for a new database
Branch: master
https://github.com/mongodb/mongo/commit/6ef06c9093462bec22c2219c341b0219f1864cca

Comment by Kaloian Manassiev [ 28/Oct/19 ]

For people who are following this ticket, I updated the description with the way we intend to implement this.

Comment by Andy Schwerin [ 25/Jan/18 ]

The point of this ticket was to make sure the primary shard for a database ended up in the desired place. I think the target zone sent to enableSharding for a non-existent database would just ensure that the selected primary shard was in that zone. If the database already existed, I imagined that enableSharding would fail if the primary shard weren't in the zone.

Comment by Kaloian Manassiev [ 25/Jan/18 ]

schwerin, this sounds like a better approach than passing the primary id. However, technically it is not databases that contain data, but the collections themselves.

Assuming we start tracking unsharded collections like sharded ones, are you suggesting that we introduce something like a "default zone" for a database, so that each new collection is created on a shard from that zone and, if there are further per-collection zones, they take precedence?

Comment by Andy Schwerin [ 25/Jan/18 ]

I think that we should get away from requesting specific shards, and instead if we do this, move towards requiring specific zones. How does that sound, kaloian.manassiev?

Comment by Asya Kamsky [ 24/Jan/18 ]

Couldn't selection of shard as primary for DB be important for location/zone purposes? I suppose then the option should exist on "createDatabase" for cases where the database won't ever be sharded.

Comment by Kaloian Manassiev [ 21/Nov/17 ]

The location of the data for a sharded database/collection should not be a concern of the application; it is the job of the sharding balancer logic to make that selection as efficient as possible. Introducing a knob like the one proposed runs the risk of introducing imbalance across the sharded cluster. Because of this, I am marking this ticket as 'features we are not sure of'. Instead, I plan on us giving more priority to the linked SERVER-31020, plus the 3.8 work to make unsharded collections tracked by the server, which would make the selection of a primary shard even less relevant.
