[SERVER-31083] Allow passing primary shard to "enableSharding" command for a new database Created: 13/Sep/17 Updated: 30/Oct/23 Resolved: 06/Nov/19 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 4.3.1, 4.2.2, 4.0.14 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Asya Kamsky | Assignee: | Marcos José Grillo Ramirez |
| Resolution: | Fixed | Votes: | 4 |
| Labels: | ShardingRoughEdges, high-value, neweng |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Backwards Compatibility: | Fully Compatible |
| Backport Requested: | v4.2, v4.0 |
| Sprint: | Sharding 2019-11-04, Sharding 2019-11-18 |
| Participants: |
| Case: | (copied to CRM) |
| Description |
Background and motivation

Both replica set and sharded cluster MongoDB installations support implicit database and collection creation. In a sharded cluster, implicitly created databases do not, by default, support creating sharded collections under them. For this reason, sharding provides the enableSharding command, which explicitly creates the database and marks it as permitting sharded collections.

Currently, both implicitly created databases and those created through enableSharding (partially) use the balancer's statistics gathering logic to find the shard with the smallest data size and place the database's primary on it. We have seen pathological cases where multiple concurrent implicit database creations end up placing all database primaries on the same shard. In addition, because implicit database placement doesn't use the complete balancer placement logic, it does not take zones into account, which may lead to database primaries violating location requirements such as GDPR.

Proposed solution

Expose an optional string parameter called primaryShard on the enableSharding command. If this parameter is present, it must contain the id of a valid shard, and the new database's primary should be placed on that shard. If the database already exists and its current primary is the same as the one specified through primaryShard, the command succeeds. If the database already exists with a different primary, the command should fail with error code NamespaceExists = 48. If the parameter is omitted, the command should behave as it does currently and place the database's primary on the shard with the currently smallest data size. |
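The decision rules in the proposed solution can be sketched as a small in-memory model. This is an illustration only, not the server implementation: the function name `enable_sharding`, the `CommandError` class, and the two dictionaries standing in for the cluster catalog and shard statistics are all hypothetical. The ticket does not say which error is returned for an invalid shard id, so the `ValueError` below is an assumption, as is the choice to succeed without changes when the database already exists and primaryShard is omitted.

```python
NAMESPACE_EXISTS = 48  # error code named in the ticket


class CommandError(Exception):
    """Hypothetical stand-in for a failed command carrying an error code."""

    def __init__(self, code, message):
        super().__init__(message)
        self.code = code


def enable_sharding(databases, shard_sizes, db_name, primary_shard=None):
    """Return the primary shard for db_name after applying the proposed rules.

    databases:   dict mapping database name -> primary shard id (mutated in place)
    shard_sizes: dict mapping shard id -> current data size in bytes
    """
    # primaryShard, if given, must name a valid shard.
    # Assumption: the ticket does not specify the error for this case.
    if primary_shard is not None and primary_shard not in shard_sizes:
        raise ValueError(f"unknown shard: {primary_shard}")

    current = databases.get(db_name)
    if current is not None:
        # Existing database: succeed only if the requested primary matches.
        if primary_shard is not None and primary_shard != current:
            raise CommandError(
                NAMESPACE_EXISTS,
                f"database {db_name} already exists with primary {current}",
            )
        return current

    # New database: use the requested shard, else the one with the least data.
    if primary_shard is not None:
        chosen = primary_shard
    else:
        chosen = min(shard_sizes, key=shard_sizes.get)
    databases[db_name] = chosen
    return chosen
```

Under this model, a first call with `primary_shard="shardA"` places the database on `shardA`, a repeated identical call succeeds, and a later call naming a different shard raises `CommandError` with code 48.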
| Comments |
| Comment by Githook User [ 07/Nov/19 ] |
|
Author: Marcos José Grillo Ramírez <marcos.grillo@10gen.com> Message: (cherry picked from commit 6ef06c9093462bec22c2219c341b0219f1864cca) |
| Comment by Githook User [ 07/Nov/19 ] |
|
Author: Marcos José Grillo Ramírez <marcos.grillo@10gen.com> Message: (cherry picked from commit 6ef06c9093462bec22c2219c341b0219f1864cca) |
| Comment by Githook User [ 06/Nov/19 ] |
|
Author: Marcos José Grillo Ramírez <marcos.grillo@10gen.com> Message: |
| Comment by Kaloian Manassiev [ 28/Oct/19 ] |
|
For people who are following this ticket, I updated the description with the way we intend to implement this. |
| Comment by Andy Schwerin [ 25/Jan/18 ] |
|
The point of this ticket was to make sure the primary shard for a database ended up in the desired place. I think the target zone sent to enableSharding for a non-existent database would just ensure that the selected primary shard was in that zone. If the database already existed, I imagined that enableSharding would fail if the primary shard weren't in the zone. |
| Comment by Kaloian Manassiev [ 25/Jan/18 ] |
|
schwerin, this sounds like a better approach than passing the primary id. However, technically it is not databases that contain data, but the collections themselves. Assuming we start tracking unsharded collections like sharded ones, are you suggesting that we introduce something like a "default zone" for a database, so that each new collection is created on a shard from that zone and, if there are further per-collection zones, they take precedence? |
| Comment by Andy Schwerin [ 25/Jan/18 ] |
|
I think that we should get away from requesting specific shards, and instead if we do this, move towards requiring specific zones. How does that sound, kaloian.manassiev? |
| Comment by Asya Kamsky [ 24/Jan/18 ] |
|
Couldn't selection of shard as primary for DB be important for location/zone purposes? I suppose then the option should exist on "createDatabase" for cases where the database won't ever be sharded. |
| Comment by Kaloian Manassiev [ 21/Nov/17 ] |
|
The location of the data for a sharded database/collection should not be a concern of the application and it is the job of the sharding balancer logic to make that selection as efficient as possible. Introducing a knob like the one proposed runs the risk of introducing imbalance across the sharded cluster. Because of this I am marking this ticket as 'features we are not sure of' and instead I plan on us giving more priority to the linked SERVER-31020 plus the 3.8 work to make unsharded collections tracked by the server, which would make the selection of a primary shard even less relevant. |