[SERVER-61501] Create sharding suite where collections are clustered by default Created: 15/Nov/21 Updated: 29/Oct/23 Resolved: 27/Jan/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 5.3.0 |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Haley Connelly | Assignee: | Haley Connelly |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | PM-2311-M2 |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Backwards Compatibility: | Fully Compatible |
| Sprint: | Execution Team 2021-12-27, Execution Team 2022-01-10, Execution Team 2022-01-24, Execution Team 2022-02-07 |
| Participants: | |
| Description |
|
Similar to the passthrough suite, it would be nice to have a sharding suite where collections are clustered by default. One potential issue is duplicate key errors if the collection is clustered by _id but sharded on a different shard key. |
| Comments |
| Comment by Githook User [ 27/Jan/22 ] |
|
Author: {'name': 'Haley Connelly', 'email': 'haley.connelly@mongodb.com', 'username': 'haleyConnelly'}Message: |
| Comment by Haley Connelly [ 05/Jan/22 ] |
|
max.hirschhorn The goal here was to take the tests in the jstests/sharding directory and modify them to shard a clustered collection. These tests do exercise chunk migration, which is what we are aiming to test. For the duplicate key case, I was (not so clearly) referring to the requirement that the cluster key be unique, like a globally unique index. We currently don't support unique indexes on a sharded collection when the collection is sharded on a different shard key. Edit: however, we don't currently enforce global _id uniqueness upon insert; we don't error until trying to do chunk migration. Thus, we should be okay with just altering the tests to create clustered collections by default. |
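The behaviour described in this comment can be illustrated with a toy model in plain JavaScript. This is only a sketch, not server code: `ToyShard`, `insertRouted`, `migrateAll`, and the `region` shard key field are all hypothetical names, and the model assumes each shard checks _id uniqueness only against its own documents.

```javascript
// Toy model only; none of these names exist in the MongoDB code base.
class ToyShard {
  constructor(name) {
    this.name = name;
    this.docsById = new Map(); // stands in for a collection clustered by _id
  }

  // Uniqueness of _id is checked against this shard's documents only.
  insert(doc) {
    if (this.docsById.has(doc._id)) {
      throw new Error(`duplicate key on shard ${this.name}: _id ${doc._id}`);
    }
    this.docsById.set(doc._id, doc);
  }
}

// Hypothetical routing on a shard key (`region`) that differs from the cluster key (_id).
function insertRouted(shards, doc) {
  shards[doc.region].insert(doc);
}

// Stand-in for chunk migration: copy every document to the recipient, then clear the donor.
function migrateAll(donor, recipient) {
  for (const doc of donor.docsById.values()) {
    recipient.insert(doc); // the recipient's local _id check runs here
  }
  donor.docsById.clear();
}

const shards = { east: new ToyShard("east"), west: new ToyShard("west") };

// Both inserts succeed even though the _id values collide, because each
// document lands on a different shard.
insertRouted(shards, { _id: 1, region: "east" });
insertRouted(shards, { _id: 1, region: "west" });

// The collision only surfaces once the documents meet on one shard.
let migrationError = null;
try {
  migrateAll(shards.west, shards.east);
} catch (err) {
  migrationError = err;
}
console.log(migrationError ? migrationError.message : "migration succeeded");
```

In this model the inserts never fail and the error appears only during the simulated migration, which mirrors the "no error until chunk migration" observation above.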
| Comment by Max Hirschhorn [ 20/Nov/21 ] |
The choice of {_id: "hashed"} is for a couple reasons:
Notably, the goal of pre-splitting and distributing the collection data across multiple shards is to exercise the targeting logic in mongos, where some operations must broadcast to all shards, not to exercise chunk migration. In general, I am skeptical of the amount of chunk migration that ever happens via the balancer in our sharding passthrough testing, including in the concurrency suites. I couldn't tell from the description what the interesting interactions are between clustered collections and sharded collections that would make an {_id: "hashed"} shard key pattern undesirable. |
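For reference, the two pieces under discussion (a collection clustered by _id and an {_id: "hashed"} shard key) can be written as plain command documents. This is a sketch under assumptions: the helper names and the namespace `test.coll` are hypothetical, and the document shapes follow the `clusteredIndex` option of the `create` command and the `shardCollection` admin command. Whether this combination is the right default for the suite is the open question in this thread.

```javascript
// Hypothetical helper: a `create` command document for a collection clustered by _id.
function buildCreateClusteredCmd(collName) {
  return { create: collName, clusteredIndex: { key: { _id: 1 }, unique: true } };
}

// Hypothetical helper: a `shardCollection` command document using a hashed _id shard key.
function buildShardOnHashedIdCmd(dbName, collName) {
  return { shardCollection: `${dbName}.${collName}`, key: { _id: "hashed" } };
}

const createCmd = buildCreateClusteredCmd("coll");
const shardCmd = buildShardOnHashedIdCmd("test", "coll");

console.log(JSON.stringify(createCmd));
console.log(JSON.stringify(shardCmd));
```

In a shell-based test these documents would presumably be passed to `db.runCommand` and `db.adminCommand` respectively, against a cluster where sharding is already enabled for the database.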