[SERVER-10220] Support hashed fields in compound indexes and compound shard keys Created: 16/Jul/13 Updated: 06/Dec/22 Resolved: 23/Jan/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Index Maintenance, Sharding |
| Affects Version/s: | None |
| Fix Version/s: | 4.3.3 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | daniel.roberts@10gen.com | Assignee: | [DO NOT USE] Backlog - Sharding Team |
| Resolution: | Done | Votes: | 42 |
| Labels: | indexing, sharding |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Assigned Teams: | Sharding |
| Backwards Compatibility: | Fully Compatible |
| Participants: | |
| Case: | (copied to CRM) |
| Description |
|
Provide the ability to have hashed fields in compound indexes; a minimal sketch of the desired syntax is shown below.
This is required for compound shard keys where one of the fields needs to be hashed for even distribution across the cluster. |
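A minimal sketch of the kind of index being requested, assuming hypothetical field and collection names (the syntax shown mirrors what MongoDB 4.4 eventually shipped for compound hashed indexes):

    // Compound index mixing a hashed field with an ordinary ascending field.
    db.users.createIndex({ category: 1, email: "hashed" })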
| Comments |
| Comment by Craig Homa [ 23/Jan/20 ] | |
|
Hey louisa.berger, this was done as part of the Compound Hashed Shard Key epic (PM-241), which will be included in 4.4. Please let the Query team know if you have any questions. | |
| Comment by Louisa Berger [ 23/Jan/20 ] | |
|
craig.homa Is this planned to be included in 4.4? | |
| Comment by John Page [ 15/Mar/18 ] | |
|
This would also benefit from allowing you to severely restrict the number of bits/range of values in the hash - e.g. {userid: "Hashed:500"}, allowing the hash to fall between 0 and 499. This avoids the issues with random values in B-trees blowing out your I/O and also allows active management of chunk moves when provisioning new servers. | |
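As a rough illustration of the bounded-hash idea above, an application can approximate a "Hashed:500" key today by precomputing a bucket field; the bucketFor helper, field names, and 500-bucket count below are illustrative assumptions, not server features:

    // Fold the string into an unsigned 32-bit hash and restrict it to 0..499,
    // then store the result so it can be indexed or used as a shard key prefix.
    function bucketFor(userid) {
        let h = 0;
        for (let i = 0; i < userid.length; i++) {
            h = (h * 31 + userid.charCodeAt(i)) >>> 0;
        }
        return h % 500;
    }

    db.events.insertOne({ userid: "u123", bucket: bucketFor("u123") });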
| Comment by Adam Flynn [ 29/Jun/15 ] | |
|
This feature is high on our wishlist as well. We have a number of collections that naturally shard by a key like user ID (ObjectId). In many of these collections, the number of documents per user is typically small but technically unbounded (often monotonically growing). The largest/oldest users in these cases can create jumbo chunks. To prevent these rare jumbo chunk cases, we need to add more granularity to the shard key, say _id. But since most writes happen for new users, we need user_id to be hashed for even write distribution. So, our ideal shard key would be {user_id: "hashed", _id: "hashed"} or {user_id: "hashed", _id: 1} (limiting compound indexes to a single hashed key would be fine in this use case, since user_id has enough cardinality that _id won't materially impact write distribution). Right now, our workaround options are:
Putting this feature in MongoDB would let us side-step a lot of jumbo chunk problems without a lot of application overhead or write distribution issues. | |
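For concreteness, the ideal shard key described in this comment would be declared roughly as follows (the namespace is hypothetical, and _id is kept as a range field per the note that a single hashed field suffices):

    // user_id hashed for even write distribution across shards; _id as a range
    // suffix so one heavy user's documents can still be split across chunks.
    sh.shardCollection("app.user_events", { user_id: "hashed", _id: 1 })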
| Comment by Gagan Jain [ 01/Jun/15 ] | |
|
Hi Mongo team, Any ETA on this? Thanks & regards, | |
| Comment by Nic Cottrell (Personal) [ 18/Aug/14 ] | |
|
I'd love to have this feature. I have a collection which is a corpus of extracted sentences. It has a "t" field containing long text (>512 characters, often Arabic etc.), which is too long for a normal index (given the new 1024-byte hard limit on index keys), and also an "lc" (language code) field. It would save a lot of BSON processing if I could have a single compound index covering both, with the long "t" field hashed.
|
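A sketch of the index this comment appears to be asking for, assuming a hypothetical collection name (hashing the term keeps the key under the index key size limit despite the long text in "t"):

    // Compound index: exact match on language code plus a hashed term over the
    // long text field, so the stored key stays small.
    db.sentences.createIndex({ lc: 1, t: "hashed" })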