[SERVER-25289] Make it possible to select a subset of documents based on the shard key Created: 26/Jul/16 Updated: 01/Aug/16 Resolved: 01/Aug/16 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | New Feature | Priority: | Major - P3 |
| Reporter: | Karolin Varner | Assignee: | Unassigned |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Description |
|
I recently had to distribute a batch job among multiple workers. The way I finally did it was to compute the xxhash of the _id and store it in an extra field (_id_hash) as a Long. Using this field, I could distribute the documents in the collection among N workers by computing _id_hash % N, with worker 0 taking the documents whose remainder is 0, worker 1 those whose remainder is 1, and so on. I figured that MongoDB would use a similar approach internally to decide which shard a document is stored on; I tried to find a way to reuse that mechanism but could not. Would it be possible to expose that capability of the shard key somehow? |
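For illustration, the partitioning scheme described above could be sketched roughly as follows. This is a minimal sketch, not the code I actually ran: it uses hashlib.blake2b from the Python standard library as a stand-in for xxhash, and the helper names (id_hash, worker_for, partition) are made up for this example.

```python
import hashlib


def id_hash(doc_id: str) -> int:
    """Signed 64-bit hash of an _id, small enough to store as a BSON Long.

    Assumption: blake2b stands in for xxhash here; any stable 64-bit hash works.
    """
    digest = hashlib.blake2b(doc_id.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big", signed=True)


def worker_for(hash_value: int, n_workers: int) -> int:
    """Index of the worker responsible for a given hash.

    Python's % is a floored modulus, so the result lies in
    range(n_workers) even when hash_value is negative.
    """
    return hash_value % n_workers


def partition(doc_ids, n_workers):
    """Split doc_ids into n_workers buckets, one bucket per worker."""
    buckets = [[] for _ in range(n_workers)]
    for doc_id in doc_ids:
        buckets[worker_for(id_hash(doc_id), n_workers)].append(doc_id)
    return buckets
```

Every document lands in exactly one bucket, and the assignment is stable across runs because it depends only on the _id.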
| Comments |
| Comment by Kelsey Schubert [ 01/Aug/16 ] |
|
Hi karo, Thank you for the feature suggestion. If I understand your use case correctly, SERVER-24274 would add a command to partition the data in a collection, which should provide the functionality you are looking for. Please feel free to vote for SERVER-24274 and watch it for updates. In the meantime, you may want to consider using the splitVector command. This command's functionality is not currently exposed ( Kind regards, |
| Comment by Karolin Varner [ 26/Jul/16 ] |
|
Btw, I noticed while implementing my solution that $mod seems to ignore negative values; I had to add a special case for those and invert them. |
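A possible explanation for the negative-value behaviour, sketched under the assumption that $mod computes a truncated remainder (sign of the dividend, as in C) while the client computes a floored one: a negative _id_hash then yields a negative remainder and never matches a filter that asks for a remainder in 0..N-1. The helpers below (truncated_mod, mod_filter) are hypothetical and only show one way to build a filter that also catches those documents; this is not necessarily the special case I ended up with.

```python
def truncated_mod(value: int, divisor: int) -> int:
    """Remainder that takes the sign of the dividend (C-style).

    Assumption: this mirrors what $mod computes on the server.
    """
    r = abs(value) % abs(divisor)
    return -r if value < 0 else r


def mod_filter(worker: int, n_workers: int) -> dict:
    """Query filter matching every _id_hash that belongs to `worker`.

    Non-negative hashes have remainder `worker`; negative hashes that the
    floored modulus maps to `worker` have truncated remainder
    `worker - n_workers`, so both remainders are requested.
    """
    clauses = [{"_id_hash": {"$mod": [n_workers, worker]}}]
    if worker != 0:
        clauses.append({"_id_hash": {"$mod": [n_workers, worker - n_workers]}})
    return {"$or": clauses}
```

For example, with 4 workers the value -7 belongs to worker 1 under a floored modulus, but its truncated remainder is -3, which is why the filter for worker 1 asks for both 1 and -3.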