[SERVER-25289] Make it possible to select a subset of documents based on the shard key Created: 26/Jul/16  Updated: 01/Aug/16  Resolved: 01/Aug/16

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Major - P3
Reporter: Karolin Varner Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-24274 Create a command to provide query bou... Backlog

 Description   

I recently had to distribute a batch job among multiple workers. The way I finally did it was to compute the xxHash of each document's _id and store it in an extra field (_id_hash) as a Long.

Using this field, I could distribute the documents in the collection among N workers: worker 0 processes the documents where _id_hash % N == 0, worker 1 those where _id_hash % N == 1, and so on.
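A minimal sketch of this hash-and-modulo partitioning, for illustration only. It assumes a few things the ticket does not specify: `hashlib.blake2b` with an 8-byte digest stands in for xxHash (which is not in the Python standard library), `str()` is used as a simplistic serialization of the _id, and the helper names `id_hash`, `worker_filter` and `assigned_worker` are hypothetical. Clearing the top bit keeps the stored hash non-negative while still fitting in a BSON Long.

```python
import hashlib
from collections import Counter
from typing import Any, Dict

# Clearing the top bit keeps the value non-negative and within BSON int64 range.
INT63_MASK = (1 << 63) - 1


def id_hash(doc_id: Any) -> int:
    """Hypothetical helper: deterministic 63-bit hash of a document's _id."""
    # blake2b with an 8-byte digest stands in for xxHash (not in the stdlib);
    # str() is an assumed, simplistic serialization of the _id value.
    digest = hashlib.blake2b(str(doc_id).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") & INT63_MASK


def worker_filter(worker: int, num_workers: int) -> Dict[str, Any]:
    """MongoDB query filter selecting the documents assigned to `worker`."""
    return {"_id_hash": {"$mod": [num_workers, worker]}}


def assigned_worker(doc: Dict[str, Any], num_workers: int) -> int:
    """Client-side equivalent of worker_filter for a non-negative _id_hash."""
    return doc["_id_hash"] % num_workers


if __name__ == "__main__":
    num_workers = 4
    docs = [{"_id": i} for i in range(1000)]
    for doc in docs:
        doc["_id_hash"] = id_hash(doc["_id"])
    print(worker_filter(1, num_workers))
    # Every document maps to exactly one worker; counts should be roughly equal.
    print(Counter(assigned_worker(doc, num_workers) for doc in docs))
```

Each worker would then pass its `worker_filter(...)` result to a normal `find` on the collection.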

I assumed MongoDB uses a similar approach internally to decide which shard stores a document, so I tried to find a way to reuse that mechanism, but could not.

Would it be possible to expose that capability of the shard key somehow?



 Comments   
Comment by Kelsey Schubert [ 01/Aug/16 ]

Hi karo,

Thank you for the feature suggestion. If I am understanding your use case correctly, SERVER-24274 would provide a command to partition data in a collection and will provide the functionality you are looking for. Please feel free to vote for SERVER-24274 and watch it for updates.

In the meantime, you may want to consider using the splitVector command. This command's functionality is not currently exposed (SERVER-10117), but you can see an example of how it is used in the Hadoop connector. Please note that if you choose to pursue this approach, there may be issues with very large datasets, as the splitVector command cannot return more than 16MB of split points (SERVER-22571).
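A rough sketch of how split points could be turned into per-worker work units. The command document's field names (`splitVector`, `keyPattern`, `maxChunkSize`) are an assumption modeled on how the connectors invoke this internal command, and the response is assumed to carry the split points in a `splitKeys` array. The helper names and the namespace `"mydb.mycoll"` are hypothetical. Only the range-building logic is exercised here; no server is contacted.

```python
from typing import Any, Dict, List, Optional


def split_vector_command(namespace: str, max_chunk_size_mb: int) -> Dict[str, Any]:
    """Assumed command document for the internal splitVector command."""
    # Field names are modeled on how the connectors call splitVector; verify
    # them against your server version before relying on this.
    return {
        "splitVector": namespace,  # full namespace, e.g. "mydb.mycoll"
        "keyPattern": {"_id": 1},
        "maxChunkSize": max_chunk_size_mb,
    }


def ranges_from_split_keys(split_keys: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Turn split points into non-overlapping _id range filters, one per worker."""
    # k split points give k + 1 ranges: (-inf, s0), [s0, s1), ..., [s_last, +inf).
    bounds: List[Optional[Any]] = [None] + [key["_id"] for key in split_keys] + [None]
    filters: List[Dict[str, Any]] = []
    for low, high in zip(bounds, bounds[1:]):
        condition: Dict[str, Any] = {}
        if low is not None:
            condition["$gte"] = low
        if high is not None:
            condition["$lt"] = high
        # With no split points the single range is the whole collection: {}.
        filters.append({"_id": condition} if condition else {})
    return filters
```

With pymongo, the command document could be passed to `Database.command(...)` and the assumed `splitKeys` array from the response fed to `ranges_from_split_keys`; for split points `[{"_id": 100}, {"_id": 200}]` this yields three ranges: below 100, from 100 up to 200, and 200 onward.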

Kind regards,
Thomas

Comment by Karolin Varner [ 26/Jul/16 ]

By the way, while implementing my solution I noticed that $mod seems to ignore negative values; I had to add a special case for those and invert them.
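One possible explanation, stated here as an assumption rather than confirmed behavior: if $mod computes remainders the way C/C++ `%` does (truncated division, so the remainder takes the sign of the dividend), a negative _id_hash yields a negative remainder and never matches `{$mod: [N, k]}` for k >= 0. The sketch below shows one hypothetical way to "invert" them: worker k also takes remainder -k. The helper names are illustrative, and the ticket does not say exactly what special case was used.

```python
from typing import Any, Dict


def truncated_mod(value: int, divisor: int) -> int:
    """Remainder with the sign of the dividend, like C/C++ `%` (divisor > 0).

    Assumption: this mirrors how $mod evaluates negative values.
    """
    remainder = abs(value) % divisor
    return -remainder if value < 0 else remainder


def signed_worker_filter(worker: int, num_workers: int) -> Dict[str, Any]:
    """Hypothetical filter giving worker k both remainder k and remainder -k."""
    if worker == 0:
        # A zero remainder covers both positive and negative multiples of N.
        return {"_id_hash": {"$mod": [num_workers, 0]}}
    return {
        "$or": [
            {"_id_hash": {"$mod": [num_workers, worker]}},
            {"_id_hash": {"$mod": [num_workers, -worker]}},
        ]
    }


def assigned_worker(id_hash: int, num_workers: int) -> int:
    """Client-side equivalent of signed_worker_filter."""
    return abs(truncated_mod(id_hash, num_workers))
```

An alternative design is to clear the sign bit before storing the hash, so that every _id_hash is non-negative and a single `{$mod: [N, k]}` per worker suffices.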
Should I create a ticket for that too?

Generated at Thu Feb 08 04:08:47 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.