[SERVER-10117] expose splitVector functionality Created: 06/Jul/13 Updated: 06/Dec/22 Resolved: 23/Aug/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Sharding, Usability |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Antoine Girbal | Assignee: | [DO NOT USE] Backlog - Sharding Team |
| Resolution: | Won't Fix | Votes: | 8 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Assigned Teams: |
Sharding
|
| Participants: |
| Description |
|
splitVector is used by sharding and returns the split points for a collection, with impressive performance while doing so. There are many cases where it can be extremely useful for an application to know the split points of a collection.
There is no easy alternative for the application if it is not aware of the distribution of a key. |
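To make the request concrete, here is a minimal, hypothetical sketch of the kind of computation splitVector performs, expressed in plain Python over an in-memory list of shard-key values. The function name, the document-count chunk limit (the real command limits chunks by bytes), and the in-memory input are all illustrative assumptions; the actual command runs server-side against a collection's index.

```python
def split_points(sorted_keys, max_docs_per_chunk):
    """Return the boundary keys that cut sorted_keys into chunks
    of at most max_docs_per_chunk documents each (illustrative
    stand-in for splitVector's byte-based chunking)."""
    points = []
    for i in range(max_docs_per_chunk, len(sorted_keys), max_docs_per_chunk):
        points.append(sorted_keys[i])
    return points

print(split_points(list(range(10)), 3))  # → [3, 6, 9]
```

An application that knew these boundaries could, for example, fan out range queries over the chunks in parallel, which is exactly the use case the Hadoop and Spark connectors discuss below.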
| Comments |
| Comment by Kaloian Manassiev [ 23/Aug/18 ] |
|
Thanks ross.lawley. I am closing this ticket as Won't Fix. |
| Comment by Ross Lawley [ 23/Aug/18 ] |
|
kaloian.manassiev, this is no longer a pain point. $sample does the job well and has been the default approach in the Spark Connector's collection partitioner without issue. |
| Comment by Kaloian Manassiev [ 23/Aug/18 ] |
|
ross.lawley, is this still a pain point for you? It looks like $sample does the job, and we would really like not to start supporting splitVector as a first-class citizen. |
| Comment by Luke Lovett [ 25/May/16 ] |
|
dan@10gen.com, this sounds like a good alternative for users on MongoDB 3.2+. I'll make a HADOOP ticket for creating a new splitter based on $sample. |
| Comment by Ross Lawley [ 25/May/16 ] |
|
dan@10gen.com I'm still trying to fully grok the code, but it should work. I'll try creating a new partitioner using $sample and see how it goes. |
| Comment by Daniel Pasette (Inactive) [ 25/May/16 ] |
|
ross.lawley/luke.lovett, what if we move the hadoop and spark connectors to use the $sample agg stage to calculate split points instead of the internal splitVector cmd? This requires only read privileges and is exactly what we use to calculate split points in the oplog on WiredTiger. Here's the code: https://github.com/mongodb/mongo/blob/r3.3.6/src/mongo/db/storage/wiredtiger/wiredtiger_record_store.cpp#L343. Would this work? $sample is only available in v3.2+ |
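The $sample-based approach suggested here can be sketched as follows: draw a uniform random sample of shard-key values (which is what a pipeline stage like `{"$sample": {"size": n}}` provides with only read privileges), sort the sample, and take evenly spaced sampled keys as estimated split points. This is a hedged, self-contained illustration in plain Python, not the connector's or the server's actual code; the function and parameter names are invented for the example.

```python
import random

def sampled_split_points(keys, num_partitions, sample_size, rng=random):
    """Estimate num_partitions - 1 split points from a random sample of
    keys, mimicking a $sample-based partitioner (illustrative only)."""
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    step = max(1, len(sample) // num_partitions)
    return [sample[i] for i in range(step, len(sample), step)][: num_partitions - 1]

rng = random.Random(0)  # seeded for a reproducible demonstration
points = sampled_split_points(list(range(10_000)), 4, 400, rng)
print(len(points))  # 3 boundaries → 4 partitions
```

The trade-off versus splitVector is that the boundaries are statistical estimates, so chunk sizes are only approximately equal; in exchange, no cluster-level privilege is required.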
| Comment by Ross Lawley [ 24/May/16 ] |
|
splitVector is also used by the new spark connector as a mechanism for partitioning a collection for any users on a non-sharded system. |
| Comment by Ben McCann [ 29/Jun/15 ] |
|
+1 to adding splitVector to the clusterMonitor role. It's already exposed in the clusterAdmin role and other admin roles. I feel that the only thing being accomplished by leaving it off of clusterMonitor is encouraging folks to assign higher permission levels than necessary. You already require lots of customers using your Mongo Hadoop Connector to assign the permission. You can leave it undocumented so that folks aren't encouraged to use it beyond that, but it seems really silly to tell people that they need to use it and then make it so that the easiest way to do so is by assigning a less secure role than necessary. In fact, by not adding it to clusterMonitor you end up exposing it even more: now people are creating custom roles that use splitVector instead of just using the built-in roles that you control and can change with future releases. To see where the Mongo Hadoop Connector uses it, look at https://github.com/mongodb/mongo-hadoop/blob/master/core/src/main/java/com/mongodb/hadoop/splitter/StandaloneMongoSplitter.java |
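For reference, a custom role like the ones described above might be defined with a createRole command document along these lines. This is a sketch under stated assumptions: "splitVector" is a privilege action in MongoDB's access control, but the role name, database name, and resource scope here are illustrative, and no server interaction is shown.

```python
# Hypothetical createRole command document granting only splitVector
# plus ordinary read access; names are illustrative assumptions.
create_role_cmd = {
    "createRole": "splitVectorReader",
    "privileges": [
        {
            # empty db/collection means "any normal database and collection"
            "resource": {"db": "", "collection": ""},
            "actions": ["splitVector"],
        }
    ],
    "roles": [{"role": "read", "db": "mydb"}],
}
```

In PyMongo, a document like this would be issued against the admin database with something like `client.admin.command(create_role_cmd)`; the point of the comment above is that baking the action into clusterMonitor would make such hand-rolled roles unnecessary.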
| Comment by Lars Francke [ 17/Jul/14 ] |
|
I can open a separate issue for this, but I think it's also worthwhile to allow splitVector to run on secondary servers as well, especially for the Hadoop use case. We don't want to point the customer's Hadoop cluster at the primary node. See CS-13607 and HADOOP-150 for more information. |
| Comment by Spencer Brody (Inactive) [ 08/Jul/13 ] |
|
Updated the title of this ticket and put it into needs triage. Not sure how important it is to change what system role this is in given the upcoming ability for users to define custom roles. |
| Comment by Antoine Girbal [ 08/Jul/13 ] |
|
I will open a ticket to document it then, because right now there is nothing. I think the clusterAdmin role is too harsh here.
One point of this ticket is to make it available to an application's 'read' or 'readWrite' users. |
| Comment by Spencer Brody (Inactive) [ 08/Jul/13 ] |
|
It is officially supported; you need the "clusterAdmin" role to use it. |