[SERVER-20510] Add stored procedure functionality Created: 18/Sep/15 Updated: 06/Dec/22 Resolved: 14/Dec/17
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Usability |
| Affects Version/s: | 3.1.8 |
| Fix Version/s: | None |
| Type: | New Feature | Priority: | Major - P3 |
| Reporter: | Ian Beaver | Assignee: | Backlog - Query Team (Inactive) |
| Resolution: | Duplicate | Votes: | 9 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Assigned Teams: | Query |
| Backwards Compatibility: | Fully Compatible |
| Participants: | |
| Description |
With $eval removed, MongoDB needs some equivalent to stored procedures or it will be unusable for many use cases. Map/reduce and the aggregation pipeline are not alternatives, since they are designed for grouping and aggregation rather than for individual document operations. A very common use for stored procedures is making changes to a document that depend on existing data in the collection, where doing that outside the database is inefficient.

People opposed to stored procedures typically do not want business logic split across separate languages and locations, or dislike the additional complexity. That may be valid in their environments, but it does not change the fact that for many use cases, pulling all of the data out of the database to the client and writing it back with minor changes is not feasible. The additional complexity need not be used by applications that do not require it.

A very common use case is modifying every document in a collection based on some attribute of each document. For example: each document has a rank score or some other numerical attribute, and you want to normalize that score by the maximum value in the collection. Using client-side code, as the $eval removal docs recommend, means first selecting the max, then pulling down every document (or specific fields of each) in the collection, unmarshalling into native objects, reading the score, dividing it by the max, and writing each result back. We have many collections with tens to hundreds of millions of documents and perform many similar operations on them. Doing this client side is a non-starter given the latency and bandwidth needed to ship data back and forth between the client code and the server.
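For concreteness, the client-side round trip described above can be sketched in mongo-shell JavaScript (illustrative only; the `scores` collection and `score` field are hypothetical names, and a running server is assumed):

```javascript
// The client-side pattern the $eval removal docs recommend:
// every document crosses the network twice (down to the client, back up).
// Hypothetical collection "scores" with a numeric "score" field.

// 1. Find the maximum score.
const max = db.scores.aggregate([
  { $group: { _id: null, max: { $max: "$score" } } }
]).toArray()[0].max;

// 2. Pull each document down, compute the new value client-side,
//    and write it back one at a time.
db.scores.find({}, { score: 1 }).forEach(function (doc) {
  db.scores.updateOne({ _id: doc._id }, { $set: { score: doc.score / max } });
});
```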
In basic tests with only ~100k documents, this simple normalization operation is 10x-20x slower running client side with a co-located client than with $eval, not to mention the additional bandwidth used in cases where client code cannot be co-located. This was in AWS with an r3.8xlarge database instance and SSD volumes. We rewrote many of our applications to use $eval because of this very performance issue.

If MongoDB wants to position itself as a scalable database for analytics, it has to provide some mechanism for executing arbitrary functions, with document-level write support, on the data within the server, ideally one that works with shards. It doesn't need to be JavaScript and it doesn't have to be embedded in the storage engine; even a streaming model like Hadoop's, where each node executes a script over stdin/stdout on its partition of the data, would be a start. Pulling all of the data out of the database over the network to update it with minor changes is not a strategy that scales. One of the major wins of horizontal scaling is pushing processing to where the data lives, so that processing power scales with storage. |
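For comparison, on MongoDB 4.2 and later this particular normalization can be expressed server-side with a pipeline-form update, avoiding the per-document round trip (a sketch; the `scores` collection and `score` field are hypothetical names):

```javascript
// Assumes MongoDB 4.2+, where updateMany() accepts an aggregation
// pipeline as the update document, so the new value can be computed
// from each document's existing fields entirely on the server.
const max = db.scores.aggregate([
  { $group: { _id: null, max: { $max: "$score" } } }
]).toArray()[0].max;

// Only the single max value crosses the network; the division runs
// server-side for every document.
db.scores.updateMany(
  {},
  [ { $set: { score: { $divide: ["$score", max] } } } ]
);
```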
| Comments |
| Comment by Asya Kamsky [ 14/Dec/17 ] |

The use case described here (updating documents based on other fields in the document) makes this a duplicate of
| Comment by Asya Kamsky [ 06/Jun/17 ] |

It would be better to have this discussion on the MongoDB User Google Group. You can post this question there (be sure to include a sample document), but I can promise you it's doable with $graphLookup, as we have users with exactly this use case.
| Comment by Max Müller [ 06/Jun/17 ] |

We actually tried to solve this problem with $graphLookup, but the problem is the startWith and connectFromField parameters. We can't find a way to convert an object into an array, especially because connectFromField also has to be just a field name; the parameter takes only a string, i.e. a field name. To provide more information: it's a filesystem structure. We have a folder document, and that document has a children attribute (an object) holding the ObjectIds of files or folders. Now we are looking for a way to get all children of a specific folder. Any suggestions on how to solve this?
| Comment by Asya Kamsky [ 06/Jun/17 ] |

0x6a0x00, the linked SO question is old (2011); the same functionality can now be achieved in the server with aggregation and the $graphLookup stage. It would be more helpful for our planning and prioritization if you described the problem that you cannot solve with MongoDB, for which some sort of stored procedure support would be the best solution.
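A sketch of the kind of $graphLookup traversal being suggested, applied to a folder tree (the `folders` collection and schema are hypothetical; note that connectFromField must name a field, so this assumes `children` is stored as an array of ObjectIds rather than an object):

```javascript
// Recursively collect all descendants of one folder.
// Hypothetical schema: { _id: ObjectId, name: String, children: [ObjectId] }
db.folders.aggregate([
  { $match: { _id: rootFolderId } },      // rootFolderId supplied by the caller
  { $graphLookup: {
      from: "folders",                    // traverse the same collection
      startWith: "$children",             // seed with the root's child ids
      connectFromField: "children",       // follow each child's own children
      connectToField: "_id",
      as: "descendants"                   // all transitive children land here
  } }
]);
```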
| Comment by Max Müller [ 06/Jun/17 ] |

Same problem on our side. Stored procedure functionality would be nice. See https://stackoverflow.com/questions/44391819/mongodb-removes-eval-use-instead for more information.
| Comment by stone [X] [ 15/Nov/16 ] |

I want this feature.
| Comment by Asya Kamsky [ 19/Apr/16 ] |

All MongoDB drivers and the mongo shell will connect to the primary when given a replica set specification as the host parameter. You don't need any of this logic; just connect using the replsetname/ip:port,ip:port syntax.
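For example (the replica set name `rs0` and the host names are hypothetical; in both forms the client discovers the current primary from the seed list and re-routes automatically after a failover):

```
mongo --host rs0/db1.example.com:27017,db2.example.com:27017
mongo "mongodb://db1.example.com:27017,db2.example.com:27017/?replicaSet=rs0"
```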
| Comment by Michael Hernandez [ 19/Apr/16 ] |

I have a very real use case for some form of remote execution from within a shell. Basically, there are a lot of cases where the primary is changing, and I need to be able to dynamically attach to whichever host is primary and run a set of commands.
| Comment by Ian Beaver [ 11/Nov/15 ] |

Update is not a solution, as you must use the original value of each document to compute the new value to call update with. Using the above example, you have documents with some integer attribute (say, link clicks) and you want to normalize it. You cannot just call
| Comment by Ramon Fernandez Marina [ 11/Nov/15 ] |

ibeaver, I am somewhat confused about your use cases. You mention analytics, but also updating documents. Updating every document in a collection can be done with an update() statement; however, that is sometimes a sign of a non-optimal schema, so I'd suggest you follow up on the mongodb-user group or on Stack Overflow with the mongodb tag for further discussion of your specific schema and use cases, to see if there are some easy improvements that can help on that front. That being said, I'm moving this ticket into "planned but not scheduled" so it gets considered in the next round of planning. Regards,