[SERVER-20510] Add stored procedure functionality Created: 18/Sep/15  Updated: 06/Dec/22  Resolved: 14/Dec/17

Status: Closed
Project: Core Server
Component/s: Usability
Affects Version/s: 3.1.8
Fix Version/s: None

Type: New Feature Priority: Major - P3
Reporter: Ian Beaver Assignee: Backlog - Query Team (Inactive)
Resolution: Duplicate Votes: 9
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-1765 self referential updates? WAS: allow ... Closed
Related
related to SERVER-11345 Allow update to compute expressions u... Closed
Assigned Teams:
Query
Backwards Compatibility: Fully Compatible
Participants:

 Description   

With $eval removed, Mongo needs some equivalent to stored procedures or it will be unusable for many use cases. Map/reduce and the aggregation pipeline are not alternatives, as they are designed for grouping and aggregation rather than individual document operations. A very common use for stored procedures is cases where the changes to a document depend on existing data in the collection, and it is not efficient to do that outside of the DB.

People against stored procedures typically do not want business logic split across separate languages/locations, or dislike the additional complexity. While this may be valid in their environments, it does not negate the fact that for many use cases pulling all of the data out of the database to the client and writing it back in again with minor changes is not feasible. The additional complexity does not need to be used if their applications do not require it.

A very common use case is needing to modify every document in a collection based on some attribute of each document. An example: each document has a rank score or other numerical attribute of some kind, and you want to normalize that score based on the max value in the collection. Using client-side code, as you recommend in the $eval removal docs, means first selecting the max, then pulling down every single document (or specific fields of each) in the collection, unmarshalling them into native objects, reading the score and dividing by the max, and writing each result back. We have many collections with tens to hundreds of millions of documents and do many similar operations with them. Doing this client side is a non-starter given the latency and bandwidth needed to ship data back and forth between the client code and the server.
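For illustration, a minimal mongo shell sketch of that client-side round trip (collection and field names are hypothetical):

// hypothetical collection "scores" with a numeric field "rank"
var maxRank = db.scores.find({}, {rank: 1}).sort({rank: -1}).limit(1).toArray()[0].rank;
// pull every document down to the client and write each one back individually
db.scores.find({}, {_id: 1, rank: 1}).forEach(function (doc) {
  db.scores.update({_id: doc._id}, {$set: {rank: doc.rank / maxRank}});
});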

Running some basic tests with only ~100k documents, this simple normalization operation is 10x-20x slower run client side with a co-located client than using $eval, not to mention the additional bandwidth used in cases where client code cannot be co-located. This was in AWS with an r3.8xlarge DB instance and SSD volumes. We rewrote many of our applications to use $eval because of this very performance issue.
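A rough, hypothetical sketch of what such an $eval-based rewrite can look like (same made-up names as above), keeping the whole loop on the server:

db.eval(function () {
  var maxRank = db.scores.find({}, {rank: 1}).sort({rank: -1}).limit(1).toArray()[0].rank;
  db.scores.find({}, {_id: 1, rank: 1}).forEach(function (doc) {
    db.scores.update({_id: doc._id}, {$set: {rank: doc.rank / maxRank}});
  });
});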

If Mongo wants to position itself as a scalable database for analytics, it has to provide some mechanism for executing arbitrary functions with document-level write support on the data within the server, and ideally one that works with shards. It doesn't need to be JS and it doesn't have to be embedded in the storage engine, but even a streaming model like Hadoop, where each node executes a script on its partition of the data just using stdin/stdout, would be a start. Pulling all of the data out of the database over the network to update it with some minor changes is not a strategy that scales. One of the major wins of horizontal scaling is pushing the processing to where the data lives and having that processing power scale with the storage.



 Comments   
Comment by Asya Kamsky [ 14/Dec/17 ]

The use case described here (updating documents based on other fields in the document) makes this a duplicate of SERVER-1765, which describes that exact functionality more precisely.

Comment by Asya Kamsky [ 06/Jun/17 ]

It would be better to have this discussion on the MongoDB User Google Group - you can post this question there (be sure to include a sample document), but I can promise you it's doable with $graphLookup, as we have users with exactly this use case.
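(A minimal sketch, not from this ticket: it assumes the child ids are materialized into an array field such as childIds, since connectFromField must name a field; collection and field names are hypothetical.)

db.folders.aggregate([
  { $match: { _id: startFolderId } },   // startFolderId: hypothetical starting folder
  { $graphLookup: {
      from: "folders",
      startWith: "$childIds",           // array of child ObjectIds
      connectFromField: "childIds",
      connectToField: "_id",
      as: "descendants"
  } }
]);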

Comment by Max Müller [ 06/Jun/17 ]

Actually I tried to solve this problem with $graphLookup, but the problem is the startWith and connectFromField parameters, because our recursion start (startWith and connectFromField) is an object:
Children:

{File_<fileId>: <fileId>, Folder_<id>: <id>}

We haven't found a way to turn the object into an array, especially because connectFromField also has to be just a field name... the parameter only takes a string, i.e. a field name.

To provide more information: it's a filesystem structure. We have a folder document, and this document has a children attribute (an object) containing the ObjectIds of files or folders. Now we are looking for a way to get all children of a specific folder. Any suggestions on how to solve this?

Comment by Asya Kamsky [ 06/Jun/17 ]

0x6a0x00 the linked SO question is old (2011) - the same functionality can now be achieved in the server with aggregation and the $graphLookup stage. It would be more helpful for our planning and prioritization if you described the problem that you cannot solve with MongoDB for which some sort of stored procedure would be the best solution.

Comment by Max Müller [ 06/Jun/17 ]

Same problem on our side. Stored procedure functionality would be nice.

See https://stackoverflow.com/questions/44391819/mongodb-removes-eval-use-instead for more information.

Comment by stone [X] [ 15/Nov/16 ]

I want this feature.
As Ian Beaver said, when updating a value based on its original value, the application does not want to query first, do the computation, and then update; the application wants to do this on the database side.

Comment by Asya Kamsky [ 19/Apr/16 ]

All MongoDB drivers and the mongo shell will connect to the primary if given a replica set specification as the host parameter.

You don't need any of this logic; just connect using the replsetname/ip:port,ip:port syntax.
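For example (replica set name and hosts below are hypothetical), instead of looping over known IPs:

mongo --host rs0/10.0.0.1:27017,10.0.0.2:27017,10.0.0.3:27017 /etc/my_mongo_startup.js

Drivers accept the equivalent connection-string form, e.g. mongodb://10.0.0.1:27017,10.0.0.2:27017/?replicaSet=rs0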

Comment by Michael Hernandez [ 19/Apr/16 ]

I have a very real use case for having some form of remote execution from within a shell.
Rolling deployments: I am not sure how other professionals are handling this, but I tend to have things like the following.

Basically, there are a lot of cases where the primary is changing, and I need to be able to dynamically attach to whichever host is currently primary and run a set of commands.
Given that mongo does not easily plug into EC2 or other cloud platforms' tagging solutions, identifying the primary in an ad-hoc fashion leads to a lot of interesting code.
The simplest approach is to use a known mongo IP, dynamically connect in-process to the master, and evaluate the necessary commands.

# bash: run the startup script against every known mongod
for IP in "${known_mongo_ips[@]}"; do
  mongo "$IP" /etc/my_mongo_startup.js
done

// my_mongo_startup.js
// Only act if the node this shell connected to is currently the primary.
if (rs.isMaster().ismaster) {
  try {
    // Ask the primary to step down; it closes client connections when it does,
    // so an error here is expected.
    connect(rs.isMaster().primary + '/admin').eval('rs.stepDown()');
  } catch (e) {
    // ignore the dropped-connection error from stepDown
  }
  sleep(50000);
  // Reconnect to whichever node is primary now, drop the old member, and
  // add the replacement (newMongoIp is supplied by the deployment tooling).
  connect(rs.isMaster().primary + '/admin').eval('rs.remove("' + rs.isMaster().me + '")');
  connect(rs.isMaster().primary + '/admin').eval('rs.add("' + newMongoIp + '")');
}

Comment by Ian Beaver [ 11/Nov/15 ]

Update is not a solution, as you must use the original value of each document to compute the new value to call update with. Using the above example, you have documents with some integer attribute (say link clicks or something) and you want to normalize it. You cannot just call

db.collection.update({}, {$set: {value: <newValue>}}, {multi: true})

since each new value is document specific. You have to pull each document's value down into client code, divide that value by the max, and then update that single document with its normalized value. There are many other use cases where the update operation depends on a value in each row of a table (or attribute of a document). Pulling every document out of the database to do something so trivial is not viable as the collection size gets into the millions.

Comment by Ramon Fernandez Marina [ 11/Nov/15 ]

ibeaver, I am somewhat confused about your use cases. You mention analytics, but also updating documents. Updating every document in a collection can be done with an update() statement; however, this is sometimes a sign of a non-optimal schema, so I'd suggest you follow up on the mongodb-user group or Stack Overflow with the mongodb tag for further discussion about your specific schema and use cases, and see if there are some easy improvements that can help on that front.

That being said, I'm moving this ticket into "planned but not scheduled" so it gets considered in the next round of planning.

Regards,
Ramón.
