Type: New Feature
Resolution: Duplicate
Priority: Major - P3
Fix Version/s: None
Affects Version/s: 3.1.8
Component/s: Usability
Labels: None
Assigned Teams: Query
Backwards Compatibility: Fully Compatible
Confidence Status: None
Work Order: 3
CAR Domain/s: None
Aha! Reference: None
Tracking Level: None
Risk Status: None
Exec Notes: None
Goal Name(s): None
Goal Link: None

With $eval removed, MongoDB needs some equivalent of stored procedures, or it will be unusable for many use cases. Map-reduce and the aggregation pipeline are not alternatives, because they are designed for grouping and aggregation operations rather than for individual document operations. A very common use for stored procedures is making changes to a document that depend on existing data in the collection, which is not efficient to do outside the database.

People who oppose stored procedures typically do not want business logic split across languages or locations, or they dislike the added complexity. While this may be valid in their environments, it does not change the fact that for many use cases, pulling all of the data out of the database to the client and writing it back with minor changes is not feasible. Applications that do not need this complexity do not have to use it.

A very common use case is modifying every document in a collection based on some attribute of each document. For example, each document has a rank score or some other numerical attribute, and you want to normalize that score by the maximum value in the collection. Doing this in client-side code, as the $eval removal docs recommend, means first selecting the max, then pulling down every document (or specific fields of each) in the collection, unmarshalling them into native objects, dividing each score by the max, and writing each result back. We have many collections with tens to hundreds of millions of documents and perform many similar operations on them. Doing this client side is a non-starter given the latency and bandwidth needed to ship data back and forth between the client code and the server.
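The client-side procedure above can be sketched as follows. This is a minimal illustration under stated assumptions: `FakeCollection`, `find_max`, `fetch_all`, `write_back`, and `normalize_client_side` are hypothetical names, not a driver API, and an in-memory dict stands in for the collection so the example runs without a server. The counters show how many documents cross the client/server boundary.

```python
class FakeCollection:
    """Hypothetical in-memory stand-in for a server-side collection.

    Counts documents shipped to the client and updates sent back, to make
    the cost of a client-side round trip visible. Not a real driver API.
    """

    def __init__(self, docs):
        self._docs = {d["_id"]: dict(d) for d in docs}
        self.docs_to_client = 0    # documents (or values) sent server -> client
        self.writes_to_server = 0  # individual update requests client -> server

    def find_max(self, key):
        # Step 1: one query whose result is a single value, the max.
        self.docs_to_client += 1
        return max(d[key] for d in self._docs.values())

    def fetch_all(self, key):
        # Step 2: pull every document's _id and `key` field down to the client.
        out = [{"_id": _id, key: d[key]} for _id, d in self._docs.items()]
        self.docs_to_client += len(out)
        return out

    def write_back(self, _id, key, value):
        # Step 4: one update per document sent back to the server.
        self.writes_to_server += 1
        self._docs[_id][key] = value

    def get(self, _id):
        # Inspection helper for the demo; not counted as traffic.
        return self._docs[_id]


def normalize_client_side(coll, key="score"):
    """Divide each document's `key` by the collection-wide max, client side."""
    max_value = coll.find_max(key)
    for doc in coll.fetch_all(key):
        # Step 3: compute the new value locally on the client.
        coll.write_back(doc["_id"], key, doc[key] / max_value)


coll = FakeCollection([{"_id": i, "score": float(i)} for i in range(1, 5)])
normalize_client_side(coll)
print(coll.get(2)["score"], coll.get(4)["score"])  # 0.5 1.0
print(coll.docs_to_client, coll.writes_to_server)  # 5 4
```

With N documents, about N documents travel to the client and N updates travel back; a function executed inside the server would avoid both transfers.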

In some basic tests with only ~100k documents, this simple normalization operation was 10x to 20x slower when run client side with a co-located client than when using $eval, not to mention the additional bandwidth used when client code cannot be co-located. These tests ran in AWS on an r3.8xlarge DB instance with SSD volumes. We rewrote many of our applications to use $eval because of this very performance issue.

If MongoDB wants to position itself as a scalable database for analytics, it has to provide some mechanism for executing arbitrary functions with document-level write support on the data within the server, ideally one that works with shards. It doesn't need to be JS and it doesn't have to be embedded in the storage engine; even a streaming model like Hadoop's, where each node executes a script on its partition of the data using only stdin/stdout, would be a start. Pulling all of the data out of the database over the network to apply minor changes is not a strategy that scales. One of the major wins of horizontal scaling is pushing processing down to where the data lives and having that processing power scale with the storage.
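A streaming contract like the one suggested above might look like the sketch below. MongoDB offers no such interface; the one-JSON-document-per-line format, the `normalize_stream` function, and passing the collection-wide max in as an argument are all assumptions for illustration. Because each node would see only its own partition, a global value such as the max has to be computed first and handed to every node.

```python
import io
import json


def normalize_stream(instream, outstream, max_score, key="score"):
    """Read one JSON document per line, divide `key` by max_score, write it out.

    Hypothetical streaming contract: each node would run this over its own
    partition, so the collection-wide max must be computed beforehand and
    passed in. Returns the number of documents processed.
    """
    count = 0
    for line in instream:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        doc = json.loads(line)
        doc[key] = doc[key] / max_score
        outstream.write(json.dumps(doc) + "\n")
        count += 1
    return count


# Demo with in-memory streams standing in for stdin/stdout.
src = io.StringIO('{"_id": 1, "score": 50.0}\n{"_id": 2, "score": 200.0}\n')
dst = io.StringIO()
n = normalize_stream(src, dst, max_score=200.0)
print(dst.getvalue(), end="")
# {"_id": 1, "score": 0.25}
# {"_id": 2, "score": 1.0}
```

As a standalone script, the same function could be wired to `sys.stdin` and `sys.stdout`; the demo uses `io.StringIO` so it runs without any input. Keeping the contract to plain lines of JSON is what would let such scripts be written in any language, in the spirit of Hadoop Streaming.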

Duplicates: SERVER-1765 self referential updates? WAS: allow access to old row value during update (Closed)

Related to: SERVER-11345 Allow update to compute expressions using referenced fields like Aggregation Framework's $project (Closed)

Assignee: Backlog - Query Team (Inactive)
Reporter: Ian Beaver
Participants: Asya Kamsky, Backlog - Query Team, Ian Beaver, Max Müller, Michael Hernandez, Ramon Fernandez Marina, stone [X]
Votes: 9
Watchers: 24

Created: Sep 18 2015 09:33:40 PM UTC
Updated: Dec 06 2022 04:44:03 AM UTC
Resolved: Dec 14 2017 10:27:41 PM UTC
