[SERVER-10736] Modify MapReduce to "map, shuffle, reduce", and always take lists on the reducer input Created: 11/Sep/13 Updated: 06/Dec/22 Resolved: 04/Feb/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | MapReduce |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Nicolau Leal Werneck | Assignee: | Backlog - Query Execution |
| Resolution: | Done | Votes: | 1 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Assigned Teams: |
Query Execution
|
| Participants: | |||||||||||||
| Description |
|
The mapReduce command in MongoDB takes two required functions, "map" and "reduce", and an optional "finalize" function. "reduce" must output data in the same format that "map" emits. In some other frameworks, the functions are "map", "shuffle" and "reduce". There, "shuffle" (also known as "local reduce") is the one that must output the same data format as "map", just like the "reduce" from MongoDB, but it is "shuffle" that is optional, and the required "reduce" is more like the "finalize" from MongoDB.

It would be great if MongoDB could work like this instead, with the different nomenclature and optional parameters, either by changing the mapReduce method or by creating a new method.

Another useful modification would be to always deliver the data to the final step ("finalize"/"reduce") inside a list, even if there is just one item. This way we can always assume there is a list to process, and the function becomes simpler to write. It should also be easy to have an "identity reducer", which could be the default when no reducer is specified.

Related tickets: |
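The proposal above can be sketched in plain Python, outside MongoDB. This is only an illustration of the requested semantics, not an existing MongoDB API: the names `map_shuffle_reduce`, `identity_reducer`, and `chunk_size` are hypothetical, and splitting the values into chunks merely simulates partial results arriving from different workers.

```python
from collections import defaultdict


def identity_reducer(key, values):
    """Hypothetical default final step: return all values for the key as a list."""
    return list(values)


def map_shuffle_reduce(docs, map_fn, shuffle_fn=None,
                       reduce_fn=identity_reducer, chunk_size=2):
    """Sketch of "map, shuffle, reduce" with an optional shuffle step.

    map_fn(doc) yields (key, value) pairs.
    shuffle_fn(key, values) is the optional "local reduce"; its output must
    have the same format as a single mapped value.
    reduce_fn(key, values) is the final step and always receives a list.
    """
    # Map phase: group emitted values by key.
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            groups[key].append(value)

    results = {}
    for key, values in groups.items():
        if shuffle_fn is not None:
            # Local reduce: combine each chunk into one map-formatted value.
            values = [shuffle_fn(key, values[i:i + chunk_size])
                      for i in range(0, len(values), chunk_size)]
        # Final step: the input is a list even when it holds a single item.
        results[key] = reduce_fn(key, values)
    return results


docs = [{"tag": "a", "n": 1}, {"tag": "a", "n": 2},
        {"tag": "a", "n": 3}, {"tag": "b", "n": 5}]


def emit_tag(doc):
    yield doc["tag"], doc["n"]


# With no shuffle and no reducer, the identity reducer returns lists.
lists = map_shuffle_reduce(docs, emit_tag)   # {"a": [1, 2, 3], "b": [5]}

# With a summing shuffle and a summing final reducer.
totals = map_shuffle_reduce(docs, emit_tag,
                            shuffle_fn=lambda k, vs: sum(vs),
                            reduce_fn=lambda k, vs: sum(vs))  # {"a": 6, "b": 5}
```

Note that key "b" has a single value, yet the final step still receives `[5]` rather than a bare `5`, so the reducer needs no special case for single items.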
| Comments |
| Comment by Esha Bhargava [ 04/Feb/22 ] |
|
Closing these tickets as part of the deprecation of mapReduce. |
| Comment by Rafael [ 24/Feb/14 ] |
|
A design that always delivers the data to the final step ("finalize"/"reduce") inside a list, even if there is just one item, is a more robust long-term solution. This way we can always assume there is a list to process, and the method becomes simpler to write. |
| Comment by Nicolau Leal Werneck [ 11/Sep/13 ] |
|
The title is obviously incorrect; it should be "...and always take lists on the reducer input". Also, I should note that we could maintain compatibility with current MongoDB by continuing to call it "reduce" instead of "shuffle", and using it as "finalize" ("reduce" in the Bizarro World) if no "finalize" is specified. This is pretty much what happens today, but there would still be some changes: finalize (reduce) and reduce (shuffle) would be neither strictly optional nor strictly required. It is only necessary to have at least one of them. The change is to allow us to have only "finalize" if we so desire. And if neither of them is available, the output should be that of the identity reducer: a list of all values for each key. |
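One possible reading of the compatibility rule in this comment, sketched in Python. The helper `resolve_steps` and its return convention are hypothetical and only illustrate which function would play which role; in particular, reusing "reduce" as both the local and the final step when no "finalize" is given is an interpretation of "using it as finalize", not something the ticket spells out.

```python
def identity(key, values):
    """Fallback final step: a list of all values for the key."""
    return list(values)


def resolve_steps(reduce=None, finalize=None):
    """Return (local_step, final_step) for the given optional functions.

    local_step may be None, meaning that no local reduce is performed.
    final_step is always callable and always receives a list of values.
    """
    if finalize is not None:
        # "finalize" is the final step; "reduce", if present, is the local one.
        return reduce, finalize
    if reduce is not None:
        # Only "reduce" given: it also serves as the final step.
        return reduce, reduce
    # Neither given: fall back to the identity reducer.
    return None, identity
```

Under this reading, a call with only "finalize" skips the local step entirely, which is the one case the comment says current MongoDB does not allow.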