Core Server / SERVER-10736

Modify MapReduce to "map, shuffle, reduce", and always take lists on the reducer input

    • Type: Improvement
    • Resolution: Done
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: MapReduce
    • Query Execution

      MongoDB's mapReduce command takes two required functions, "map" and "reduce", and an optional "finalize" function. "reduce" must return a value in the same format as the values emitted by "map".
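
      To make the current contract concrete, here is a toy in-memory model in plain Python. It is only a sketch of the semantics described above, not MongoDB's implementation or API; `map_reduce`, `orders`, and the field names are made up for illustration. It assumes the documented behaviour that "reduce" is not called for a key with a single emitted value, which is one reason "reduce" has to return the same shape that "map" emits.

```python
from collections import defaultdict

def map_reduce(docs, map_fn, reduce_fn, finalize_fn=None):
    """Toy model of the current contract (hypothetical helper, not MongoDB code)."""
    # Group every emitted (key, value) pair by key.
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            groups[key].append(value)

    results = {}
    for key, values in groups.items():
        # Assumption: keys with a single value skip reduce entirely, so the
        # raw emitted value and a reduced value must be interchangeable.
        reduced = values[0] if len(values) == 1 else reduce_fn(key, values)
        results[key] = finalize_fn(key, reduced) if finalize_fn else reduced
    return results

orders = [
    {"cust": "a", "amount": 5},
    {"cust": "a", "amount": 7},
    {"cust": "b", "amount": 3},
]
totals = map_reduce(
    orders,
    lambda d: [(d["cust"], d["amount"])],  # "map": emit (key, value) pairs
    lambda key, values: sum(values),       # "reduce": same shape as emitted values
)
```

      Here customer "b" never reaches "reduce", yet its total is still a plain number because "map" and "reduce" share one output format.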

      In some other frameworks, the functions are "map", "shuffle" and "reduce". There, "shuffle" (also known as "local reduce") is the step that must output the same data format as "map", just like MongoDB's "reduce", but it is optional. The required "reduce" is closer to MongoDB's "finalize".

      It would be great if MongoDB could work like this instead, with this nomenclature and these optional parameters, either by changing the mapReduce method or by adding a new one.

      Another useful change would be to always deliver the data to the final step ("finalize"/"reduce") as a list, even when there is just one item. The final step could then always assume it has a list to process, which makes it simpler to write.
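
      A minimal sketch of the "always a list" idea, again as a plain Python simulation rather than anything MongoDB provides. `map_then_reduce_lists`, `readings`, and the field names are hypothetical. The point is that the final function can be written once, against a list, and behaves the same for keys with one value or many.

```python
from collections import defaultdict

def map_then_reduce_lists(docs, map_fn, reduce_fn):
    """Hypothetical helper: the final step always receives a list per key."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            groups[key].append(value)
    # Every key is passed as a list, even when it holds a single value.
    return {key: reduce_fn(key, values) for key, values in groups.items()}

readings = [
    {"sensor": "s1", "temp": 20.0},
    {"sensor": "s1", "temp": 22.0},
    {"sensor": "s2", "temp": 18.0},
]
averages = map_then_reduce_lists(
    readings,
    lambda d: [(d["sensor"], d["temp"])],
    # No special case for single values: len(values) is always defined.
    lambda key, values: sum(values) / len(values),
)
```

      Note that the averaging function returns a different shape than it consumes (a number from a list), which is fine here because it only runs once per key as the final step.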

      It should also be easy to use an "identity reducer", which could be the default when no reducer is specified.
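
      Putting the proposal together, the following is one possible sketch of a "map, shuffle, reduce" pipeline with an optional "shuffle" (local reduce) and an identity reducer as the default. It is a plain Python simulation under stated assumptions: `map_shuffle_reduce`, `identity_reducer`, and the partitioned input are all hypothetical names chosen for illustration, and partitions stand in for whatever unit of local work a real implementation would use.

```python
from collections import defaultdict

def identity_reducer(key, values):
    """Default final step: hand back the grouped list unchanged."""
    return values

def map_shuffle_reduce(partitions, map_fn, shuffle_fn=None, reduce_fn=identity_reducer):
    """Hypothetical pipeline: map, optional local reduce, then a list-based reduce."""
    merged = defaultdict(list)
    for partition in partitions:
        # Group the values emitted within this partition only.
        local = defaultdict(list)
        for doc in partition:
            for key, value in map_fn(doc):
                local[key].append(value)
        for key, values in local.items():
            if shuffle_fn is not None:
                # "shuffle" must return the same shape that "map" emits,
                # so its result can sit next to raw emitted values.
                merged[key].append(shuffle_fn(key, values))
            else:
                merged[key].extend(values)
    # The final step always receives a list, even for a single value.
    return {key: reduce_fn(key, values) for key, values in merged.items()}

partitions = [
    [{"word": "a"}, {"word": "b"}, {"word": "a"}],
    [{"word": "a"}],
]
emit_one = lambda d: [(d["word"], 1)]

# With no shuffle and no reducer, the result is just the grouped values.
grouped = map_shuffle_reduce(partitions, emit_one)

# With a local sum as "shuffle" and a final sum as "reduce", we get word counts.
counts = map_shuffle_reduce(
    partitions,
    emit_one,
    shuffle_fn=lambda key, values: sum(values),
    reduce_fn=lambda key, values: sum(values),
)
```

      With these defaults, omitting both optional functions turns the call into a plain group-by-key, which is the behaviour an identity reducer implies.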


            backlog-query-execution [DO NOT USE] Backlog - Query Execution
            nwerneck Nicolau Leal Werneck