Core Server / SERVER-2340

MapReduce finalize should be able to throw result row away

    Details

    • Type: Improvement
    • Status: Open
    • Priority: Trivial - P5
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: planned but not scheduled
    • Component/s: MapReduce
    • Labels: None
    • Backwards Compatibility: Minor Change

      Description

      I have a use case where I would like to throw away a result in the finalize phase within map reduce. AFAIK finalize can currently only modify the result object, not remove it completely.

      My use case consists of a map reduce where I first emit

      { count : 1 }

      in the map phase and then sum the counts in the reduce phase. I would then like to discard every result whose count is below some threshold and return only those whose count exceeds it. In practice finalize would discard 99.99% of my results, so it would be much more efficient to do it there than to manually iterate over or query the temporary result collection.

      I propose that returning null from finalize should discard the result. All current examples of the finalize function return the result object, so implementing this would not change their behavior.

        Activity

        garo Juho Mäkinen added a comment -

        Example of a finalize function which would fit the use case:

        f = function (key, value) {
            if (value.count > 100) {
                return value;
            } else {
                return null;
            }
        }
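To make the requested semantics concrete, here is a minimal in-memory sketch. It is not MongoDB's implementation: `simulateMapReduce`, its `discardNull` flag, the `word` field, and the threshold of 2 are all hypothetical names and values chosen for illustration. With `discardNull` off it models the behavior described in this ticket (a null from finalize still leaves the key in the output); with it on, it models the proposal.

```javascript
// In-memory stand-in for mapReduce, for illustration only.
let emitted; // Map from key to the array of values emitted for that key

function emit(key, value) {
    if (!emitted.has(key)) emitted.set(key, []);
    emitted.get(key).push(value);
}

// Map phase: emit { count: 1 } per document ("word" is an assumed field).
function mapFn() {
    emit(this.word, { count: 1 });
}

// Reduce phase: sum the counts emitted for one key.
function reduceFn(key, values) {
    let total = 0;
    for (const v of values) total += v.count;
    return { count: total };
}

// Finalize phase: keep results above the threshold, otherwise return null.
function finalizeFn(key, value) {
    return value.count > 2 ? value : null;
}

// discardNull is the hypothetical switch modelling the proposed behavior.
function simulateMapReduce(docs, map, reduce, finalize, discardNull) {
    emitted = new Map();
    for (const doc of docs) map.call(doc);
    const results = [];
    for (const [key, values] of emitted) {
        // A key with a single emitted value skips reduce.
        const reduced = values.length > 1 ? reduce(key, values) : values[0];
        const value = finalize(key, reduced);
        if (discardNull && value === null) continue; // proposed: drop the row
        results.push({ _id: key, value: value });
    }
    return results;
}

const docs = ["a", "a", "a", "b", "c", "c"].map(w => ({ word: w }));
const current = simulateMapReduce(docs, mapFn, reduceFn, finalizeFn, false);
const proposed = simulateMapReduce(docs, mapFn, reduceFn, finalizeFn, true);
```

In this sketch `current` keeps one row per key, with null values for the keys that fail the threshold, while `proposed` contains only the key that passes it.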

        antoine Antoine Girbal (Inactive) added a comment -

        To use this feature, return null from finalize.
        M/R will omit that key from the final result.

        auto auto (Inactive) added a comment -

        Author: agirbal (antoine@10gen.com)

        Message: SERVER-2340: MapReduce finalize should be able to throw result row away
        Branch: master
        https://github.com/mongodb/mongo/commit/20a1c7ae3294f9e475adf2a0212726d2dc21b69b

        auto auto (Inactive) added a comment -

        Author: agirbal (antoine@10gen.com)

        Message: SERVER-2340: undoing until we decide on feature
        Branch: master
        https://github.com/mongodb/mongo/commit/2a8522c03f0259bbc9134a93b88bb00f5f2c58ce

        antoine Antoine Girbal (Inactive) added a comment -

        We need to agree on the feature first.

        lalitagarw Lalit Agarwal added a comment -

        I am also facing the same issue. Any updates on this improvement?

        jalava Jalmari Raippalinna added a comment -

        I need this feature for the following use case:

        Due to performance requirements we are forced to use

        { inline: 1 }

        as output.

        We are reducing a huge data set to a result that is over 16 megabytes (the BSON size limit), but 90% of the data could be discarded in finalize by just returning null.

        You could add an option such as

        { discardNullOnFinalize: 1 }

        if you are concerned that users might want to return a null value with the key still in the result set.

        asya Asya Kamsky added a comment -

        This is easily done if your MR can be expressed as an aggregation pipeline (as the original example can be)...
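As a rough sketch of that suggestion, the count-and-threshold example from the description might look like the pipeline below. The grouping field `word` and the collection name `events` are assumptions made for illustration; the threshold of 100 comes from the finalize example earlier in the thread.

```javascript
// Group by an assumed field "word", count the documents per group,
// then keep only the groups whose count exceeds the threshold.
const pipeline = [
    { $group: { _id: "$word", count: { $sum: 1 } } },
    { $match: { count: { $gt: 100 } } }
];

// In the mongo shell this would be run as (collection name assumed):
//   db.events.aggregate(pipeline)
```

The `$match` stage after `$group` plays the role that returning null from finalize would play in the map reduce version: groups below the threshold never reach the output.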

        maziyar Maziyar Panahi added a comment -

        I would also appreciate being able to remove the key from the results at the finalize stage. One of the best things about finalize is that it lets us check whether the final results meet our conditions. Millions of documents would be reduced to a couple of thousand, which is a huge improvement for inserts and for further operations on the result_collection.

        If it were possible to use aggregation, I would definitely have used it by now, as I do for so many other things, since it's faster, easier and more convenient, but it is not as flexible as MR!

        Thanks,
        Maziyar


          People

          • Votes: 10
          • Watchers: 10
