Core Server / SERVER-2340

MapReduce finalize should be able to throw result row away

    Details

    • Backwards Compatibility:
      Minor Change
    • # Replies:
      8
    • Last comment by Customer:
      true

      Description

      I have a use case where I would like to throw away a result in the finalize phase of a map reduce. AFAIK finalize can currently only modify the result object, not remove it completely.

      My use case consists of a map reduce where I first emit

      { count : 1 }

      in the map phase and then sum the counts together in the reduce phase. I would then like to discard all results whose count is less than some value and return only those whose count meets my requirement. In practice finalize would discard 99.99% of my results, so it would be much more efficient to do it there instead of manually iterating over or querying the temporary result collection.

      I propose that returning null in the finalize phase should discard the result. All current examples of the finalize function return the result object, so implementing this would not change existing behavior.
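
The proposed semantics can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not MongoDB's implementation: `runMapReduce`, the explicit `emit` callback, the `tag` field, and the lowered threshold are all hypothetical and exist only for this illustration.

```javascript
// Minimal in-memory model of map/reduce/finalize.
// NOT MongoDB's implementation; runMapReduce is a hypothetical helper.
function runMapReduce(docs, { map, reduce, finalize }) {
  const groups = new Map();
  // Map phase: each document may emit any number of (key, value) pairs.
  for (const doc of docs) {
    map(doc, (key, value) => {
      if (!groups.has(key)) groups.set(key, []);
      groups.get(key).push(value);
    });
  }
  const results = [];
  for (const [key, values] of groups) {
    // Reduce phase: fold all values emitted for one key into a single value.
    const reduced = values.length === 1 ? values[0] : reduce(key, values);
    // Finalize phase: under the proposal, returning null drops the row.
    const finalized = finalize ? finalize(key, reduced) : reduced;
    if (finalized !== null) results.push({ _id: key, value: finalized });
  }
  return results;
}

// Sample input; the "tag" field is an assumption made for the demo.
const docs = [{ tag: "a" }, { tag: "b" }, { tag: "a" }, { tag: "a" }];

const results = runMapReduce(docs, {
  map: (doc, emit) => emit(doc.tag, { count: 1 }),
  reduce: (key, values) => ({ count: values.reduce((n, v) => n + v.count, 0) }),
  // Keep only keys counted more than twice (threshold lowered for the demo).
  finalize: (key, value) => (value.count > 2 ? value : null),
});

console.log(results);
```

In this model only key "a" survives, because key "b" has a count of 1 and its finalize call returns null.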

        Activity

        Juho Mäkinen
        added a comment -

        Example of a finalize function which would fit the use case:

        f = function (key, value) {
            if (value.count > 100) {
                return value;
            } else {
                return null;
            }
        }

        Antoine Girbal (Inactive)
        added a comment -

        To use this feature, return null in finalize.
        M/R will then omit that key from the final result.

        auto (Inactive)
        added a comment -

        Author: agirbal (antoine@10gen.com)

        Message: SERVER-2340: MapReduce finalize should be able to throw result row away
        Branch: master
        https://github.com/mongodb/mongo/commit/20a1c7ae3294f9e475adf2a0212726d2dc21b69b

        auto (Inactive)
        added a comment -

        Author: agirbal (antoine@10gen.com)

        Message: SERVER-2340: undoing until we decide on feature
        Branch: master
        https://github.com/mongodb/mongo/commit/2a8522c03f0259bbc9134a93b88bb00f5f2c58ce

        Antoine Girbal (Inactive)
        added a comment -

        We need to agree on the feature first.

        Lalit Agarwal
        added a comment -

        I am also facing the same issue. Any updates on this improvement?

        Jalmari Raippalinna
        added a comment -

        I need this feature for the following use case:

        Due to performance requirements we are forced to use { inline: 1 } as output.

        We are reducing a huge data set to a result that is over 16 megabytes (the BSON size limit), but 90% of the data could be discarded in finalize by just returning null.

        Add an option such as { discardNullOnFinalize: 1 } if you are concerned that users might want to return a null object with the key still in the result set.
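
A rough sketch of how such an opt-in flag could behave, assuming the default keeps today's behavior. `discardNullOnFinalize` is only the option name suggested in this comment, not an existing MongoDB option, and `applyFinalize` is a hypothetical helper.

```javascript
// Hypothetical finalize step. discardNullOnFinalize is the option name
// proposed in the comment, not an existing MongoDB option.
function applyFinalize(rows, finalize, { discardNullOnFinalize = false } = {}) {
  const out = [];
  for (const { _id, value } of rows) {
    const finalized = finalize(_id, value);
    // Drop the row only when the caller opted in; otherwise keep the null value.
    if (finalized === null && discardNullOnFinalize) continue;
    out.push({ _id, value: finalized });
  }
  return out;
}

// Already-reduced rows, made up for the demo.
const reduced = [
  { _id: "a", value: { count: 150 } },
  { _id: "b", value: { count: 3 } },
];
const finalize = (key, value) => (value.count > 100 ? value : null);

// Default: the key stays in the result set with a null value.
const kept = applyFinalize(reduced, finalize);
// Opt-in: rows whose finalize returned null are omitted.
const discarded = applyFinalize(reduced, finalize, { discardNullOnFinalize: true });

console.log(kept, discarded);
```

Making the flag opt-in would leave existing jobs that deliberately return null unaffected, which is the concern the option is meant to address.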

        Asya Kamsky
        added a comment -

        This is easily done if your MR can be expressed as an aggregation pipeline (as the original example can be)...
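
For the count-and-filter case in the description, such a pipeline might look like the sketch below. The grouping field `tag` and the collection name `events` are assumptions, since the issue does not say what is being counted. The threshold of 100 is taken from the example finalize function.

```javascript
// Assumed: documents carry a field named "tag" to count by, and the
// collection is called "events". Both names are hypothetical.
const threshold = 100;

const pipeline = [
  // Equivalent of map (emit { count: 1 }) plus reduce (sum the counts).
  { $group: { _id: "$tag", count: { $sum: 1 } } },
  // Equivalent of the filtering finalize: keep only counts above the threshold.
  { $match: { count: { $gt: threshold } } },
];

// In the mongo shell this would be run as: db.events.aggregate(pipeline)
console.log(JSON.stringify(pipeline));
```

Because the `$match` stage runs on the server after `$group`, the discarded groups never reach the client, which is the effect the null-returning finalize was meant to achieve.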


          People

          • Votes:
            9
          • Watchers:
            9
