Core Server / SERVER-699

Support other scripting languages (eg perl) for map/reduce

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Major - P3
    • Resolution: Unresolved
    • Affects Version/s: None
    • Component/s: Usability
    • Labels:
      None
    • Backport:
      No
    • # Replies:
      10
    • Last comment by Customer:
      true

      Description

      It would be advantageous to be able to use other scripting languages in map/reduce tasks (for me, perl, though I could see python being a good fit too).

      This would let developers write map/reduce tasks more easily, and give them access to code and libraries in that language which might be useful during the map/reduce tasks.

        Activity

        Eliot Horowitz
        added a comment -

        @mathieu @cyril We agree. We haven't gotten to it yet, but it's definitely one of the things we want to support.
        The first version will probably require you to manage binaries, and the API will be BSON in and out.
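        The "BSON in and out" idea suggests a worker binary that reads length-prefixed BSON documents on stdin and writes emitted documents to stdout. The following is a minimal Python sketch of that convention, not a committed design: the framing, the helper names (`encode_doc`, `decode_doc`, `run_mapper`), and the restriction to int32 and string fields are all assumptions chosen to keep the sketch dependency-free (a real worker would use pymongo's `bson` module).

        ```python
        import struct

        def encode_doc(doc):
            """Encode a flat dict of str/int32 values as a BSON document (sketch only)."""
            body = b""
            for key, val in doc.items():
                name = key.encode() + b"\x00"
                if isinstance(val, bool):
                    raise TypeError("sketch supports only int32 and string fields")
                elif isinstance(val, int):
                    body += b"\x10" + name + struct.pack("<i", val)         # 0x10 = int32
                elif isinstance(val, str):
                    s = val.encode() + b"\x00"
                    body += b"\x02" + name + struct.pack("<i", len(s)) + s  # 0x02 = string
                else:
                    raise TypeError("sketch supports only int32 and string fields")
            # BSON: 4-byte total length, elements, trailing 0x00
            return struct.pack("<i", len(body) + 5) + body + b"\x00"

        def decode_doc(data):
            """Decode one BSON document produced by encode_doc."""
            doc, pos = {}, 4                      # skip the 4-byte total length
            while data[pos] != 0x00:              # 0x00 terminates the element list
                etype = data[pos]; pos += 1
                end = data.index(b"\x00", pos)    # element name is a NUL-terminated cstring
                key = data[pos:end].decode(); pos = end + 1
                if etype == 0x10:
                    (val,) = struct.unpack_from("<i", data, pos); pos += 4
                elif etype == 0x02:
                    (slen,) = struct.unpack_from("<i", data, pos); pos += 4
                    val = data[pos:pos + slen - 1].decode(); pos += slen
                else:
                    raise ValueError("unsupported element type: %#x" % etype)
                doc[key] = val
            return doc

        def run_mapper(stream_in, stream_out, map_fn):
            """Read BSON docs from stream_in, apply map_fn, write emitted docs to stream_out."""
            while True:
                header = stream_in.read(4)
                if len(header) < 4:               # end of input stream
                    break
                (length,) = struct.unpack("<i", header)
                data = header + stream_in.read(length - 4)
                for emitted in map_fn(decode_doc(data)):
                    stream_out.write(encode_doc(emitted))
        ```

        Under this scheme the server would spawn the worker process and pipe documents over its stdin/stdout (i.e. `sys.stdin.buffer` and `sys.stdout.buffer` in place of the stream arguments here), which is what makes the worker language-agnostic.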

        Valery Khamenya
        added a comment -

        +1 to Mathieu Poumeyrol

        Paul Harvey
        added a comment -

        I can appreciate that this task may be a little open-ended; there are some interesting design decisions to make. Turning mongo into a full-blown distributed HPC platform might be asking too much, but we would really appreciate a streaming solution also - no matter how primitive.

        Although we will be storing raw data in mongodb, the system we are building is only able to exploit mongo for metadata (management of the raw data). As things currently stand, we either have to fund someone to re-work a precious few algorithms into mongo+m/r javascript (costly, unsustainable), relying on sharding to have any hope of reasonable CPU utilisation, or we build an in-house API to bridge the raw data from mongodb to an entirely separate distributed HPC framework.

        We work in bioinformatics - many problems fit embarrassingly well into map/reduce, but we rely heavily on libraries to do the bulk of the work (python, perl, ruby - probably in that order - though people use things like R on their workstations).

        Bobby J
        added a comment -

        Big priority for us. We chose to use mongodb partly because pymongo integrated so nicely into our python codebase. Now we find ourselves using hadoop for mapreduce jobs just so we can keep our mapper/reducer functionality in python. Thanks for looking into this!
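        The Hadoop workaround Bobby describes uses Hadoop Streaming, whose convention is line-oriented: mappers and reducers are separate processes exchanging tab-separated `key\tvalue` lines. A minimal sketch of that convention, with an illustrative word-count job (the function names and the single-process driver are assumptions for demonstration):

        ```python
        import itertools
        import sys

        def mapper(lines):
            """Map phase: emit a (word, 1) pair for every word on the input lines."""
            for line in lines:
                for word in line.split():
                    yield word.lower(), 1

        def reducer(pairs):
            """Reduce phase: sum the counts for each distinct word.

            Hadoop delivers the pairs sorted by key; sorting here stands in
            for that shuffle step.
            """
            for word, group in itertools.groupby(sorted(pairs), key=lambda kv: kv[0]):
                yield word, sum(count for _, count in group)

        if __name__ == "__main__":
            # Hadoop Streaming would run the two phases as separate processes,
            # piping "key\tvalue" lines between them; here both run in one
            # process for illustration.
            for word, total in reducer(mapper(sys.stdin)):
                sys.stdout.write("%s\t%d\n" % (word, total))
        ```

        The appeal of this model for the thread's request is that mongod would only need to manage the worker processes and the pipes; the map/reduce logic itself can live in any language that can read stdin and write stdout.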

        josh rabinowitz
        added a comment -

        I'm the original poster of this JIRA (not that I was the first to want support for other languages in m/r). It's been interesting to see how the conversation here has evolved.

        To add my $0.01: +1 to streaming solution. And BSON in/out sounds just fine.


          People

          • Votes:
            21
          • Watchers:
            16

            Dates

            • Created:
            • Updated:
            • Days since reply:
              3 years, 20 weeks, 6 days
            • Date of 1st Reply: