<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 02:54:54 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-699] Support other scripting languages (eg perl) for map/reduce</title>
                <link>https://jira.mongodb.org/browse/SERVER-699</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;It would be advantageous to be able to use other scripting languages in map/reduce tasks (for me, perl, though I could see python being a good fit too).&lt;/p&gt;

&lt;p&gt;This would let developers write map/reduce tasks more easily and give them access to code and libraries in that language, which could be useful during map/reduce tasks.&lt;/p&gt;</description>
                <environment></environment>
        <key id="11422">SERVER-699</key>
            <summary>Support other scripting languages (eg perl) for map/reduce</summary>
                <type id="2" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14711&amp;avatarType=issuetype">New Feature</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="1" iconUrl="https://jira.mongodb.org/images/icons/statuses/open.png" description="">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="backlog-query-optimization">Backlog - Query Optimization</assignee>
                                    <reporter username="joshr">josh rabinowitz</reporter>
                        <labels>
                            <label>map-reduce</label>
                    </labels>
                <created>Thu, 4 Mar 2010 11:55:22 +0000</created>
                <updated>Tue, 6 Dec 2022 05:50:38 +0000</updated>
                <fixVersion>features we&apos;re not sure of</fixVersion>
                                    <component>Usability</component>
                                        <votes>21</votes>
                                    <watches>20</watches>
                                                                                                                <comments>
                            <comment id="20964" author="joshr" created="Thu, 2 Dec 2010 00:56:16 +0000"  >&lt;p&gt;I&apos;m the original poster of this JIRA (not that I was the first to want support for other languages in m/r). It&apos;s been interesting to see how the conversation here has evolved.&lt;/p&gt;

&lt;p&gt;To add my $0.01:  +1 to the streaming solution. And BSON in/out sounds just fine.&lt;/p&gt;</comment>
                            <comment id="20954" author="bobbyj" created="Wed, 1 Dec 2010 23:53:13 +0000"  >&lt;p&gt;Big priority for us.  We chose to use mongodb partly because pymongo integrated so nicely into our python codebase.  Now we find ourselves using hadoop for mapreduce jobs just so we can keep our mapper/reducer functionality in python.  Thanks for looking into this!&lt;/p&gt;</comment>
                            <comment id="19247" author="csirac2" created="Sat, 16 Oct 2010 01:06:29 +0000"  >&lt;p&gt;I can appreciate that this task may be a little open-ended; there are some interesting design decisions to make. Turning mongo into a full-blown distributed HPC platform might be asking too much. But we would also really appreciate a streaming solution, no matter how primitive.&lt;/p&gt;

&lt;p&gt;Although we will be storing raw data in mongodb, the system we are building is only able to exploit mongo for metadata (management of the raw data). As things currently stand, we either have to fund someone to rework a precious few algorithms into mongo+m/r javascript (costly, unsustainable), relying on sharding to have any hope of reasonable CPU utilisation, or build an in-house API to bridge the raw data from mongodb to an entirely separate distributed HPC framework.&lt;/p&gt;

&lt;p&gt;We work in bioinformatics. Many problems fit embarrassingly well into map/reduce, but we rely heavily on libraries to do the bulk of the work (python, perl, ruby, probably in that order, though people use things like R on their workstations).&lt;/p&gt;</comment>
                            <comment id="18442" author="vak" created="Wed, 22 Sep 2010 18:00:41 +0000"  >&lt;p&gt;+1 to Mathieu Poumeyrol&lt;/p&gt;</comment>
                            <comment id="14934" author="eliot" created="Tue, 22 Jun 2010 05:23:58 +0000"  >&lt;p&gt;@mathieu @cyril  we agree.  we haven&apos;t gotten to it yet - but it&apos;s definitely one of the things we want to support.&lt;br/&gt;
first version will probably require you to manage binaries, and the api will be BSON in and out&lt;/p&gt;</comment>
                            <comment id="14933" author="shingara" created="Tue, 22 Jun 2010 05:21:57 +0000"  >&lt;p&gt;I totally agree with Mathieu. The streaming solution is a really good way to use whatever we want for map/reduce. In a way, it can also help us build a multi-threaded map/reduce function, because it would be our program that is multi-threaded, not MongoDB.&lt;/p&gt;</comment>
                            <comment id="14931" author="kali" created="Tue, 22 Jun 2010 03:18:57 +0000"  >&lt;p&gt;I had a conversation at MongoFR with Matthias, and I think this would be a good place to follow up.&lt;/p&gt;

&lt;p&gt;I think we need to have something similar to hadoop streaming. The principle is simple: each mapper starts an external process with a command specified by the user, pushes each document to the process&apos;s STDIN, and reads each emitted value from its STDOUT. The same goes for the reducer.&lt;/p&gt;

&lt;p&gt;That would give support in m/r for any language that can read and write json and/or bson, instead of having to pick one or a few languages, which would leave most users frustrated and require increasingly heavy code maintenance and complex dependencies.&lt;/p&gt;

&lt;p&gt;Another nice point of this approach is that it is very easy to simulate map/reduce using unix pipes in a development environment.&lt;/p&gt;

&lt;p&gt;Matthias expressed concern about that feature allowing arbitrary code execution on the server, but that is a risk that can be mitigated: we may want to limit it to some directory where the admin puts the scripts, or even to an explicit list, or have a map/reduce worker running as nobody... but that would make the installation considerably more difficult.&lt;/p&gt;

&lt;p&gt;As far as I&apos;m concerned, I&apos;d prefer the server to let me do whatever I want, as the user mongo is actually running as. My data, my responsibility.&lt;/p&gt;

&lt;p&gt;For your information, hadoop also manages code transport from wherever the job is launched to the various nodes.&lt;/p&gt;

&lt;p&gt;The use case I&apos;m investigating is log analysis. I would love to get all my logs into mongo to support real-time collection, long-term storage, massive analysis and pinpoint debugging. But for massive analysis, streaming is an absolute must.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://wiki.apache.org/hadoop/HadoopStreaming&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://wiki.apache.org/hadoop/HadoopStreaming&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="13673" author="evanwies" created="Thu, 15 Apr 2010 15:32:06 +0000"  >&lt;p&gt;Some notes on Lua.  Lua is really fast and designed to be embedded.  &lt;/p&gt;

&lt;p&gt;mstearn said that preserving the ordering of keys is important.  Lua doesn&apos;t do that.  There was a patch in January 2010 that does (&lt;a href=&quot;http://lua-users.org/lists/lua-l/2010-01/msg00199.html&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://lua-users.org/lists/lua-l/2010-01/msg00199.html&lt;/a&gt;).  Since it is a patch rather than a library, it would only be feasible for the server.  Clients can&apos;t be expected to use a patched VM.&lt;/p&gt;

&lt;p&gt;Lua has an awesome JIT (&lt;a href=&quot;http://www.luajit.org&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://www.luajit.org&lt;/a&gt;).  It would make a lot of sense for map/reduce.  You&apos;d need to port the ordered  patch to it.   You&apos;d also want to wait for the Foreign Function Interface (FFI) or raw struct access to be added.&lt;/p&gt;

&lt;p&gt;Notes on syntax:  Lua table syntax is a little cleaner than pure JSON.&lt;br/&gt;
pure JSON:  &lt;/p&gt;
{ &quot;name&quot; : &quot;mongo&quot;, &quot;type&quot; : &quot;db&quot; }
&lt;p&gt;Lua: &lt;/p&gt;
{ name = &quot;mongo&quot;, type = &quot;db&quot; }

&lt;p&gt;Since the native (albeit configurable) number type in Lua is a double, there are issues storing 64-bit integers.  There are various solutions for this which are web-searchable.&lt;/p&gt;</comment>
                            <comment id="13438" author="bpoweski" created="Mon, 5 Apr 2010 18:11:18 +0000"  >&lt;p&gt;For us, perl, with all of its complexity and, well, Perlisms, would be a less desirable language.  I think Lua would be a natural fit.&lt;/p&gt;</comment>
                            <comment id="12746" author="eliot" created="Thu, 4 Mar 2010 20:53:37 +0000"  >&lt;p&gt;agree it would be nice - but non-trivial&lt;br/&gt;
we would either have to embed each one, or provide a general process io version of map/reduce&lt;br/&gt;
unclear how well that would work, but perhaps&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>10.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                <customfield id="customfield_12751" key="com.atlassian.jira.plugin.system.customfieldtypes:multiselect">
                        <customfieldname>Assigned Teams</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="25126"><![CDATA[Query Optimization]]></customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Thu, 4 Mar 2010 20:53:37 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        13 years, 12 weeks ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>alexander.golin@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            13 years, 12 weeks ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10000" key="com.atlassian.jira.plugin.system.customfieldtypes:radiobuttons">
                        <customfieldname>Old_Backport</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10000"><![CDATA[No]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>backlog-query-optimization</customfieldvalue>
            <customfieldvalue>bpoweski</customfieldvalue>
            <customfieldvalue>bobbyj</customfieldvalue>
            <customfieldvalue>shingara</customfieldvalue>
            <customfieldvalue>eliot</customfieldvalue>
            <customfieldvalue>evanwies</customfieldvalue>
            <customfieldvalue>joshr</customfieldvalue>
            <customfieldvalue>kali</customfieldvalue>
            <customfieldvalue>csirac2</customfieldvalue>
            <customfieldvalue>vak</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hrpr5r:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hr2j47:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>6642</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hrirfz:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>