<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 02:56:21 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
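<!--
The 'field' restriction described above can be sketched in Python. This is a minimal sketch. It assumes only that one 'field=<name>' pair is appended per requested field to whatever issue URL the export was fetched from. The helper name restrict_fields and the example.org URL are hypothetical and are not part of Jira's API.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def restrict_fields(url: str, fields: list[str]) -> str:
    """Append one 'field=<name>' pair per requested field (hypothetical helper).

    Existing query parameters are kept, and the new pairs are appended after them.
    """
    scheme, netloc, path, query, fragment = urlsplit(url)
    # Repeated keys are passed as a list of tuples so urlencode emits each one.
    extra = urlencode([("field", name) for name in fields])
    query = f"{query}&{extra}" if query else extra
    return urlunsplit((scheme, netloc, path, query, fragment))


# Hypothetical issue-XML URL, used only for illustration.
base = "https://jira.example.org/issue.xml"
print(restrict_fields(base, ["key", "summary"]))
# prints https://jira.example.org/issue.xml?field=key&field=summary
```

Passing a list of pairs rather than a dict is what allows the same 'field' key to appear more than once.
-->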
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-1197] Performance question regarding map/reduce: mongo map/reduce function slower than naive (Python) counting</title>
                <link>https://jira.mongodb.org/browse/SERVER-1197</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;We coded the given map/reduce example (&lt;a href=&quot;http://api.mongodb.org/python/current/examples/map_reduce.html&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://api.mongodb.org/python/current/examples/map_reduce.html&lt;/a&gt;) directly in Python and got much better performance (see the attached script). Did we get anything wrong?&lt;/p&gt;

&lt;p&gt;Usage of the script (it creates sample data and times the two methods):&lt;/p&gt;

&lt;p&gt;python mongo_map_reduce_counter.py  test_db_name&lt;/p&gt;

&lt;p&gt;Example output (with nb_objects=5000, nb_tags=200, nb_bins=3):&lt;br/&gt;
$&amp;gt;python mongo_map_reduce_counter.py  test_sdsd&lt;br/&gt;
calc naive time 	0.317932844162&lt;br/&gt;
calc map_reduce time 	110.605533838&lt;br/&gt;
Same results? True&lt;/p&gt;</description>
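<!--
A minimal Python sketch of the two methods being compared. It assumes documents shaped loosely like those in the PyMongo map/reduce example, each with a tags array. The attached script itself is not reproduced here, and the function names and sample data are hypothetical. naive_count makes one client-side pass with a Counter. map_reduce_count mimics the map step (emit one pair per tag) and the reduce step (sum per key) in plain Python, so the two results can be compared the way the script's "Same results?" check does.

```python
from collections import Counter
from typing import Iterable


def naive_count(docs: Iterable[dict]) -> dict:
    """Count tag occurrences with a single pass in the client."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.get("tags", []))
    return dict(counts)


def map_reduce_count(docs: Iterable[dict]) -> dict:
    """Emulate the map/emit/reduce flow of the tag-count example in Python."""
    emitted: dict = {}

    def emit(key, value):
        emitted.setdefault(key, []).append(value)

    # map: emit (tag, 1) for every tag of every document
    for doc in docs:
        for tag in doc.get("tags", []):
            emit(tag, 1)

    # reduce: sum the values emitted for each key
    def reduce(key, values):
        return sum(values)

    return {key: reduce(key, values) for key, values in emitted.items()}


# Hypothetical sample data.
docs = [
    {"x": 1, "tags": ["dog", "cat"]},
    {"x": 2, "tags": ["cat"]},
    {"x": 3, "tags": ["mouse", "cat", "dog"]},
    {"x": 4, "tags": []},
]
print(naive_count(docs))
print(naive_count(docs) == map_reduce_count(docs))  # prints True
```

Against a real server, docs would be a PyMongo cursor such as collection.find({}, {"tags": 1}) rather than an in-memory list. The server-side job additionally runs its map and reduce functions as JavaScript inside mongod, which this sketch does not model.
-->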
                <environment>GNU/Linux Ubuntu (Lucid) mongodb-unstable package (version 20100604)</environment>
        <key id="12085">SERVER-1197</key>
            <summary>Performance question regarding map/reduce: mongo map/reduce function slower than naive (Python) counting</summary>
                <type id="6" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14720&amp;avatarType=issuetype">Question</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="9">Done</resolution>
                                        <assignee username="eliot">Eliot Horowitz</assignee>
                                    <reporter username="alain">Alan</reporter>
                        <labels>
                    </labels>
                <created>Sun, 6 Jun 2010 12:02:24 +0000</created>
                <updated>Wed, 14 Dec 2011 17:15:10 +0000</updated>
                            <resolved>Mon, 7 Jun 2010 16:31:26 +0000</resolved>
                                    <version>1.5.2</version>
                                                    <component>Performance</component>
                                        <votes>0</votes>
                                    <watches>6</watches>
                    <comments>
                            <comment id="73090" author="eliot" created="Wed, 14 Dec 2011 17:15:10 +0000"  >&lt;p&gt;If you want to diagnose why your case is slow, can you open a new ticket with the map/reduce code and sample data?&lt;/p&gt;</comment>
                            <comment id="73088" author="sfnelson" created="Wed, 14 Dec 2011 17:11:05 +0000"  >&lt;p&gt;&lt;a href=&quot;http://shootout.alioth.debian.org/u32/javascript.php&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://shootout.alioth.debian.org/u32/javascript.php&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Nice try, but JavaScript (V8) is only 3x slower than Java for the type of operations I&apos;m performing. Mongo specifically is adding truly massive overhead to map/reduce operations.&lt;/p&gt;

&lt;p&gt;The new aggregation framework is not adequate for the things I&apos;m doing (or, I&apos;m sure, for what many other users are doing): my map functions need to make decisions about documents based on non-trivial dependent properties. The aggregation framework will not be able to replace m/r; it&apos;s not sufficient, as you seem to be aware from your comments on the linked issue.&lt;/p&gt;

&lt;p&gt;Has anyone profiled mongo&apos;s map/reduce implementation to determine where the overhead is coming from?&lt;/p&gt;</comment>
                            <comment id="72978" author="eliot" created="Wed, 14 Dec 2011 06:16:24 +0000"  >&lt;p&gt;JavaScript is much slower than Java.&lt;br/&gt;
If it comes down to that, Java will always win.&lt;br/&gt;
The new aggregation framework is the long-term solution: &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-447&quot; title=&quot;new aggregation framework&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-447&quot;&gt;&lt;del&gt;SERVER-447&lt;/del&gt;&lt;/a&gt;&lt;/p&gt;</comment>
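<!--
For the simple tag count in the description, the aggregation framework mentioned above would express the job as an $unwind stage followed by a $group stage. The sketch below writes that pipeline as a Python list, in the form PyMongo's collection.aggregate accepts, and pairs it with a toy in-memory evaluator so the result shape can be checked without a server. The evaluator and its name are hypothetical and cover only these two stages. This is an illustration under those assumptions, not MongoDB's implementation.

```python
from collections import defaultdict

# Tag-count pipeline in aggregation-framework syntax. With PyMongo it would be
# passed as collection.aggregate(TAG_COUNT_PIPELINE); that call is not made here.
TAG_COUNT_PIPELINE = [
    {"$unwind": "$tags"},
    {"$group": {"_id": "$tags", "count": {"$sum": 1}}},
]


def run_pipeline_in_memory(docs):
    """Toy evaluator for exactly the two stages above (not a general engine)."""
    # $unwind: one output document per element of the tags array;
    # documents with an empty or missing tags array produce nothing.
    unwound = [
        {**doc, "tags": tag} for doc in docs for tag in doc.get("tags", [])
    ]
    # $group: bucket by tag and add 1 per unwound document
    counts = defaultdict(int)
    for doc in unwound:
        counts[doc["tags"]] += 1
    return [{"_id": tag, "count": n} for tag, n in counts.items()]


print(run_pipeline_in_memory([{"tags": ["dog", "cat"]}, {"tags": ["cat"]}]))
```

As the next comment points out, a fixed pipeline like this fits a plain count; map functions that make data-dependent decisions per document may not map onto these stages as directly.
-->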
                            <comment id="72825" author="sfnelson" created="Tue, 13 Dec 2011 18:46:17 +0000"  >&lt;p&gt;Why was this issue closed without the problem being corrected? I&apos;m using Mongo (version 2.0) for my dissertation research. A MongoDB map/reduce function runs between 10 and 100 times slower than a Java implementation that does the same thing.&lt;/p&gt;

&lt;p&gt;To test this, I hand-coded a naive Java implementation of map/reduce. The map phase iterates over a collection, performs the map operation on each document, and stores any emits in a temporary collection. I then create an index on the temporary collection and call reduce, which iterates over the temporary collection to find keys; for each key it retrieves all entries for that key in batches, stores the result in a new collection, and deletes all entries for that key before moving on. When the temporary collection is empty, I&apos;m done.&lt;/p&gt;

&lt;p&gt;This naive approach cut the running time of my map/reduce function over several hundred million documents from many days down to hours. I&apos;ve since written an implementation that uses a cache in Java and traverses the temporary collection sequentially without deletes, which cuts the time by another factor of 10.&lt;/p&gt;

&lt;p&gt;Why is Mongo&apos;s implementation so freaking slow? Are you loading an entirely new JavaScript VM for every application of map? Map/reduce&apos;s performance is completely at odds with the excellent performance of everything else.&lt;/p&gt;</comment>
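<!--
The client-side approach described in this comment can be sketched in Python. The sketch assumes a plain list of (key, value) rows stands in for the temporary MongoDB collection and that sorted() stands in for its index. The function names, the tag-count map and reduce functions, and the sample data are hypothetical. reduce_with_deletes follows the first, naive variant (gather every row for one key, reduce, delete, repeat), and reduce_sequential follows the later variant that walks the sorted rows once without deleting anything. The Java cache is not modeled.

```python
from itertools import groupby
from operator import itemgetter


def map_phase(docs, map_fn):
    """Run map_fn on every document and collect its emits as (key, value) rows.

    The returned list stands in for the temporary collection.
    """
    temp = []
    for doc in docs:
        temp.extend(map_fn(doc))
    return temp


def reduce_with_deletes(temp, reduce_fn):
    """First variant: pick a key, gather all its rows, reduce, delete them."""
    temp = list(temp)  # work on a copy so the caller's rows survive
    out = {}
    while temp:
        key = temp[0][0]
        values = [v for k, v in temp if k == key]
        out[key] = reduce_fn(key, values)
        temp = [row for row in temp if row[0] != key]  # "delete" this key
    return out


def reduce_sequential(temp, reduce_fn):
    """Second variant: sort once (the index) and stream through without deletes."""
    out = {}
    rows_by_key = sorted(temp, key=itemgetter(0))
    for key, rows in groupby(rows_by_key, key=itemgetter(0)):
        out[key] = reduce_fn(key, [v for _, v in rows])
    return out


def tag_map(doc):
    # hypothetical map function: emit (tag, 1) for each tag
    return [(tag, 1) for tag in doc.get("tags", [])]


def sum_reduce(key, values):
    # hypothetical reduce function: sum the values for one key
    return sum(values)


docs = [{"tags": ["dog", "cat"]}, {"tags": ["cat"]}, {"tags": ["mouse"]}]
temp = map_phase(docs, tag_map)
print(reduce_with_deletes(temp, sum_reduce) == reduce_sequential(temp, sum_reduce))  # prints True
```

In this list-based sketch the delete variant rescans the remaining rows for every key, while the sequential variant costs one sort plus one pass. With a real indexed collection the per-key lookups are cheaper than a rescan, so this is only a rough illustration of why a single sequential pass can help.
-->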
                            <comment id="14615" author="alain" created="Tue, 8 Jun 2010 08:30:37 +0000"  >&lt;p&gt;Thanks for your answer. In this particular example (which is very simple; see the source code), the speed difference is really large! I guess we will stick to the &quot;naive&quot; Python implementation for now &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.mongodb.org/images/icons/emoticons/wink.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/p&gt;</comment>
                            <comment id="14607" author="eliot" created="Mon, 7 Jun 2010 16:31:26 +0000"  >&lt;p&gt;Python may be faster than map/reduce for some cases.&lt;br/&gt;
We are going to be working on m/r performance later this year.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="10232" name="mongo_map_reduce_counter.py" size="1995" author="alain" created="Sun, 6 Jun 2010 12:02:24 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>6.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Mon, 7 Jun 2010 16:31:26 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        12 years, 10 weeks ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>ian@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            12 years, 10 weeks ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10000" key="com.atlassian.jira.plugin.system.customfieldtypes:radiobuttons">
                        <customfieldname>Old_Backport</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10000"><![CDATA[No]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>alain</customfieldvalue>
            <customfieldvalue>eliot</customfieldvalue>
            <customfieldvalue>sfnelson</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hrjlhr:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hrik9j:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>21935</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hsa2mv:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>