<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 03:07:14 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-4876] Map reduce with option &quot;replace&quot; is reducing instead</title>
                <link>https://jira.mongodb.org/browse/SERVER-4876</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;When using map reduce over a large collection (several million documents) with the output set to &quot;replace&quot;, the result is not really an atomic replacement: it seems to &quot;reduce&quot; into the existing output collection.&lt;/p&gt;

&lt;p&gt;I use a map reduce operation to find duplicates (based on one field) in a sharded environment.&lt;br/&gt;
The input collection has several million documents, and so does the output (they should have the same number of elements because, in theory, there should not be any duplicates).&lt;/p&gt;

&lt;p&gt;However, if I relaunch the map reduce (using the replace output option from the mongodb shell), many false positives are found (~800 of 17 million documents are counted twice).&lt;br/&gt;
If I drop the output collection before re-running the map reduce, no duplicates are found.&lt;/p&gt;

&lt;p&gt;function mapDoublonsSqlId() {&lt;br/&gt;
emit({p : this.partnerId, id : this.sqlId}, 1)&lt;br/&gt;
}&lt;/p&gt;

&lt;p&gt;function reduceDoublonsSqlId(key,values) {&lt;br/&gt;
var total = 0;&lt;br/&gt;
values.forEach(function(o) { total+=o })&lt;br/&gt;
return total;&lt;br/&gt;
}&lt;/p&gt;

&lt;p&gt;db.runCommand({mapreduce : &quot;products&quot;, map : mapDoublonsSqlId, reduce : reduceDoublonsSqlId, out : {replace : &quot;tmp&quot;}})&lt;br/&gt;
db.tmp.count({value : {$gt : 1}}) //ok no duplicates&lt;/p&gt;

&lt;p&gt;db.runCommand({mapreduce : &quot;products&quot;, map : mapDoublonsSqlId, reduce : reduceDoublonsSqlId, out : {replace : &quot;tmp&quot;}})&lt;br/&gt;
db.tmp.count({value : {$gt : 1}}) // here is the issue: many false duplicates are reported&lt;/p&gt;

&lt;p&gt;db.tmp.drop()&lt;br/&gt;
db.runCommand({mapreduce : &quot;products&quot;, map : mapDoublonsSqlId, reduce : reduceDoublonsSqlId, out : {replace : &quot;tmp&quot;}})&lt;br/&gt;
db.tmp.count({value : {$gt : 1}}) //ok no duplicates any more&lt;/p&gt;


&lt;p&gt;It seems that the replace does not work as expected.&lt;/p&gt;</description>
                <environment>Linux CentOS</environment>
        <key id="29966">SERVER-4876</key>
            <summary>Map reduce with option &quot;replace&quot; is reducing instead</summary>
                <type id="1" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14703&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="5">Cannot Reproduce</resolution>
                                        <assignee username="antoine">Antoine Girbal</assignee>
                                    <reporter username="kamaradclimber">Gr&#233;goire Seux</reporter>
                        <labels>
                            <label>mapreduce</label>
                            <label>options</label>
                    </labels>
                <created>Mon, 6 Feb 2012 09:36:24 +0000</created>
                <updated>Wed, 15 Aug 2012 14:02:56 +0000</updated>
                            <resolved>Fri, 16 Mar 2012 07:56:07 +0000</resolved>
                                    <version>2.0.0</version>
                                                    <component>MapReduce</component>
                                        <votes>0</votes>
                                    <watches>2</watches>
                                                                                                                <comments>
                            <comment id="99498" author="kamaradclimber" created="Fri, 16 Mar 2012 07:51:58 +0000"  >&lt;p&gt;No, it does not happen anymore. You can close this ticket.&lt;/p&gt;</comment>
                            <comment id="99307" author="antoine" created="Thu, 15 Mar 2012 18:18:21 +0000"  >&lt;p&gt;Are you still seeing this issue?&lt;br/&gt;
Was it always reproducible?&lt;/p&gt;</comment>
                            <comment id="85879" author="antoine" created="Mon, 6 Feb 2012 23:16:56 +0000"  >&lt;p&gt;I tried but cannot reproduce this issue with v2.0.2 and a sharded collection of 200k docs.&lt;br/&gt;
Could you provide:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;exact version of all components you use (mongod, mongos, etc)&lt;/li&gt;
	&lt;li&gt;output of each MR job (should have stats)&lt;/li&gt;
	&lt;li&gt;db.tmp.stats() after each MR run&lt;/li&gt;
	&lt;li&gt;output of db.printShardingInfo()&lt;/li&gt;
	&lt;li&gt;whether the issue goes away if you just set the field as &apos;out: &quot;tmp&quot;&apos;&lt;/li&gt;
&lt;/ul&gt;
</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>3.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Mon, 6 Feb 2012 23:16:56 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        11 years, 48 weeks, 5 days ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>ian@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            11 years, 48 weeks, 5 days ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10000" key="com.atlassian.jira.plugin.system.customfieldtypes:radiobuttons">
                        <customfieldname>Old_Backport</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10000"><![CDATA[No]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10032" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Operating System</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10026"><![CDATA[ALL]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>antoine</customfieldvalue>
            <customfieldvalue>kamaradclimber</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hroefz:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hritw7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>23496</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|ht0a9b:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>