<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 03:41:39 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
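
The restriction described above can be sketched as a small URL builder. This is a minimal illustration, not an official client: the base URL below is an assumed XML-view address for this issue and is not stated anywhere in this file, and build_restricted_url is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumption: illustrative XML-view URL for this issue; not taken from this file.
BASE_URL = "https://jira.mongodb.org/si/jira.issueviews:issue-xml/SERVER-16605/SERVER-16605.xml"


def build_restricted_url(base_url, fields):
    """Append one 'field' query parameter per requested field name."""
    # A list of pairs lets the same key ('field') repeat in the query string.
    query = urlencode([("field", name) for name in fields])
    return f"{base_url}?{query}"


url = build_restricted_url(BASE_URL, ["key", "summary"])
print(url)
```

Repeating the 'field' key once per requested field matches the 'field=key&field=summary' form quoted above.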
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-16605] Mapreduce into sharded collection with hashed index fails</title>
                <link>https://jira.mongodb.org/browse/SERVER-16605</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;When outputting from a map reduce job into a sharded output collection that has a hashed index on the _id field, no output is produced. The _id field is also the shard key.&lt;/p&gt;

&lt;p&gt;Extensive testing shows that this happens &lt;b&gt;only&lt;/b&gt; for the first map reduce job that is ever run on a MongoDB cluster. It fails to produce output and, in the process, the name of the output collection appears to become &apos;cursed&apos; somehow: any subsequent map reduce job fails if that same output collection name is used.&lt;/p&gt;

&lt;p&gt;The name of the output collection can never be used again, even if the collection is re-created, the entire database is dropped and re-created, or a different database is used. The exact same map reduce job processing the exact same data succeeds only when it outputs into a collection with a different name.&lt;/p&gt;

&lt;p&gt;The problem emerges on sharded clusters only, and only when the output collection uses a hashed index.&lt;/p&gt;

&lt;p&gt;It is possible to work around this problem by running a dummy map reduce job on newly set up MongoDB clusters, using an output collection that will never be used in regular operations.&lt;/p&gt;</description>
                <environment>Ubuntu 14.10&lt;br/&gt;
MongoDB packages from 10gen&lt;br/&gt;
PyMongo 2.7.1</environment>
        <key id="175483">SERVER-16605</key>
            <summary>Mapreduce into sharded collection with hashed index fails</summary>
                <type id="1" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14703&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="backlog-server-sharding">[DO NOT USE] Backlog - Sharding Team</assignee>
                                    <reporter username="dtakken">D.H.J. Takken</reporter>
                        <labels>
                            <label>open_todo_in_code</label>
                            <label>sharding</label>
                            <label>todo_in_code</label>
                    </labels>
                <created>Fri, 19 Dec 2014 13:47:05 +0000</created>
                <updated>Tue, 6 Dec 2022 04:57:54 +0000</updated>
                            <resolved>Thu, 13 Jun 2019 16:00:40 +0000</resolved>
                                    <version>2.6.5</version>
                    <version>2.7.8</version>
                                                    <component>MapReduce</component>
                    <component>Sharding</component>
                                        <votes>1</votes>
                                    <watches>8</watches>
                                                                                                                <comments>
                            <comment id="1885988" author="asya" created="Tue, 8 May 2018 15:58:34 +0000"  >&lt;p&gt;I think this is just an instance of &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-14324&quot; title=&quot;MapReduce does not respect existing shard key on output:sharded&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-14324&quot;&gt;&lt;del&gt;SERVER-14324&lt;/del&gt;&lt;/a&gt; where output isn&apos;t using &lt;tt&gt;_id:&quot;hashed&quot;&lt;/tt&gt; but using &lt;tt&gt;_id:1&lt;/tt&gt; instead.&lt;/p&gt;</comment>
                            <comment id="1885978" author="asya" created="Tue, 8 May 2018 15:56:34 +0000"  >&lt;p&gt;I just verified that this is an issue if you are trying to output into a collection with shard key {_id:&quot;hashed&quot;}&lt;/p&gt;

&lt;p&gt;If the shard key of the collection is changed to {_id:1}, then it works as expected.&lt;/p&gt;

&lt;p&gt;&#160;&lt;/p&gt;</comment>
                            <comment id="1178701" author="ramon.fernandez" created="Fri, 19 Feb 2016 12:13:45 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=MosheKaplan&quot; class=&quot;user-hover&quot; rel=&quot;MosheKaplan&quot;&gt;MosheKaplan&lt;/a&gt;, please take a look at &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-17397&quot; title=&quot;Dropping a Database or Collection in a Sharded Cluster may not fully succeed&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-17397&quot;&gt;&lt;del&gt;SERVER-17397&lt;/del&gt;&lt;/a&gt;, which contains information on how to eliminate all traces of a database or collection from a sharded cluster. Hope that helps.&lt;/p&gt;</comment>
                            <comment id="1178644" author="moshekaplan" created="Fri, 19 Feb 2016 11:03:21 +0000"  >&lt;p&gt;We are hitting the same issue on 3.0.3.&lt;br/&gt;
In our case, it seems that more than one collection is cursed, although newer tables are being created.&lt;br/&gt;
We tried removing traces of the tables from the config servers (the locks and collections tables in the config database), but that did not resolve the issue.&lt;br/&gt;
Can you point us to the location in the sharded cluster where we can remove the &quot;cursed&quot; table name to recover the cluster?&lt;/p&gt;

&lt;p&gt;Thanks &lt;br/&gt;
Moshe&lt;/p&gt;</comment>
                            <comment id="918506" author="ramon.fernandez" created="Tue, 19 May 2015 22:28:11 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=dtakken&quot; class=&quot;user-hover&quot; rel=&quot;dtakken&quot;&gt;dtakken&lt;/a&gt;, looks like we let this ticket fall through the cracks &amp;#8211; very sorry about that.&lt;/p&gt;

&lt;p&gt;Thanks for the concise script. I&apos;m able to reproduce the behavior you describe and we&apos;re investigating. One thing I&apos;ve noticed is that the &quot;OutputCollectionA&quot; name seems to be the only one that doesn&apos;t work, not just the first one that&apos;s used. There&apos;s room for improvement in &lt;tt&gt;mapReduce&lt;/tt&gt; operations with output to sharded collections, but this looks like a very strange bug and I wonder if there are other names that fail in the same manner. Will post updates to this ticket as they become available.&lt;/p&gt;

&lt;p&gt;Thanks,&lt;br/&gt;
Ram&#243;n.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;EDIT&lt;/b&gt;&lt;br/&gt;
After more thorough testing I can confirm that the issue appears with whatever collection name is used for the first &lt;tt&gt;mapReduce&lt;/tt&gt; operation as initially pointed out in this ticket; apologies for the confusion. I also observed that the behavior reproduces even if the hashed index is not created.&lt;/p&gt;
</comment>
                            <comment id="791916" author="dtakken" created="Mon, 22 Dec 2014 09:32:55 +0000"  >&lt;p&gt;Uploaded a new version of the test case, removing an index creation statement that is not relevant to the testcase.&lt;/p&gt;

&lt;p&gt;Also, I added the logging output of the testcase run on MongoDB 2.8 RC3.&lt;/p&gt;</comment>
                            <comment id="791915" author="dtakken" created="Mon, 22 Dec 2014 09:30:46 +0000"  >&lt;p&gt;I just tested on version 2.8 RC3 and the problem reproduces there as well.&lt;/p&gt;</comment>
                            <comment id="791358" author="dtakken" created="Fri, 19 Dec 2014 20:58:10 +0000"  >&lt;p&gt;That is correct. I use the mongodb-org and mongodb-org-unstable packages by Ernie Hershey. Packages for the 2.8 RC series have not appeared in the repository yet. &lt;/p&gt;</comment>
                            <comment id="791029" author="asya" created="Fri, 19 Dec 2014 18:01:15 +0000"  >&lt;p&gt;You mentioned on the mongodb-dev group that you can reproduce this on 2.6.5 and 2.7.x builds (2.8.0-rc?). Is that correct? I&apos;m double-checking since I believe the 10gen packages are all 2.4.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                        <issuelink>
            <issuekey id="143230">SERVER-14324</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                            <outwardlinks description="related to">
                                        <issuelink>
            <issuekey id="935574">SERVER-43467</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="60215" name="log-2.8rc3.txt" size="40480" author="dtakken" created="Mon, 22 Dec 2014 09:28:33 +0000"/>
                            <attachment id="60104" name="log.txt" size="41189" author="dtakken" created="Fri, 19 Dec 2014 13:47:05 +0000"/>
                            <attachment id="60214" name="testcase.py" size="3068" author="dtakken" created="Mon, 22 Dec 2014 09:28:33 +0000"/>
                            <attachment id="60105" name="testcase.py" size="3179" author="dtakken" created="Fri, 19 Dec 2014 13:47:05 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                <customfield id="customfield_12751" key="com.atlassian.jira.plugin.system.customfieldtypes:multiselect">
                        <customfieldname>Assigned Teams</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="25141"><![CDATA[Sharding]]></customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Fri, 19 Dec 2014 18:01:15 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        5 years, 40 weeks, 1 day ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>alexander.golin@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            5 years, 40 weeks, 1 day ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                    <customfield id="customfield_10032" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Operating System</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10020"><![CDATA[Linux]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>backlog-server-sharding</customfieldvalue>
            <customfieldvalue>asya.kamsky@mongodb.com</customfieldvalue>
            <customfieldvalue>dtakken</customfieldvalue>
            <customfieldvalue>MosheKaplan</customfieldvalue>
            <customfieldvalue>ramon.fernandez@mongodb.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hrlgbj:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hrfu2f:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>153823</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10750" key="com.atlassian.jira.plugin.system.customfieldtypes:textarea">
                        <customfieldname>Steps To Reproduce</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>&lt;p&gt;1. Instantiate a new, clean MongoDB cluster, featuring a single shard server, config server and mongos.&lt;br/&gt;
2. Create a new database, dropping it first if it exists already.&lt;br/&gt;
3. Create an input collection and an output collection. Both collections are sharded. The output collection has a hashed index on the _id field.&lt;br/&gt;
4. Run a simple map reduce job that gets its input from the input collection and outputs into the output collection.&lt;br/&gt;
5. All documents produced by the reducer in stage one of the map reduce process get lost in the post-processing stage. The output collection is empty.&lt;br/&gt;
6. Repeat steps 2, 3, and 4 using an output collection that has a &lt;b&gt;different&lt;/b&gt; name. The map reduce process succeeds this time.&lt;br/&gt;
7. Repeat steps 2, 3, and 4 using an output collection that has the same name as the one used in the first map reduce job. It will fail again.&lt;/p&gt;

&lt;p&gt;(Python implementation of this test case is attached)&lt;/p&gt;</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hsgdof:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>