<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 03:05:46 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-4364] Method for running large mapReduce operations in a working replica set</title>
                <link>https://jira.mongodb.org/browse/SERVER-4364</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;As a user of a MongoDB database served from an N-node replica set (running in --auth mode) that is responding to a steady trickle of read and write requests, I want to be able to complete a large map/reduce task without material disruption to the operation of the replica set.&lt;/p&gt;

&lt;p&gt;By &quot;large,&quot; I mean that I cannot guarantee the result set fits in a single 16MB document &amp;#8211; i.e., my task is a poor candidate for &lt;tt&gt;out: { inline: 1 }&lt;/tt&gt;.&lt;/p&gt;

&lt;p&gt;By &quot;material disruption,&quot; I mean a variety of pathologies I&apos;d like to avoid if possible.  For example:  &lt;/p&gt;

&lt;ol&gt;
	&lt;li&gt;failover to the SECONDARY because the PRIMARY becomes unresponsive
	&lt;ul&gt;
		&lt;li&gt;(followed by failure of the mapReduce itself &amp;#8211; db assertion 13312 &amp;#8211; when it cannot commit its result collection because the former PRIMARY is no longer PRIMARY)&lt;/li&gt;
	&lt;/ul&gt;
	&lt;/li&gt;
	&lt;li&gt;error RS102 (&quot;too stale to catch up&quot;) on a secondary, caused by the sudden deluge of result-set writes being replicated;&lt;/li&gt;
	&lt;li&gt;significant degradation of overall performance, i.e., increased latency when servicing other requests;&lt;/li&gt;
	&lt;li&gt;etc.&lt;/li&gt;
&lt;/ol&gt;


&lt;p&gt;For the sake of argument, let&apos;s further assume my map/reduce task is not amenable to an incremental application of successive mapReduce calls; and that I would potentially like to preserve the map/reduce output collection as a target for future queries, though it would be acceptable to have to manually move that result data around among hosts.&lt;/p&gt;

&lt;p&gt;I&apos;ve spoken with a couple of members of the 10gen engineering team, and they&apos;ve been very helpful in brainstorming approaches and workarounds.  But for the sake of the product, I want to make sure the use case is captured here, as I don&apos;t think any of the approaches I&apos;m currently aware of addresses it satisfactorily.&lt;/p&gt;


&lt;p&gt;Working within current constraints, here&apos;s where that leaves us:&lt;/p&gt;

&lt;p&gt;The &quot;large&quot; / no-inline requirement today forces us to execute the mapReduce() call on the PRIMARY, because an output collection has to be created and only the PRIMARY is permitted to perform those writes.&lt;/p&gt;

&lt;p&gt;But today, for a certain class of job, executing on the PRIMARY is a non-starter.  It will consume too many system resources, monopolize the one and only JavaScript thread, hold the write lock for extended periods of time, etc. &amp;#8211; risking pathology #1 above.  If that failure mode is dodged, it runs the risk of pathology #2: the large result set (created much faster than any more &quot;natural&quot; insertion of data) floods the SECONDARY nodes with replicated writes faster than they can apply them, until the PRIMARY&apos;s oplog rolls past the last operation they have applied.  And failure mode #3 is always a concern &amp;#8211; the PRIMARY is too critical to the overall responsiveness of any app that uses the database.&lt;/p&gt;
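&lt;p&gt;To make pathology #2 concrete, here is a rough back-of-the-envelope sketch in Python (not MongoDB code). It assumes a SECONDARY syncing directly from the PRIMARY, a constant rate of oplog growth, and hypothetical helper names; real oplog entries vary in size, so treat the results as estimates only. A SECONDARY can resume replication only while the last operation it applied is still present in its sync source&apos;s oplog, so the usable window is roughly the oplog size divided by the write rate.&lt;/p&gt;

```python
def oplog_window_seconds(oplog_bytes, write_bytes_per_s):
    """Approximate span of history the oplog holds at a constant write rate."""
    # Hypothetical helper. The oplog has a fixed size, so once this many
    # seconds of writes have accumulated, the oldest entries are overwritten.
    if not write_bytes_per_s > 0:
        raise ValueError("write rate must be positive")
    return oplog_bytes / write_bytes_per_s


def secondary_goes_stale(oplog_bytes, write_bytes_per_s, secondary_lag_s):
    """True if a SECONDARY lagging by secondary_lag_s can no longer catch up."""
    # Hypothetical helper modelling the RS102 "too stale to catch up" case:
    # the last op the SECONDARY applied has already rolled off the oplog.
    return secondary_lag_s > oplog_window_seconds(oplog_bytes, write_bytes_per_s)


# Usage with made-up numbers: a 1 GiB oplog under two different write rates.
GIB = 1024 ** 3
trickle_window = oplog_window_seconds(GIB, 100_000)      # steady trickle
burst_window = oplog_window_seconds(GIB, 50_000_000)     # map/reduce output burst
stale_after_burst = secondary_goes_stale(GIB, 50_000_000, secondary_lag_s=60)
```

&lt;p&gt;With these hypothetical numbers, the steady trickle of 100,000 bytes/s leaves a window of about three hours, while a result set written at 50,000,000 bytes/s shrinks it to roughly 21 seconds, during which the SECONDARY is also falling further behind.&lt;/p&gt;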

&lt;p&gt;That&apos;s the problem, in a nutshell.  Some potential solutions that have been suggested, all of which involve changes to the core server:&lt;/p&gt;

&lt;p&gt;A.  If we could relax or finesse the only-PRIMARY-can-write constraint for this kind of operation, we could run the mapReduce() call on a SECONDARY &amp;#8211; perhaps even a &quot;hidden&quot; node &amp;#8211; and leave the overall cluster and any apps almost completely unaffected.  Then we&apos;d take responsibility for getting the result collection out of that node as a separate task.  Currently, though, this does not seem possible &amp;#8211; cf. issue &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-4264&quot; title=&quot;add option to map/reduce output to output to the primary of a replica set or a different server&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-4264&quot;&gt;&lt;del&gt;SERVER-4264&lt;/del&gt;&lt;/a&gt; &amp;#8211; as we can&apos;t target even the &apos;local&apos; database there.  (And even if we could target &apos;local&apos;, today it requires admin/separate privileges to do so.)&lt;/p&gt;

&lt;p&gt;B.  Alternatively, if we could alter the mapReduce() command in such a way as to allow &quot;out:&quot; to target a remote database &amp;#8211; perhaps in an entirely disjoint replica set &amp;#8211; we could allow a secondary to run the calculation and save the result set, while still not violating the proscription against writing on a secondary.&lt;/p&gt;

&lt;p&gt;C.  Another possibility is to flag the result set for exclusion from replication &amp;#8211; though this leaves us running on the PRIMARY, where failure modes #1 and #3 are still an issue.&lt;/p&gt;
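&lt;p&gt;The effect option B describes can be roughly approximated on the client side, keeping the work off the PRIMARY entirely. The Python sketch here is a hypothetical illustration, not a MongoDB API: it assumes the input documents can be streamed from a SECONDARY (for example, through a pymongo collection obtained with a secondary read preference) and that write_batch stands in for an insert into a collection on a separate, disjoint replica set. It calls reduce once per key over all emitted values, whereas the server may re-reduce partial results, and it holds every group in client memory, so it only suits result sets that fit there.&lt;/p&gt;

```python
import time
from collections import defaultdict


def client_side_map_reduce(source, map_fn, reduce_fn):
    """Hypothetical client-side stand-in for mapReduce over an iterable of docs."""
    # Group every (key, value) pair emitted by map_fn under its key.
    groups = defaultdict(list)
    for doc in source:
        for key, value in map_fn(doc):
            groups[key].append(value)
    # Fold each group once with reduce_fn(key, values), yielding output docs
    # shaped like mapReduce results: {"_id": key, "value": reduced}.
    for key, values in groups.items():
        yield {"_id": key, "value": reduce_fn(key, values)}


def write_in_batches(results, write_batch, batch_size=1000, pause_s=0.0):
    """Send results to write_batch in fixed-size chunks; return the doc count."""
    # Sleeping between full batches bounds how fast the target replica set's
    # oplog grows, trading total transfer time for replication headroom.
    batch = []
    written = 0
    for doc in results:
        batch.append(doc)
        if len(batch) == batch_size:
            write_batch(batch)
            written += len(batch)
            batch = []
            time.sleep(pause_s)
    if batch:  # flush the final partial batch
        write_batch(batch)
        written += len(batch)
    return written


# Usage with in-memory stand-ins for the SECONDARY cursor and the remote writer.
orders = [{"cust": "a", "amt": 5}, {"cust": "b", "amt": 2}, {"cust": "a", "amt": 3}]
sent = []
results = client_side_map_reduce(
    orders,
    map_fn=lambda d: [(d["cust"], d["amt"])],
    reduce_fn=lambda key, values: sum(values),
)
count = write_in_batches(results, sent.append, batch_size=1, pause_s=0.01)
```

&lt;p&gt;The pause between batches is the knob that addresses pathology #2 here; failure modes #1 and #3 are avoided simply because neither the reads nor the computation touch the PRIMARY.&lt;/p&gt;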

&lt;p&gt;My intent here, though, is not to describe or advocate any particular solution; rather, I&apos;m just seeking to articulate the problem standing in the way of what I consider a rather important use case.&lt;/p&gt;

&lt;p&gt;Thanks for reading this far!&lt;/p&gt;

&lt;p&gt;TD&lt;/p&gt;</description>
                <environment>replica set -- e.g., a PRIMARY, a SECONDARY, and an ARBITER -- receiving a steady trickle of read &amp; write operations</environment>
        <key id="25415">SERVER-4364</key>
            <summary>Method for running large mapReduce operations in a working replica set</summary>
                <type id="4" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14710&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="9">Done</resolution>
                                        <assignee username="backlog-query-optimization">Backlog - Query Optimization</assignee>
                                    <reporter username="dampier">T. Dampier</reporter>
                        <labels>
                            <label>mapreduce</label>
                            <label>replication</label>
                    </labels>
                <created>Wed, 23 Nov 2011 20:12:35 +0000</created>
                <updated>Tue, 6 Dec 2022 05:39:05 +0000</updated>
                            <resolved>Fri, 4 Feb 2022 15:09:31 +0000</resolved>
                                    <version>1.8.4</version>
                    <version>2.0.1</version>
                                                    <component>MapReduce</component>
                                        <votes>7</votes>
                                    <watches>13</watches>
                <comments>
                            <comment id="4335991" author="esha.bhargava" created="Fri, 4 Feb 2022 15:08:25 +0000"  >&lt;p&gt;Closing these tickets as part of the deprecation of mapReduce.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                            <outwardlinks description="related to">
                                                        </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                        <customfieldname>Assigned Teams</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="25126"><![CDATA[Query Optimization]]></customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Mon, 28 Nov 2011 06:56:52 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        2 years, 5 days ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>alexander.golin@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            2 years, 5 days ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10000" key="com.atlassian.jira.plugin.system.customfieldtypes:radiobuttons">
                        <customfieldname>Old_Backport</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10000"><![CDATA[No]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>backlog-query-optimization</customfieldvalue>
            <customfieldvalue>eliot</customfieldvalue>
            <customfieldvalue>esha.bhargava@mongodb.com</customfieldvalue>
            <customfieldvalue>dampier</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hrmyyn:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hr2cyn:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>4824</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hszxlb:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>