<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 03:09:58 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-5818] reduce in map reduce doesn&apos;t run with only one input document</title>
                <link>https://jira.mongodb.org/browse/SERVER-5818</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;If you run a map reduce and only emit a single document, the reduce doesn&apos;t get run. At first it seems like this would make sense since there&apos;s nothing to reduce, but the reduce often changes the document format (gets counts, etc...) and therefore the output you get will be wrong for those operations.&lt;/p&gt;</description>
                <environment>centos 6</environment>
        <key id="38395">SERVER-5818</key>
            <summary>reduce in map reduce doesn&apos;t run with only one input document</summary>
                <type id="1" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14703&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="9">Done</resolution>
                                        <assignee username="-1">Unassigned</assignee>
                                    <reporter username="brianjohnson">Brian Johnson</reporter>
                        <labels>
                    </labels>
                <created>Thu, 10 May 2012 19:47:24 +0000</created>
                <updated>Wed, 11 Sep 2013 13:13:46 +0000</updated>
                            <resolved>Thu, 10 May 2012 19:49:23 +0000</resolved>
                                    <version>2.0.4</version>
                                                    <component>MapReduce</component>
                                        <votes>0</votes>
                                    <watches>3</watches>
                                                                                                                <comments>
                            <comment id="118635" author="anisenbaum" created="Fri, 11 May 2012 16:46:54 +0000"  >&lt;p&gt;this seems like a bug. writing m/r results to an output collection skips writing documents where map emits one and only one key with one and only one value passed to reduce&lt;/p&gt;</comment>
                            <comment id="118427" author="brianjohnson" created="Thu, 10 May 2012 23:40:23 +0000"  >&lt;p&gt;So, I&apos;ve been thinking about it a lot, and I think you&apos;re right. If we make a change to our map function, we can make our reduce &quot;re-reducible&quot; and then we just need a finalize for the (1-ab)/(1-cd) calculation.&lt;/p&gt;</comment>
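<!-- Editor's sketch, not part of the original thread. It shows one way the change described in the comment above could look, assuming the map emits per-document factors so that reduce only multiplies and finalize does the (1-ab)/(1-cd) step. The names mapFn, reduceFn, finalizeFn and runMapReduce are hypothetical; runMapReduce is only an in-memory stand-in that mimics the server skipping reduce for keys with a single value, and it is not a MongoDB API.

```javascript
// Hypothetical names throughout; none of these are MongoDB APIs.

// map: emit the per-document factors, grouped by doc.key,
// so that reduce only has to multiply them together.
function mapFn(doc, emit) {
  var v = doc.value;
  emit(doc.key, {
    ab: 1 - (v.count_a / 10 * v.count_b), // same expression as in the ticket
    cd: 1 - (v.count_c / 10 * v.count_d)
  });
}

// reduce: the output has the same shape as each input value,
// so the result can safely be fed back into reduce (re-reduce).
function reduceFn(key, values) {
  var out = { ab: 1, cd: 1 };
  values.forEach(function (value) {
    out.ab *= value.ab;
    out.cd *= value.cd;
  });
  return out;
}

// finalize: runs once per key, whether or not reduce was called.
function finalizeFn(key, reduced) {
  return { count: (1 - reduced.ab) / (1 - reduced.cd) };
}

// In-memory stand-in for the server: reduce is skipped for single-value keys.
function runMapReduce(docs) {
  var groups = {};
  docs.forEach(function (doc) {
    mapFn(doc, function (key, value) {
      (groups[key] = groups[key] || []).push(value);
    });
  });
  var results = {};
  Object.keys(groups).forEach(function (key) {
    var values = groups[key];
    var reduced = values.length === 1 ? values[0] : reduceFn(key, values);
    results[key] = finalizeFn(key, reduced);
  });
  return results;
}
```

With a single input document for a key, reduceFn is never called, yet finalizeFn still receives the emitted {ab, cd} shape, so the output format is the same as in the multi-document case. -->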
                            <comment id="118402" author="eliot" created="Thu, 10 May 2012 22:16:11 +0000"  >&lt;p&gt;For all of those cases you can use finalize.&lt;br/&gt;
For most cases, re-reducing is greatly superior.&lt;/p&gt;</comment>
                            <comment id="118401" author="brianjohnson" created="Thu, 10 May 2012 22:12:08 +0000"  >&lt;p&gt;Semantics? I&apos;m talking about a class of problems that can&apos;t be solved using a re-reducible function. Now, I could see your point in a sharded environment where you might have to do a re-reduction on the sets returned by each shard, but again, there&apos;s no reason to require it. Even if it is sharded, you could potentially still avoid the issue depending on the map and your shard key choice. If you want to make the optimization, fine, but don&apos;t require it. Add a flag to turn it off. Why would you want to limit what your customers can do with your product? That just doesn&apos;t make any sense.&lt;/p&gt;</comment>
                            <comment id="118396" author="eliot" created="Thu, 10 May 2012 22:03:56 +0000"  >&lt;p&gt;re-reduce has massive performance and scalability benefits, as it allows for much tighter memory usage.&lt;br/&gt;
Given we want re-reduction for that, the optimization in the other case is irrelevant for semantics.&lt;/p&gt;
</comment>
                            <comment id="118393" author="brianjohnson" created="Thu, 10 May 2012 21:54:34 +0000"  >&lt;p&gt;Reduces can be re-reducible, but there is no reason they have to be. Placing this restriction eliminates all kinds of useful things you can do with map-reduce.&lt;/p&gt;

&lt;p&gt;in the above case, the _id really should have been key, so the document would be&lt;/p&gt;

&lt;p&gt;{key:&apos;val&apos;, etc...}&lt;/p&gt;

&lt;p&gt;and the map would produce&lt;/p&gt;

&lt;p&gt;{_id: {key:&apos;val&apos;}, value: {count_a: etc...&lt;/p&gt;

&lt;p&gt;Even if you have some philosophical disagreement about what a reduce should be, that shouldn&apos;t limit what we can do with it. You can have the best of both worlds by adding a flag to the mapreduce function that allows you to turn off the &quot;optimization&quot; that was created in the other ticket. &lt;/p&gt;</comment>
                            <comment id="118374" author="eliot" created="Thu, 10 May 2012 21:16:31 +0000"  >&lt;p&gt;If I understand correctly, this doesn&apos;t work anyway, because the output of a reduce can itself be reduced again.&lt;/p&gt;

&lt;p&gt;What is the map in the above example? I don&apos;t see one.&lt;/p&gt;

&lt;p&gt;Let&apos;s say you have to build an array of all values for a key. You have to do it this way:&lt;/p&gt;
&lt;p/&gt;
&lt;div id=&quot;syntaxplugin&quot; class=&quot;syntaxplugin&quot; style=&quot;border: 1px dashed #bbb; border-radius: 5px !important; overflow: auto; max-height: 30em;&quot;&gt;
&lt;table cellspacing=&quot;0&quot; cellpadding=&quot;0&quot; border=&quot;0&quot; width=&quot;100%&quot; style=&quot;font-size: 1em; line-height: 1.4em !important; font-weight: normal; font-style: normal; color: black;&quot;&gt;
		&lt;tbody &gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;  margin-top: 10px;   width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;emit( key , { values : [ value ] } )&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;   width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;function reduce( k , values ) {&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;   width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;   var all = [];&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;   width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;   for ( var i=0; i&amp;lt;values.length; i++ ) {&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;   width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;      all = all.concat( values[i].values );&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;   width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;   }&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;   width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;   return { values : all } &lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;   margin-bottom: 10px;  width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;}&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
			&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p/&gt;</comment>
                            <comment id="118356" author="brianjohnson" created="Thu, 10 May 2012 20:44:51 +0000"  >&lt;p&gt;here is a better example:&lt;/p&gt;

&lt;p&gt;input documents&lt;/p&gt;

&lt;p&gt;{_id: &apos;val&apos;, value: {count_a: 10, count_b: 20, count_c: 30, count_d: 40}}, {_id: &apos;val2&apos;, value: {count_a: 15, count_b: 20, count_c: 25, count_d: 50}}&lt;/p&gt;

&lt;p&gt;same map as above&lt;/p&gt;

&lt;p&gt;then reduce is&lt;/p&gt;

&lt;p&gt;function(key, values) {&lt;br/&gt;
   var ab = 1, cd = 1;&lt;br/&gt;
   values.forEach(function(value) {&lt;br/&gt;
     ab *= 1 - (value.count_a/10*value.count_b);&lt;br/&gt;
     cd *= 1 - (value.count_c/10*value.count_d);&lt;br/&gt;
   });&lt;br/&gt;
   return {count: (1-ab)/(1-cd)};&lt;br/&gt;
}&lt;/p&gt;

&lt;p&gt;We need to calculate this new value using all the existing values, and that is the output we need. If we had a finalize, it would first have to check which format the document is in and then do a partial calculation if the reduce wasn&apos;t run. Like I said, we can do it, but it&apos;s not very DRY.&lt;/p&gt;</comment>
                            <comment id="118354" author="brianjohnson" created="Thu, 10 May 2012 20:40:47 +0000"  >&lt;p&gt;I misunderstood the problem because as it turns out, we were outputting two different id values with one document each instead of one id value with two documents. So, we can make it work, but it means we need to do some checking in the finalize to see if the reduce has been run or not because we do calculations in the reduce that have to change the output format.&lt;/p&gt;</comment>
                            <comment id="118350" author="brianjohnson" created="Thu, 10 May 2012 20:32:57 +0000"  >&lt;p&gt;I think I have this switched around, but the principle is the same if we look at what we are actually doing instead of this simplified case. &lt;/p&gt;</comment>
                            <comment id="118347" author="brianjohnson" created="Thu, 10 May 2012 20:23:45 +0000"  >&lt;p&gt;We have a document collection with a set of values per month. We need to add up all the counts for each unique set of values per month. For instance:&lt;/p&gt;

&lt;p&gt;{key: &apos;val&apos;, month: &apos;feb&apos;, users: 10}, {key: &apos;val&apos;, month: &apos;jan&apos;, users: 20}, {key: &apos;val2&apos;, month: &apos;feb&apos;, users: 30}&lt;/p&gt;

&lt;p&gt;the map will group everything by key so it will emit&lt;/p&gt;

&lt;p&gt;{_id: {key: &apos;val&apos;}, value: {users: 10}}, {_id: {key: &apos;val&apos;}, value: {users: 20}}, {_id: {key: &apos;val2&apos;}, value: {users: 30}}&lt;/p&gt;

&lt;p&gt;and reduce to&lt;/p&gt;

&lt;p&gt;{_id: {key: &apos;val&apos;}, value: {users: 30}}, {_id: {key: &apos;val2&apos;}, value: {users: 30}}&lt;/p&gt;

&lt;p&gt;if you take away val2, we have no way to get aggregate counts for key=val because each document is sent to finalize individually and the reduce is never run.&lt;/p&gt;</comment>
                            <comment id="118341" author="eliot" created="Thu, 10 May 2012 20:14:49 +0000"  >&lt;p&gt;What&apos;s an example of something you can&apos;t do because of that?&lt;/p&gt;</comment>
                            <comment id="118332" author="eliot" created="Thu, 10 May 2012 20:03:15 +0000"  >&lt;p&gt;What do you mean exactly by aggregate count?&lt;br/&gt;
If you just want to count occurrences, you just sum the values in the reduce and don&apos;t need finalize.&lt;br/&gt;
It doesn&apos;t matter if it&apos;s called 0 times.&lt;/p&gt;</comment>
                            <comment id="118330" author="brianjohnson" created="Thu, 10 May 2012 20:02:44 +0000"  >&lt;p&gt;I think the point again is that you are limiting the set of use cases for reduce. The &quot;fix&quot; applied in the other ticket severely limits what you can do for very limited gain in a small set of cases.&lt;/p&gt;</comment>
                            <comment id="118326" author="brianjohnson" created="Thu, 10 May 2012 20:00:53 +0000"  >&lt;p&gt;So explain to me how you would get an aggregate count for a single key. You can&apos;t do that in a finalize.&lt;/p&gt;</comment>
                            <comment id="118322" author="brianjohnson" created="Thu, 10 May 2012 19:53:41 +0000"  >&lt;p&gt;I noticed this ticket &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-2333&quot; class=&quot;external-link&quot; rel=&quot;nofollow&quot;&gt;https://jira.mongodb.org/browse/SERVER-2333&lt;/a&gt;&lt;br/&gt;
I think the thinking on that ticket is not correct. If you want to aggregate values for a given key, it won&apos;t work and this is exactly what we are trying to do.&lt;/p&gt;</comment>
                            <comment id="118317" author="eliot" created="Thu, 10 May 2012 19:49:23 +0000"  >&lt;p&gt;This is as designed.&lt;br/&gt;
Reduce is not supposed to change the format as it can be re-reduced.&lt;br/&gt;
You should use finalize to change the structure.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="89303">SERVER-10736</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>17.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Thu, 10 May 2012 19:49:23 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        11 years, 40 weeks, 5 days ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>ian@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            11 years, 40 weeks, 5 days ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10000" key="com.atlassian.jira.plugin.system.customfieldtypes:radiobuttons">
                        <customfieldname>Old_Backport</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10000"><![CDATA[No]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10032" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Operating System</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10026"><![CDATA[ALL]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>anisenbaum</customfieldvalue>
            <customfieldvalue>brianjohnson</customfieldvalue>
            <customfieldvalue>eliot</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hro39j:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hriqvr:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>23008</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hs9zv3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>