<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 03:07:22 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-4929] aggregation:  $group should have a $median accumulator</title>
                <link>https://jira.mongodb.org/browse/SERVER-4929</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;This has been suggested a couple of times through forums, meetups, and office hours.  $median would go with $avg, $min, $max, etc, as a $group expression.&lt;/p&gt;</description>
                <environment></environment>
        <key id="30378">SERVER-4929</key>
            <summary>aggregation:  $group should have a $median accumulator</summary>
                <type id="2" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14711&amp;avatarType=issuetype">New Feature</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="10038" iconUrl="https://jira.mongodb.org/images/icons/subtask.gif" description="">Backlog</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="backlog-query-optimization">Backlog - Query Optimization</assignee>
                                    <reporter username="dan@mongodb.com">Daniel Pasette</reporter>
                        <labels>
                            <label>accumulator</label>
                            <label>expression</label>
                            <label>pm1457-nominee</label>
                    </labels>
                <created>Fri, 10 Feb 2012 20:14:16 +0000</created>
                <updated>Wed, 8 Nov 2023 11:40:55 +0000</updated>
                                                                            <component>Aggregation Framework</component>
                                        <votes>31</votes>
                                    <watches>35</watches>
                                                                                                                <comments>
                            <comment id="5863718" author="JIRAUSER1259115" created="Wed, 8 Nov 2023 11:40:55 +0000"  >&lt;p&gt;MongoDB Version 7 has a $median accumulator which is supported in $group&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=backlog-query-optimization&quot; class=&quot;user-hover&quot; rel=&quot;backlog-query-optimization&quot;&gt;backlog-query-optimization&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="4229583" author="mspreafico@gmail.com" created="Fri, 3 Dec 2021 15:22:55 +0000"  >&lt;p&gt;Would be very useful (and will also remove an issue opened almost ten years ago...)&lt;/p&gt;</comment>
                            <comment id="2509449" author="redbeard0531" created="Wed, 30 Oct 2019 23:58:07 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=charlie.swanson&quot; class=&quot;user-hover&quot; rel=&quot;charlie.swanson&quot;&gt;charlie.swanson&lt;/a&gt; sorry for the delay in replying, I just came across this again today. If we are OK with giving fast but imprecise results (which is somewhat implied if you are using &lt;tt&gt;$sample&lt;/tt&gt;), there are algorithms for on-line streaming computation of approximate percentiles and medians. The one I&apos;m familiar with is &#160;&lt;a href=&quot;https://www.cse.wustl.edu/~jain/papers/ftp/psqr.pdf&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;P-squared&lt;/a&gt;&#160;which uses some dynamically moving markers and counts. There is a &lt;a href=&quot;https://www.boost.org/doc/libs/1_71_0/doc/html/accumulators/user_s_guide.html#accumulators.user_s_guide.the_statistical_accumulators_library.p_square_quantile&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;boost implementation&lt;/a&gt; and we even had one in &lt;a href=&quot;https://github.com/mongodb/mongo/blob/v2.6/src/mongo/util/descriptive_stats-inl.h&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;our codebase&lt;/a&gt; way back when, but it was removed since the feature it was built for was never productionized. We would need to modify the algorithm to provide some way to merge results from multiple shards, but IIRC that should be possible. There may also be newer algorithms that are designed with that in mind.&lt;/p&gt;</comment>
                            <comment id="1803699" author="massimo.brignoli" created="Mon, 12 Feb 2018 23:30:42 +0000"  >&lt;p&gt;Someone had the chance to have a look at tdigest? Seems quite compatible with Aggregation Framework and sharding...&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/CamDavidsonPilon/tdigest&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/CamDavidsonPilon/tdigest&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="1182040" author="charlie.swanson" created="Tue, 23 Feb 2016 17:11:49 +0000"  >&lt;p&gt;It&apos;s subtle, but &lt;tt&gt;$percentile&lt;/tt&gt; is actually very different from just another accumulator. Computing the Xth percentile requires looking at all the inputs (and sorting them?) before knowing the answer. This makes it just as memory-intensive as $push. It would also introduce some weird behavior in a sharded cluster. Since a $group would be split to run half on the shards, and half on the merging shard, the shards would have to send over their accumulated data. Everything goes over the wire as BSON, so the intermediate accumulated data could be at most 16MB. It would be surprising for some users to hit an excessive memory error on something that produces a pretty small output.&lt;/p&gt;

&lt;p&gt;As far as I can tell, &lt;tt&gt;$median&lt;/tt&gt; is just a special case of &lt;tt&gt;$percentile&lt;/tt&gt;, computing the 50th percentile.&lt;/p&gt;

&lt;p&gt;There may be an easy answer, which is just to introduce a &lt;tt&gt;$percentile&lt;/tt&gt; expression, which operates over an array. Then if you wanted to compute the median, you could do the following:&lt;/p&gt;
&lt;p/&gt;
&lt;div id=&quot;syntaxplugin&quot; class=&quot;syntaxplugin&quot; style=&quot;border: 1px dashed #bbb; border-radius: 5px !important; overflow: auto; max-height: 30em;&quot;&gt;
&lt;table cellspacing=&quot;0&quot; cellpadding=&quot;0&quot; border=&quot;0&quot; width=&quot;100%&quot; style=&quot;font-size: 1em; line-height: 1.4em !important; font-weight: normal; font-style: normal; color: black;&quot;&gt;
		&lt;tbody &gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;  margin-top: 10px;   width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;db.example.aggregate([&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;   width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;  {$group: {_id: &lt;/span&gt;&lt;span style=&quot;color: blue; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;&quot;$group_id&quot;&lt;/span&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;, valuesOfInterest: {$push: &lt;/span&gt;&lt;span style=&quot;color: blue; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;&quot;$valueOfInterest&quot;&lt;/span&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;}}},&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;   width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;  {$project: {_id: 1, median: {$percentile: [&lt;/span&gt;&lt;span style=&quot;color: blue; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;&quot;$valuesOfInterest&quot;&lt;/span&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;, 50]}}}&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;   margin-bottom: 10px;  width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;])&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
			&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p/&gt;

&lt;p&gt;This would make it explicit that you have to store all the data to compute the percentile. The downside is that there&apos;s probably a way to compute percentile as an accumulator that would use a smaller memory footprint (e.g. if there are a lot of duplicate values) that this would miss out on.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=redbeard0531&quot; class=&quot;user-hover&quot; rel=&quot;redbeard0531&quot;&gt;redbeard0531&lt;/a&gt;, do you have any thoughts?&lt;/p&gt;</comment>
                            <comment id="1031379" author="virendra.agarwal@timesinternet.in" created="Wed, 16 Sep 2015 07:31:45 +0000"  >&lt;p&gt;It will be good if Median comes with percentile option for Data.&lt;/p&gt;</comment>
                            <comment id="99753" author="cwestin" created="Fri, 16 Mar 2012 21:41:43 +0000"  >&lt;p&gt;Moved to &quot;Planning Bucket A&quot; to match the disposition of the duplicate &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-4571&quot; title=&quot;Create $median aggregation feature&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-4571&quot;&gt;&lt;del&gt;SERVER-4571&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Depends</name>
                                                                <inwardlinks description="is depended on by">
                                                        </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                        <issuelink>
            <issuekey id="27356">SERVER-4571</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                            <outwardlinks description="related to">
                                                        </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="10927">SERVER-447</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>7.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                <customfield id="customfield_12751" key="com.atlassian.jira.plugin.system.customfieldtypes:multiselect">
                        <customfieldname>Assigned Teams</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="25126"><![CDATA[Query Optimization]]></customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Fri, 16 Mar 2012 21:41:43 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        13 weeks ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>ilan.cohen@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            13 weeks ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10000" key="com.atlassian.jira.plugin.system.customfieldtypes:radiobuttons">
                        <customfieldname>Old_Backport</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10000"><![CDATA[No]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>backlog-query-optimization</customfieldvalue>
            <customfieldvalue>charlie.swanson@mongodb.com</customfieldvalue>
            <customfieldvalue>cwestin</customfieldvalue>
            <customfieldvalue>dan@mongodb.com</customfieldvalue>
            <customfieldvalue>ilan.cohen@mongodb.com</customfieldvalue>
            <customfieldvalue>massimo.brignoli</customfieldvalue>
            <customfieldvalue>mathias@mongodb.com</customfieldvalue>
            <customfieldvalue>mspreafico@gmail.com</customfieldvalue>
            <customfieldvalue>virendra.agarwal@timesinternet.in</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hrodsv:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hr2gqv:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>7993</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hrizxj:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>