<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 04:06:11 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-24375] Deduping in OR, SORT_MERGE, and IXSCAN (multikey case) uses unbounded memory</title>
                <link>https://jira.mongodb.org/browse/SERVER-24375</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;Each of the stages listed in the title keeps a set of RecordIds; these are used to identify seen documents in order to ensure that we do not return the same document twice to the user. However, this requires memory proportional to the number of documents processed, and nothing is in place to ensure that we do not consume too much.  One example of how to reproduce this unbounded memory growth is given below.&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;40 M documents of the form 
{x:0, y:0}&lt;/li&gt;
	&lt;li&gt;index on 
{y:1}&lt;/li&gt;
	&lt;li&gt;query of the following form (full repro script attached)&lt;/li&gt;
&lt;/ul&gt;


&lt;p/&gt;
&lt;div id=&quot;syntaxplugin&quot; class=&quot;syntaxplugin&quot; style=&quot;border: 1px dashed #bbb; border-radius: 5px !important; overflow: auto; max-height: 30em;&quot;&gt;
&lt;table cellspacing=&quot;0&quot; cellpadding=&quot;0&quot; border=&quot;0&quot; width=&quot;100%&quot; style=&quot;font-size: 1em; line-height: 1.4em !important; font-weight: normal; font-style: normal; color: black;&quot;&gt;
		&lt;tbody &gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;  margin-top: 10px;   width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;        q = {$or: [{x: 0, y: 0}, {x: 0, y: 0}]}&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;   margin-bottom: 10px;  width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;        db.c.find(q).hint({y: 1}).sort({z: -1}).limit(30).itcount()&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
			&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p/&gt;

&lt;p&gt;Heap profile call tree shows memory usage by OrStage::work grow to about 1.5 GB as it scans the collection, then drop back to 0 at conclusion of query. Graph in each row shows memory usage for that node and its descendants; second number in each row is max memory in MB for that node.&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;image-wrap&quot; style=&quot;&quot;&gt;&lt;img src=&quot;https://jira.mongodb.org/secure/attachment/124515/124515_heap-profile.png&quot; width=&quot;600&quot; style=&quot;border: 0px solid black&quot; /&gt;&lt;/span&gt;&lt;/p&gt;</description>
                <environment></environment>
        <key id="291127">SERVER-24375</key>
            <summary>Deduping in OR, SORT_MERGE, and IXSCAN (multikey case) uses unbounded memory</summary>
                <type id="1" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14703&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="10038" iconUrl="https://jira.mongodb.org/images/icons/subtask.gif" description="">Backlog</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="backlog-query-execution">Backlog - Query Execution</assignee>
                                    <reporter username="bruce.lucas@mongodb.com">Bruce Lucas</reporter>
                        <labels>
                            <label>query-44-grooming</label>
                    </labels>
                <created>Thu, 2 Jun 2016 18:26:57 +0000</created>
                <updated>Fri, 13 Oct 2023 13:42:21 +0000</updated>
                                            <version>3.0.12</version>
                                                    <component>Querying</component>
                                        <votes>5</votes>
                                    <watches>41</watches>
                                                                                                                <comments>
                            <comment id="4856772" author="kyle.suarez" created="Mon, 26 Sep 2022 20:49:28 +0000"  >&lt;p&gt;Sending this back to the Query Execution triage queue for re-discussion, and to come up with a SWAG estimate on how much work this would be to implement (in the classic engine). After adding the estimate, we can pass this to the Product team for re-prioritization.&lt;/p&gt;</comment>
                            <comment id="1981955" author="david.storch" created="Tue, 21 Aug 2018 22:50:36 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=bruce.lucas&quot; class=&quot;user-hover&quot; rel=&quot;bruce.lucas&quot;&gt;bruce.lucas&lt;/a&gt; points out that there is an opportunity to bound the amount of memory used for deduping specifically for top-k SORT plans. Such plans need only retain the RecordIds of the current top k set for deduping purposes. Right now, the deduping is done by the underlying OR or IXSCAN stage beneath the SORT, which requires us to keep the RecordIds of all documents seen so far, not just the top k.&lt;/p&gt;</comment>
                            <comment id="1805562" author="nimesh.shah@wolterskluwer.com" created="Wed, 14 Feb 2018 15:23:57 +0000"  >&lt;p&gt;Is there any plan to fix this issue? if so, which version?&lt;/p&gt;</comment>
                            <comment id="1287848" author="david.storch" created="Wed, 8 Jun 2016 14:21:53 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=bruce.lucas&quot; class=&quot;user-hover&quot; rel=&quot;bruce.lucas&quot;&gt;bruce.lucas&lt;/a&gt;, I think we should generalize this ticket to include anywhere that we keep a set of RecordIds for deduplication that can grow without bound during the execution of the query. IXSCAN and OR are the places that come to mind where we definitely do this.&lt;/p&gt;</comment>
                            <comment id="1287647" author="bruce.lucas@10gen.com" created="Wed, 8 Jun 2016 11:36:33 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=david.storch&quot; class=&quot;user-hover&quot; rel=&quot;david.storch&quot;&gt;david.storch&lt;/a&gt;, here&apos;s a heap profile collected by a customer for a similar but more complex query involving a $or stage. (The simplified query on this ticket came from an attempt to reproduce that customer&apos;s issue.) Note that memory is only indirectly allocated by OrStage::work, but is directly allocated instead by IndexScan::work. I think this means we either need to generalize this ticket to cover limiting memory allocated generally by the query engine, or open a separate ticket covering memory allocated by IndexScan::work - what do you think?&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;image-wrap&quot; style=&quot;&quot;&gt;&lt;img src=&quot;https://jira.mongodb.org/secure/attachment/125326/125326_memory-growth-calltree.png&quot; width=&quot;600&quot; style=&quot;border: 0px solid black&quot; /&gt;&lt;/span&gt;&lt;/p&gt;</comment>
                            <comment id="1282772" author="david.storch" created="Thu, 2 Jun 2016 18:57:52 +0000"  >&lt;p&gt;The OR stage keeps a set of RecordIds of unbounded size in order to deduplicate results from its children.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                                                <inwardlinks description="is duplicated by">
                                        <issuelink>
            <issuekey id="570734">SERVER-36087</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="10248">SERVER-123</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                            <outwardlinks description="related to">
                                        <issuelink>
            <issuekey id="2473650">SERVER-82167</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="227969">SERVER-20239</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="591755">SERVER-36794</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="322216">SERVER-26534</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="124515" name="heap-profile.png" size="234083" author="bruce.lucas@mongodb.com" created="Thu, 2 Jun 2016 18:26:57 +0000"/>
                            <attachment id="125326" name="memory-growth-calltree.png" size="174198" author="bruce.lucas@mongodb.com" created="Wed, 8 Jun 2016 11:35:27 +0000"/>
                            <attachment id="124516" name="repro.sh" size="1199" author="bruce.lucas@mongodb.com" created="Thu, 2 Jun 2016 18:26:57 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>6.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18555" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname># of Sprints</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>7.0</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_12751" key="com.atlassian.jira.plugin.system.customfieldtypes:multiselect">
                        <customfieldname>Assigned Teams</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="25125"><![CDATA[Query Execution]]></customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_13552" key="com.go2group.jira.plugin.crm:crm_generic_field">
                        <customfieldname>Case</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[[5002K00000nB0c8QAC, 5002K00000nEECsQAO, 5002K00000r2uZHQAY, 5006R00001nKLaCQAW]]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Thu, 2 Jun 2016 18:57:52 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        1 year, 19 weeks, 2 days ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_10857" key="com.pyxis.greenhopper.jira:gh-epic-link">
                        <customfieldname>Epic Link</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>PM-2829</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>judah.schvimer@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            1 year, 19 weeks, 2 days ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                    <customfield id="customfield_10032" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Operating System</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10026"><![CDATA[ALL]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>backlog-query-execution</customfieldvalue>
            <customfieldvalue>bruce.lucas@mongodb.com</customfieldvalue>
            <customfieldvalue>david.storch@mongodb.com</customfieldvalue>
            <customfieldvalue>kyle.suarez@mongodb.com</customfieldvalue>
            <customfieldvalue>nimesh.shah@wolterskluwer.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hrk6ov:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hr2ekv:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                <customfield id="customfield_10557" key="com.pyxis.greenhopper.jira:gh-sprint">
                        <customfieldname>Sprint</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue id="5914">QE 2022-10-17</customfieldvalue>
    <customfieldvalue id="5916">QE 2022-10-31</customfieldvalue>
    <customfieldvalue id="5918">QE 2022-11-14</customfieldvalue>
    <customfieldvalue id="5920">QE 2022-11-28</customfieldvalue>
    <customfieldvalue id="5922">QE 2022-12-12</customfieldvalue>
    <customfieldvalue id="5924">QE 2022-12-26</customfieldvalue>
    <customfieldvalue id="5926">QE 2023-01-09</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hsewmv:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>