<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 06:27:02 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-74291] Investigate whether new SBE HashAggStage spilling algorithm should be improved to avoid random access into spill table</title>
                <link>https://jira.mongodb.org/browse/SERVER-74291</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-70395&quot; title=&quot;Slot-Based Engine too aggressively uses disk for $group and is slow&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-70395&quot;&gt;&lt;del&gt;SERVER-70395&lt;/del&gt;&lt;/a&gt; recently improved the performance of spilling in SBE&apos;s &lt;tt&gt;HashAggStage&lt;/tt&gt;&#160;by adopting the following algorithm. When the hash table exceeds its memory budget, the entire contents of the hash table are flushed to a &lt;tt&gt;TemporaryRecordStore&lt;/tt&gt; and the hash table itself is cleared. This may happen many times as the input data is consumed. Importantly, the &lt;tt&gt;TemporaryRecordStore&lt;/tt&gt; is sorted by the group-by key, which is implemented by encoding the &lt;tt&gt;MaterializedRow&lt;/tt&gt; for the key into the record store&apos;s &lt;tt&gt;RecordId&lt;/tt&gt;. This means that once all of the input data is consumed, there will be sequences of equal keys that are adjacent in the record store; a monotonically increasing counter is used to ensure that the &lt;tt&gt;RecordId&lt;/tt&gt;s are unique. The partial aggregates can be merged to produce the final output using a single forward pass over the spill table.&lt;/p&gt;

&lt;p&gt;While spilling to a table sorted by group-by key keeps the implementation simple, &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=anna.wawrzyniak%40mongodb.com&quot; class=&quot;user-hover&quot; rel=&quot;anna.wawrzyniak@mongodb.com&quot;&gt;anna.wawrzyniak@mongodb.com&lt;/a&gt; pointed out that it could result in poor I/O access patterns. In particular, because newly spilled keys can fall anywhere in the sorted key range, each time we spill new data from the hash table to the &lt;tt&gt;TemporaryRecordStore&lt;/tt&gt;, we may need to write data to every page of the spill table.&lt;/p&gt;

&lt;p&gt;As an alternative, we could look into always appending the newly spilled data (sorted by key) to the end of the &lt;tt&gt;TemporaryRecordStore&lt;/tt&gt;. This would be similar to how spilling in &lt;tt&gt;DocumentSourceGroup&lt;/tt&gt; works &amp;#8211; it appends a new sorted segment to a spill file every time a spill event occurs. The benefit is that when we spill, we don&apos;t have to write new data to the pages that were written during a previous spill. When merging the partial aggregates, we would need to do a merge-sort of the spilled segments much like &lt;tt&gt;DocumentSourceGroup&lt;/tt&gt; does. Another consideration is that if there are too many spilled segments, we could have a merge tree with depth greater than 1 to avoid having to merge too many segments at once.&lt;/p&gt;</description>
                <environment></environment>
        <key id="2272601">SERVER-74291</key>
            <summary>Investigate whether new SBE HashAggStage spilling algorithm should be improved to avoid random access into spill table</summary>
                <type id="4" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14710&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="1" iconUrl="https://jira.mongodb.org/images/icons/statuses/open.png" description="">Open</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="backlog-query-execution">Backlog - Query Execution</assignee>
                                    <reporter username="david.storch@mongodb.com">David Storch</reporter>
                        <labels>
                    </labels>
                <created>Wed, 22 Feb 2023 20:23:15 +0000</created>
                <updated>Tue, 14 Mar 2023 17:36:46 +0000</updated>
                                                                            <component>Query Execution</component>
                                        <votes>0</votes>
                                    <watches>5</watches>
                                                                                                                <comments>
                            <comment id="5222607" author="david.storch" created="Wed, 22 Feb 2023 20:32:39 +0000"  >&lt;p&gt;As part of the work on $group spilling performance in &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-70395&quot; title=&quot;Slot-Based Engine too aggressively uses disk for $group and is slow&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-70395&quot;&gt;&lt;del&gt;SERVER-70395&lt;/del&gt;&lt;/a&gt;, I implemented a new &lt;a href=&quot;https://github.com/mongodb/genny/blob/212603dff436c6cdec80a263ee609a7e8b9fb4d9/src/workloads/query/GroupSpillToDisk.yml&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;GroupSpillToDisk.yml genny workload&lt;/a&gt;. We&apos;ve seen a substantial regression in the &lt;tt&gt;GroupSumAccumulatorSpillDueToManyGroups&lt;/tt&gt; query from this benchmark in SBE relative to the classic engine. It would be interesting to see if the change suggested by this ticket improves performance specifically for &lt;tt&gt;GroupSumAccumulatorSpillDueToManyGroups&lt;/tt&gt;.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="2155277">SERVER-70395</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                <customfield id="customfield_12751" key="com.atlassian.jira.plugin.system.customfieldtypes:multiselect">
                        <customfieldname>Assigned Teams</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="25125"><![CDATA[Query Execution]]></customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        50 weeks ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                            <customfield id="customfield_10857" key="com.pyxis.greenhopper.jira:gh-epic-link">
                        <customfieldname>Epic Link</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>PM-3243</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>mihai.andrei@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            50 weeks ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>backlog-query-execution</customfieldvalue>
            <customfieldvalue>david.storch@mongodb.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i1xe4f:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|i1ft7s:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i1x09r:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>