<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 05:51:14 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-60983] Evaluate the performance of the new way of filtering writes to orphaned documents</title>
                <link>https://jira.mongodb.org/browse/SERVER-60983</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;The goal of this task is to evaluate the performance impact of the new way of filtering writes on orphaned documents as part of PM-2423.&lt;/p&gt;

&lt;p&gt;The first task is to check whether we already have a benchmark that measures write throughput. Ultimately, what we want to measure is the overhead introduced by this new way of filtering writes compared to the previous implementation.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;1st workload: without orphaned documents&lt;/b&gt;&lt;br/&gt;
 The goal of this benchmark is to measure the overhead of checking if the document is owned by the shard in a scenario in which there are no orphaned documents. We should test different scenarios:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;Targeted writes (i.e. targeting just one shard, with a valid shard version). This scenario is very interesting because we believe that filtering via the &lt;tt&gt;ShardVersion&lt;/tt&gt; should be enough.&lt;/li&gt;
	&lt;li&gt;Broadcast multi writes (i.e. &lt;tt&gt;ChunkVersion::IGNORED()&lt;/tt&gt;).&lt;/li&gt;
	&lt;li&gt;Others? Broadcast writes in a transaction? Direct writes to a shard? My feeling is that the previous two should be enough, but I am open to evaluating other scenarios if we think they could be interesting.&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;&lt;b&gt;2nd workload: with orphaned documents&lt;/b&gt;&lt;br/&gt;
 The first workload only evaluates the cost of checking the ownership of a document. The goal of this second workload is to also evaluate the cost of skipping a document. To be honest, I am still thinking about how to measure this. One idea is to create a sharded collection with only orphaned documents (created by direct writes to the shard!). Then all writes will be filtered out, so if we create the same number of documents in both workloads (without and with orphaned documents), the difference in write times will be the overhead of just skipping. Open to other ideas &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.mongodb.org/images/icons/emoticons/biggrin.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/p&gt;</description>
                <environment></environment>
        <key id="1909393">SERVER-60983</key>
            <summary>Evaluate the performance of the new way of filtering writes to orphaned documents</summary>
                <type id="3" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14718&amp;avatarType=issuetype">Task</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="9">Done</resolution>
                                        <assignee username="antonio.fuschetto@mongodb.com">Antonio Fuschetto</assignee>
                                    <reporter username="sergi.mateo-bellido@mongodb.com">Sergi Mateo Bellido</reporter>
                        <labels>
                    </labels>
                <created>Tue, 26 Oct 2021 07:50:27 +0000</created>
                <updated>Thu, 30 Dec 2021 16:40:44 +0000</updated>
                            <resolved>Thu, 30 Dec 2021 15:55:59 +0000</resolved>
                                                                    <component>Sharding</component>
                                        <votes>0</votes>
                                    <watches>2</watches>
                                                                                                                <comments>
                            <comment id="4269549" author="JIRAUSER1259062" created="Wed, 29 Dec 2021 09:24:56 +0000"  >&lt;h1&gt;&lt;a name=&quot;Introduction&quot;&gt;&lt;/a&gt;Introduction&lt;/h1&gt;

&lt;p&gt;To measure the performance degradation introduced by the new mechanism that filters out write operations on orphaned documents (i.e.&#160;&lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-59832&quot; title=&quot;Prevent writes to orphan documents&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-59832&quot;&gt;&lt;del&gt;SERVER-59832&lt;/del&gt;&lt;/a&gt;), a dedicated test has been implemented, starting from the existing &lt;a href=&quot;https://github.com/10gen/workloads/blob/master/workloads/crud.js&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;CRUD workloads&lt;/a&gt; test. Unfortunately, the existing test is too generic and therefore unsuitable for this task, where we want to stress the operations that trigger the new filtering logic &lt;sup&gt;1&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;The new test measures the execution time of the following use cases using sharded collections of different cardinalities and sizes:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;Update all documents using an empty query (i.e. &lt;tt&gt;{}&lt;/tt&gt;)&lt;/li&gt;
	&lt;li&gt;Update all documents using a non-empty query&lt;/li&gt;
	&lt;li&gt;Delete all documents using an empty query (i.e. &lt;tt&gt;{}&lt;/tt&gt;)&lt;/li&gt;
	&lt;li&gt;Delete all documents using a non-empty query&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Update and delete operations were executed with both empty and non-empty queries to evaluate the performance penalty in scenarios where we assumed that evaluating the query accounts for a non-negligible share of the total execution time.&lt;/p&gt;

&lt;p&gt;These operations were executed 5 times on 5 different collections (of the same type). The results were averaged, using the standard deviation to discard possibly distorted samples (e.g. caused by system processes running on the dedicated test machine).&lt;/p&gt;
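
&lt;p&gt;As a rough illustration of the averaging step described above, the following Python sketch drops samples that lie too far from the mean and averages the rest. The function name &lt;tt&gt;robust_mean&lt;/tt&gt;, the cut-off of 1.5 standard deviations and the sample timings are all assumptions made for this example; the actual test may have applied a different rule.&lt;/p&gt;

```python
from operator import le
from statistics import mean, stdev


def robust_mean(samples, max_sigma=1.5):
    """Average the samples within max_sigma standard deviations of the mean.

    max_sigma is a hypothetical cut-off chosen for this sketch; it is
    assumed to be at least 1 so that at least one sample is always kept.
    """
    if len(samples) == 1:
        return samples[0]
    center = mean(samples)
    spread = stdev(samples)  # sample standard deviation, needs 2+ samples
    # Keep a sample when |sample - center| is at most max_sigma * spread.
    kept = [s for s in samples if le(abs(s - center), max_sigma * spread)]
    return mean(kept)


# Hypothetical execution times in milliseconds for 5 runs of one use case.
runs_ms = [10.0, 10.2, 9.9, 10.1, 25.0]
print(robust_mean(runs_ms))
```

&lt;p&gt;In this made-up data set the 25.0 ms run is discarded as distorted, and the average is taken over the remaining four runs.&lt;/p&gt;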

&lt;p&gt;The experiment was repeated using collections with 100, 1K, 10K, 100K and 1M documents, and with document sizes of 128B, 512B, 1KB, 2KB and 1MB.&lt;/p&gt;
&lt;h1&gt;&lt;a name=&quot;Results&quot;&gt;&lt;/a&gt;Results&lt;/h1&gt;

&lt;p&gt;The results show that the current filtering logic on orphaned documents introduces an execution-time penalty of about 5-6% for update operations and 7-8% for delete operations. This value does not change significantly when varying the number of documents in the collection or their size.&lt;/p&gt;
&lt;div class=&apos;table-wrap&apos;&gt;
&lt;table class=&apos;confluenceTable&apos;&gt;&lt;tbody&gt;
&lt;tr&gt;
&lt;th class=&apos;confluenceTh&apos;&gt;&lt;b&gt;Use case&lt;/b&gt;&lt;/th&gt;
&lt;th class=&apos;confluenceTh&apos;&gt;&lt;b&gt;Performance degradation&lt;/b&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;Update with empty query&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;+6.0%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;Update with non-empty query&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;+5.2%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;Delete with empty query&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;+7.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;Delete with non-empty query&lt;/td&gt;
&lt;td class=&apos;confluenceTd&apos;&gt;+6.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;
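
&lt;p&gt;The percentages above can be read as a relative increase in execution time. A minimal Python sketch of that calculation follows, assuming the degradation is computed as (filtered - baseline) / baseline. The function name &lt;tt&gt;degradation_percent&lt;/tt&gt; and the timings are hypothetical and are not taken from the measurements.&lt;/p&gt;

```python
def degradation_percent(baseline_ms, filtered_ms):
    """Relative increase of filtered_ms over baseline_ms, in percent."""
    if baseline_ms == 0:
        raise ValueError("baseline time must be non-zero")
    return (filtered_ms - baseline_ms) / baseline_ms * 100.0


# Hypothetical timings in milliseconds, chosen only for illustration:
# the same use case without and with the orphan filtering logic.
baseline_ms = 1000.0
filtered_ms = 1060.0
print(f"{degradation_percent(baseline_ms, filtered_ms):+.1f}%")
```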


&lt;p&gt;Further experiments also showed that having a huge number of chunks (e.g. 100K) affects performance but, as the Sharding team is actively working to avoid this type of scenario (i.e.&#160;PM-2321), the analysis did not focus on it.&lt;/p&gt;

&lt;p&gt;Detailed information on different test cases and results is available in &lt;a href=&quot;https://docs.google.com/spreadsheets/d/19r5jQYva7F_eVWAl41zlwTcLQ7aGWC7v24YbKfAAAbU/edit?usp=sharing&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;SERVER-59832 - Performance tests&lt;/a&gt;.&lt;/p&gt;
&lt;h1&gt;&lt;a name=&quot;Conclusion&quot;&gt;&lt;/a&gt;Conclusion&lt;/h1&gt;

&lt;p&gt;The current implementation that filters out write operations on orphaned documents (i.e.&#160;&lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-59832&quot; title=&quot;Prevent writes to orphan documents&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-59832&quot;&gt;&lt;del&gt;SERVER-59832&lt;/del&gt;&lt;/a&gt;) introduces an overhead of up to 8% (rounding up) on update and delete operations.&lt;/p&gt;

&lt;p&gt;In order to minimize the computational cost of this logic, several areas for improvement have been identified. Dedicated tasks will be created accordingly.&lt;/p&gt;

&lt;p&gt;&#160;&lt;/p&gt;
&lt;hr /&gt;
&lt;p&gt;&lt;sup&gt;1&lt;/sup&gt; The CRUD workloads test uses the same collection to measure the operation throughput. It runs different types of operations (e.g. delete and insert) to preserve the state of the collection for subsequent test cases, which makes measuring the execution time of each individual type of operation cumbersome and imprecise for our purposes.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18555" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname># of Sprints</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>6.0</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Wed, 29 Dec 2021 09:24:56 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        2 years, 6 weeks ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                            <customfield id="customfield_10857" key="com.pyxis.greenhopper.jira:gh-epic-link">
                        <customfieldname>Epic Link</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>PM-2423</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>sergi.mateo-bellido@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            2 years, 6 weeks ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>antonio.fuschetto@mongodb.com</customfieldvalue>
            <customfieldvalue>sergi.mateo-bellido@mongodb.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i07m1z:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hzwfq7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_10557" key="com.pyxis.greenhopper.jira:gh-sprint">
                        <customfieldname>Sprint</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue id="5426">Sharding EMEA 2021-11-01</customfieldvalue>
    <customfieldvalue id="5427">Sharding EMEA 2021-11-15</customfieldvalue>
    <customfieldvalue id="5428">Sharding EMEA 2021-11-29</customfieldvalue>
    <customfieldvalue id="5429">Sharding EMEA 2021-12-13</customfieldvalue>
    <customfieldvalue id="5430">Sharding EMEA 2021-12-27</customfieldvalue>
    <customfieldvalue id="5681">Sharding EMEA 2022-01-10</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i0787b:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>