<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 05:52:49 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-61587] Research potential improvements in document validation for storage</title>
                <link>https://jira.mongodb.org/browse/SERVER-61587</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;MongoDB uses &lt;a href=&quot;https://github.com/mongodb/mongo/blob/07f76750e87a85245474c67ae2e8f2811322d45e/src/mongo/db/update/delta_executor.cpp#L40&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;DeltaExecutor::applyUpdate&lt;/a&gt; to apply diff updates to the document. For example, secondaries in the replica set use it to apply oplog entries.&lt;/p&gt;

&lt;p&gt;One of the steps of &lt;tt&gt;DeltaExecutor::applyUpdate&lt;/tt&gt; execution is validation of the result. It happens in the call to &lt;a href=&quot;https://github.com/mongodb/mongo/blob/07f76750e87a85245474c67ae2e8f2811322d45e/src/mongo/db/update/object_replace_executor.cpp#L111&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;storage_validation::scanDocument&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This validation consists of several parts:&lt;/p&gt;
&lt;ol&gt;
	&lt;li&gt;&lt;a href=&quot;https://github.com/10gen/mongo/blob/23cd897bc180c1fc9cccb46b9069c9f57f3df58c/src/mongo/db/update/storage_validation.cpp#L189-L192&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;Validation of maximum document depth&lt;/a&gt;&lt;/li&gt;
	&lt;li&gt;&lt;a href=&quot;https://github.com/10gen/mongo/blob/23cd897bc180c1fc9cccb46b9069c9f57f3df58c/src/mongo/db/update/storage_validation.cpp#L115-L125&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;Ensuring that the document does not contain $-prefixed fields, except in some special cases&lt;/a&gt;&lt;/li&gt;
	&lt;li&gt;&lt;a href=&quot;https://github.com/10gen/mongo/blob/23cd897bc180c1fc9cccb46b9069c9f57f3df58c/src/mongo/db/update/storage_validation.cpp#L211-L216&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;Setting a flag if the result document contains fields with dots or dollar signs&lt;/a&gt;. While not exactly a &quot;validation&quot; step, it is still performed here to avoid traversing the document a second time.&lt;/li&gt;
	&lt;li&gt;&lt;a href=&quot;https://github.com/10gen/mongo/blob/23cd897bc180c1fc9cccb46b9069c9f57f3df58c/src/mongo/db/update/storage_validation.cpp#L78-L112&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;Ensuring the structural integrity of a DBRef field&lt;/a&gt;&lt;/li&gt;
&lt;/ol&gt;


&lt;p&gt;There are two flags controlling the validation stages:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;&lt;a href=&quot;https://github.com/10gen/mongo/blob/371a49093ce34c834c5d250470b0f9ab6bd58090/src/mongo/db/update/update_executor.h#L91&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;skipDotsDollarsCheck&lt;/a&gt;. If this flag is set to &lt;tt&gt;false&lt;/tt&gt; (note the negation in the flag&apos;s name), the first 3 checks need to be performed.&lt;/li&gt;
	&lt;li&gt;&lt;a href=&quot;https://github.com/10gen/mongo/blob/371a49093ce34c834c5d250470b0f9ab6bd58090/src/mongo/db/update/update_executor.h#L95&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;validateForStorage&lt;/a&gt;. If this flag is set to &lt;tt&gt;true&lt;/tt&gt;, all 4 checks need to be performed.&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Since all validation checks are performed in a single method, &lt;tt&gt;storage_validation::scanDocument&lt;/tt&gt;, &lt;a href=&quot;https://github.com/10gen/mongo/blob/371a49093ce34c834c5d250470b0f9ab6bd58090/src/mongo/db/update/object_replace_executor.cpp#L108-L115&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;we call it if either of these flags requires validation&lt;/a&gt;, that is, if &lt;tt&gt;skipDotsDollarsCheck&lt;/tt&gt; is &lt;tt&gt;false&lt;/tt&gt; or &lt;tt&gt;validateForStorage&lt;/tt&gt; is &lt;tt&gt;true&lt;/tt&gt;.&lt;/p&gt;

&lt;p&gt;Validation can be a time-consuming operation for large documents. We have seen significant improvements from optimizing and skipping parts of the validation (see &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-60176&quot; title=&quot;Delta-updates should only validate the diff for storage&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-60176&quot;&gt;&lt;del&gt;SERVER-60176&lt;/del&gt;&lt;/a&gt; and &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-60156&quot; title=&quot;Add a way to bypass storageValid() for time-series updates&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-60156&quot;&gt;&lt;del&gt;SERVER-60156&lt;/del&gt;&lt;/a&gt;). Our theory is that further experiments in this direction could yield additional performance improvements.&lt;/p&gt;

&lt;p&gt;The main goal of this ticket is &lt;b&gt;to research whether there are any workloads where further optimization of validation would benefit performance.&lt;/b&gt;&lt;/p&gt;

&lt;p&gt;If such workloads exist, we can try to split the validation into 2 parts: the first 3 checks and the last check. There are two reasons we might want to do that:&lt;/p&gt;
&lt;ol&gt;
	&lt;li&gt;This would allow us to skip check (4) entirely when the &lt;tt&gt;validateForStorage&lt;/tt&gt;&#160;flag is not set.&lt;/li&gt;
	&lt;li&gt;Assuming that the pre-image document is valid, we can perform checks (1), (2) and (3) by analyzing only the diff, which can be significantly smaller than the result document itself. Once again, this could speed up the case when the &lt;tt&gt;validateForStorage&lt;/tt&gt; flag is not set. Note that check (4) is context-dependent: it cannot be performed with the diff alone, because the pre-image document is also required.&lt;/li&gt;
&lt;/ol&gt;
</description>
                <environment></environment>
        <key id="1928374">SERVER-61587</key>
            <summary>Research potential improvements in document validation for storage</summary>
                <type id="4" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14710&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="12300">Won&apos;t Do</resolution>
                                        <assignee username="backlog-query-execution">Backlog - Query Execution</assignee>
                                    <reporter username="nikita.lapkov@mongodb.com">Nikita Lapkov</reporter>
                        <labels>
                    </labels>
                <created>Thu, 18 Nov 2021 13:36:28 +0000</created>
                <updated>Tue, 25 Jul 2023 17:14:40 +0000</updated>
                            <resolved>Tue, 25 Jul 2023 17:13:30 +0000</resolved>
                                                                                        <votes>0</votes>
                                    <watches>7</watches>
                                                                                                                <comments>
                            <comment id="5590478" author="JIRAUSER1257467" created="Tue, 25 Jul 2023 17:13:31 +0000"  >&lt;p&gt;During Quick Wins Triage, the team decided to close this ticket as Won&apos;t Do.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="1881302">SERVER-60156</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="1881956">SERVER-60176</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                <customfield id="customfield_12751" key="com.atlassian.jira.plugin.system.customfieldtypes:multiselect">
                        <customfieldname>Assigned Teams</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="25125"><![CDATA[Query Execution]]></customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Tue, 25 Jul 2023 17:13:31 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        28 weeks, 1 day ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>ana.meza@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            28 weeks, 1 day ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>ana.meza@mongodb.com</customfieldvalue>
            <customfieldvalue>backlog-query-execution</customfieldvalue>
            <customfieldvalue>nikita.lapkov@mongodb.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i0av2n:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hr2azb:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i0ah7z:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>