<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 05:54:37 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-62257] The &apos;valid&apos; flag response for the validate cmd w/ repair can be inaccurate in some cases when a duplicate record is deleted</title>
                <link>https://jira.mongodb.org/browse/SERVER-62257</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;The validate command returns incorrect &apos;valid&apos;==true/false responses when a duplicate record is deleted. Deleting a duplicate record implicitly deletes index entries and causes validate state&apos;s index entry counts to become incorrect. The &apos;valid&apos; response can be set to false when &lt;a href=&quot;https://github.com/mongodb/mongo/blob/1ef91934ee3d8ae22b3b8b51d21a7182b83c13ac/src/mongo/db/catalog/index_consistency.cpp#L612-L686&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;checks are made that the number of index entries and the number of record entries make sense for a given index type&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;There are a few additional count tracking inaccuracies in the code that were fixed in the PR in the comments. But identifying which indexes are implicitly affected by a record deletion is a tricky problem and needs significant refactoring in code with significant techdebt. The PR fixes alone would just shift the cases around when &apos;valid&apos; is incorrectly true/false and do not fully address the bug.&lt;/p&gt;

&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;&lt;del&gt;When missing index entries are identified as duplicate documents in validation repair mode, the duplicate document is deleted from collection and moved to a local lost and found. deleteDocument will call&#160;_unindexKeys in index_catalog_impl to remove the record being deleted from the indexes it is in. When a duplicate document is missing from an index, we want to ensure that the matching index key of the original document is not unindexed. The index the duplicate document is missing from should be unchanged when the duplicate is deleted from collection. This part will be done in &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-50081&quot; title=&quot;Support validation repair mode with duplicates on unique indexes&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-50081&quot;&gt;&lt;del&gt;SERVER-50081&lt;/del&gt;&lt;/a&gt;.&lt;/del&gt;&lt;/p&gt;

&lt;p&gt;&lt;del&gt;For validation repair mode, we want to know what indexes the record was removed from and update the respective index key num counts accordingly.&lt;/del&gt;&lt;/p&gt;</description>
                <environment></environment>
        <key id="1956183">SERVER-62257</key>
            <summary>The &apos;valid&apos; flag response for the validate cmd w/ repair can be inaccurate in some cases when a duplicate record is deleted</summary>
                <type id="1" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14703&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="10038" iconUrl="https://jira.mongodb.org/images/icons/subtask.gif" description="">Backlog</status>
                    <statusCategory id="2" key="new" colorName="default"/>
                                    <resolution id="-1">Unresolved</resolution>
                                        <assignee username="backlog-server-execution">Backlog - Storage Execution Team</assignee>
                                    <reporter username="shinyee.tan@mongodb.com">Shin Yee Tan</reporter>
                        <labels>
                            <label>techdebt</label>
                    </labels>
                <created>Thu, 23 Dec 2021 22:28:39 +0000</created>
                <updated>Tue, 2 May 2023 15:32:05 +0000</updated>
                                                                                                <votes>0</votes>
                                    <watches>7</watches>
                                                                                                                <comments>
                            <comment id="5385580" author="dianna.hohensee" created="Fri, 28 Apr 2023 17:49:11 +0000"  >&lt;p&gt;Flagging for rescheduling after spending a few days on it because addressing the problem would require significant refactoring in techdebt heavy code.&lt;/p&gt;</comment>
                            <comment id="5384511" author="dianna.hohensee" created="Fri, 28 Apr 2023 13:36:58 +0000"  >&lt;p&gt;I&apos;ve put up a PR with some counter fixes and comments thereof. However, they aren&apos;t sufficient to fully correct the validate command index entry counters that ultimately affect whether validate returns &apos;valid&apos;==false/true.&lt;/p&gt;

&lt;p&gt;At a high level, when validate repairs a missing index entry, inserting the index key can cause a duplicate key error if the record store has two documents that violate a unique index. Validate then deletes the record, implicitly deleting ALL the index entries associated in OTHER indexes. However, validate only tracks a change in the number of index entries for that one index; it does not update the counts for other affected indexes. Sometimes those other affected indexes also run logic for a missing index entry, and with the current code in master that implicitly fixes the index entry that got previously implicitly deleted and bumps the numKeys count up for that index (when it shouldn&apos;t: numKeys should have been decremented and then incremented, for a net change of zero). Sometimes the other affected indexes don&apos;t have a missing index entry associated with the record that was deleted, and the index numKeys remains +1 too high. This last one is the one I have not figured out how to fix easily.&#160;We either would need to track/group index entries present and missing for each RecordId that has a problem earlier in the process; or communicate back up the stack what index entries were deleted implicitly when the record was deleted &#8211; probably not this solution because we call unindex() on every index in the collection, leaving it to low level code to find out whether the index entry exists for a given recordId.&lt;/p&gt;
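
&lt;p&gt;The drift described above can be reproduced in a toy Python model. This is only a sketch and not the server code: the index names a_1 and b_1, the sample records, and the function simulate_repair are all hypothetical, and the model assumes the record deleted on a duplicate key error is the one already holding the key.&lt;/p&gt;

```python
def simulate_repair():
    # Two records that violate a unique index on "a" (hypothetical toy data).
    records = {1: {"a": 5, "b": 10}, 2: {"a": 5, "b": 20}}
    # Each index is a set of (key, record_id); record 2 is missing from a_1.
    indexes = {"a_1": {(5, 1)}, "b_1": {(10, 1), (20, 2)}}
    fields = {"a_1": "a", "b_1": "b"}
    # Per-index key counts gathered by the validation scan.
    num_keys = {name: len(entries) for name, entries in indexes.items()}

    # Repair the missing a_1 entry for record 2.
    target, rid = "a_1", 2
    key = records[rid][fields[target]]
    holders = [r for (k, r) in indexes[target] if k == key]
    if holders:
        # Duplicate key: delete the conflicting record (an assumption of this
        # model), which removes its entries from EVERY index.
        dup = holders[0]
        doc = records.pop(dup)
        for name, entries in indexes.items():
            entries.discard((doc[fields[name]], dup))
        # Only the counter of the index being repaired is adjusted.
        num_keys[target] -= 1
    # Re-add the key for the index being repaired and count it.
    indexes[target].add((key, rid))
    num_keys[target] += 1

    actual = {name: len(entries) for name, entries in indexes.items()}
    return num_keys, actual
```

&lt;p&gt;In this model the tracked and actual counts agree for a_1, but the tracked count for b_1 stays at 2 while only one entry remains, which is the &quot;+1 too high&quot; case.&lt;/p&gt;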

&lt;p&gt;&lt;a href=&quot;https://github.com/mongodb/mongo/blob/bf85402724a8648d4ba8f13f2f76f69c3bd5ce85/src/mongo/db/catalog/index_consistency.cpp#L226&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;This is the repair code to add missingIndexEntries, across all indexes&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://github.com/mongodb/mongo/blob/bf85402724a8648d4ba8f13f2f76f69c3bd5ce85/src/mongo/db/catalog/index_repair.cpp#L136&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;First that function tries to insert the index keys&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://github.com/mongodb/mongo/blob/bf85402724a8648d4ba8f13f2f76f69c3bd5ce85/src/mongo/db/catalog/index_repair.cpp#L186&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;Then if that encounters a duplicate key error, the collection record is deleted / moved to lost and found&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;https://github.com/mongodb/mongo/blob/bf85402724a8648d4ba8f13f2f76f69c3bd5ce85/src/mongo/db/catalog/index_repair.cpp#L200&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;Lastly the index keys for that specific index are re-added after deleting the record implicitly deletes all the index entries for the record&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&#160;&lt;/p&gt;</comment>
                            <comment id="5379645" author="dianna.hohensee" created="Thu, 27 Apr 2023 01:05:20 +0000"  >&lt;p&gt;Note: I think &lt;a href=&quot;https://github.com/mongodb/mongo/blob/7ada69991950a663f6a286af037b6d2bc712c126/src/mongo/db/catalog/index_consistency.cpp#L624&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;this is also broken&lt;/a&gt;. We call that function in a loop through all the indexes in sequence, to check whether the collection is valid or not after repair.&lt;/p&gt;

&lt;p&gt;By virtue of index iteration ordering probably being the same across missingIndexEntry corrections and validateIndexKeyCount calls, the numRecords (collection entries) gets adjusted globally across calls. If I fix that to be case by case &#8211; note IndexValidateResults is per index &#8211; then the error goes away that this ticket is trying to fix &#8211; valid is true. But at the same time the test still thinks that there are 2 index entries per numKeys, which is not accurate. They&apos;re just both equally inaccurate now.&lt;/p&gt;</comment>
                            <comment id="5379118" author="dianna.hohensee" created="Wed, 26 Apr 2023 20:59:16 +0000"  >&lt;p&gt;I think the best approach might be to do something unusual.&lt;/p&gt;

&lt;p&gt;I&apos;m thinking of decorating the opCtx with a small data structure that the validate command can fill with a list of index names. Then, way down through IndexCatalog::unindexRecord to each index::_unindex method, there can be a check for whether the decoration pointer is set, and if so an update recording whether an entry was deleted.&lt;/p&gt;
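
&lt;p&gt;A minimal Python sketch of that idea, under stated assumptions: OperationContext, unindex_tracker, unindex_record and demo are hypothetical stand-ins for illustration, not the server&apos;s C++ interfaces.&lt;/p&gt;

```python
class OperationContext:
    """Stand-in for opCtx; unindex_tracker plays the role of the decoration."""

    def __init__(self):
        self.unindex_tracker = None  # None means nobody asked for tracking


def unindex_record(op_ctx, indexes, fields, doc, rid):
    """Remove the record's entry from every index, noting which ones had it."""
    for name, entries in indexes.items():
        entry = (doc[fields[name]], rid)
        if entry in entries:
            entries.remove(entry)
            # Report the deletion only when the decoration is set.
            if op_ctx.unindex_tracker is not None:
                op_ctx.unindex_tracker[name] += 1


def demo():
    # Each index is a set of (key, record_id); record 2 has no a_1 entry.
    indexes = {"a_1": {(5, 1)}, "b_1": {(10, 1), (20, 2)}}
    fields = {"a_1": "a", "b_1": "b"}
    op_ctx = OperationContext()
    # Validate fills the decoration with the index names it is counting.
    op_ctx.unindex_tracker = {name: 0 for name in indexes}
    unindex_record(op_ctx, indexes, fields, {"a": 5, "b": 20}, 2)
    return op_ctx.unindex_tracker
```

&lt;p&gt;Here demo() reports one deleted entry for b_1 and none for a_1, which is the per-index information validate would need to correct its numKeys counts.&lt;/p&gt;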

&lt;p&gt;We can &lt;a href=&quot;https://github.com/mongodb/mongo/blob/518f3df1276fa9c396b1384554e69dd96e633b6c/src/mongo/db/catalog/collection_write_path.cpp#L765-L767&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;track numKeysDeleted in the IndexCatalog::unindexRecord interface&lt;/a&gt;, but that doesn&apos;t tell us which indexes had an entry deleted. Actually, it looks like &lt;a href=&quot;https://github.com/mongodb/mongo/blob/518f3df1276fa9c396b1384554e69dd96e633b6c/src/mongo/db/index/index_access_method.cpp#L1249-L1255&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;there&apos;s a numKeysDeleted all the way down into each index::unindex method&lt;/a&gt;, so maybe it could be collected higher up.&lt;/p&gt;</comment>
                            <comment id="5372347" author="dianna.hohensee" created="Mon, 24 Apr 2023 22:40:25 +0000"  >&lt;p&gt;Shin Yee did some code spelunking to help me sort out what bug is remaining in this ticket.&lt;/p&gt;

&lt;p&gt;The problem is that validate in repair mode &lt;a href=&quot;https://github.com/mongodb/mongo/blob/bf85402724a8648d4ba8f13f2f76f69c3bd5ce85/src/mongo/db/catalog/index_consistency.cpp#L219-L225&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;calls a function to repair an index entry&lt;/a&gt; and then correctly increments validate&apos;s &lt;tt&gt;numKeys&lt;/tt&gt; count for that specific index. However, the validate code does not increment the &lt;tt&gt;numKeys&lt;/tt&gt; count for any other index. Following the code that repairs an index entry, the repair function may decide to &lt;a href=&quot;https://github.com/mongodb/mongo/blob/bf85402724a8648d4ba8f13f2f76f69c3bd5ce85/src/mongo/db/catalog/index_repair.cpp#L186&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;move the record to lost and found&lt;/a&gt;, which can ultimately delete the document, &lt;a href=&quot;https://github.com/mongodb/mongo/blob/af171a2a74353693ce811a55577c1aa86ae2cdb4/src/mongo/db/catalog/index_repair.cpp#L105-L113&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;implicitly deleting other index entries&lt;/a&gt;, causing validate&apos;s &lt;tt&gt;numKeys&lt;/tt&gt; counts for any other index on the collection to become incorrect.&lt;/p&gt;</comment>
                            <comment id="4268986" author="JIRAUSER1253424" created="Tue, 28 Dec 2021 19:35:28 +0000"  >&lt;p&gt;Currently, when unindexing a key from the id index, we will blindly remove a matching entry &lt;a href=&quot;https://github.com/10gen/mongo/blob/shinyee.tan/SERVER-50081/src/mongo/db/catalog/index_catalog_impl.cpp#L1627&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;without looking at the recordID&lt;/a&gt;. As a result, when we have two identical documents with the same &lt;tt&gt;_id&lt;/tt&gt; field but only one index entry in the id index, we will remove it during the validation repair.&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>6.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18555" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname># of Sprints</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1.0</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_12751" key="com.atlassian.jira.plugin.system.customfieldtypes:multiselect">
                        <customfieldname>Assigned Teams</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="25136"><![CDATA[Storage Execution]]></customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Tue, 28 Dec 2021 19:35:28 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        40 weeks, 5 days ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_10857" key="com.pyxis.greenhopper.jira:gh-epic-link">
                        <customfieldname>Epic Link</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>PM-3341</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>fausto.leyva@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            40 weeks, 5 days ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>backlog-server-execution</customfieldvalue>
            <customfieldvalue>dianna.hohensee@mongodb.com</customfieldvalue>
            <customfieldvalue>shinyee.tan@mongodb.com</customfieldvalue>
            <customfieldvalue>yuhong.zhang@mongodb.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i0fjdb:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|i17zf6:8i</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_10557" key="com.pyxis.greenhopper.jira:gh-sprint">
                        <customfieldname>Sprint</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue id="7175">Execution Team 2023-05-01</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|i0f5in:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>