<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 05:00:10 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-42315] Don&apos;t copy data files from a running mongod after a test fails</title>
                <link>https://jira.mongodb.org/browse/SERVER-42315</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;Our test infrastructure copies data files for archival while a process is still running, because shutting down mongod may modify those files and make debugging more challenging.&lt;/p&gt;

&lt;p&gt;If a checkpoint is active in WiredTiger during the copy, the copied data files can become completely inconsistent and unusable (e.g. a data file is copied first, then the WT metadata, which can point to a new checkpoint that is absent from the already-copied data file). We should find a way to stop checkpoints, run &lt;tt&gt;fsyncLock&lt;/tt&gt;, or just SIGKILL the process before copying data files. I think SIGKILL is the simplest approach and would guarantee no files are modified before archival.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://evergreen.mongodb.com/task/mongodb_mongo_v4.2_ubuntu1604_concurrency_simultaneous_replication_dacd09a03b87a0dda83a5aee398f00eb295159aa_19_07_03_23_16_47/0&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;Here&apos;s a task&lt;/a&gt; where the data files are corrupt on node1 because the files were copied during an active checkpoint.&lt;/p&gt;</description>
                <environment></environment>
        <key id="870194">SERVER-42315</key>
            <summary>Don&apos;t copy data files from a running mongod after a test fails</summary>
                <type id="4" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14710&amp;avatarType=issuetype">Improvement</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="9">Done</resolution>
                                        <assignee username="backlog-server-stm">Backlog - Server Tooling and Methods (STM)</assignee>
                                    <reporter username="louis.williams@mongodb.com">Louis Williams</reporter>
                        <labels>
                            <label>tig-resmoke</label>
                    </labels>
                <created>Mon, 22 Jul 2019 17:48:30 +0000</created>
                <updated>Tue, 6 Dec 2022 02:53:25 +0000</updated>
                            <resolved>Thu, 9 Jan 2020 16:21:34 +0000</resolved>
                                                                    <component>Testing Infrastructure</component>
                                        <votes>0</votes>
                                    <watches>7</watches>
                                                                                                                <comments>
                            <comment id="2356269" author="max.hirschhorn@10gen.com" created="Fri, 2 Aug 2019 16:35:32 +0000"  >&lt;blockquote&gt;
&lt;p&gt;Max Hirschhorn, could we prioritize this fix?&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;What I&apos;ve understood from chatting in person with &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=vesselina.ratcheva&quot; class=&quot;user-hover&quot; rel=&quot;vesselina.ratcheva&quot;&gt;vesselina.ratcheva&lt;/a&gt; and &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=louis.williams&quot; class=&quot;user-hover&quot; rel=&quot;louis.williams&quot;&gt;louis.williams&lt;/a&gt; is that Server engineers want:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;To have the in-memory state from the mongod process immediately after the test failure. This is because data consistency bugs may result from an inconsistency between the in-memory state and the on-disk state. Today, we only take core dumps on test timeouts and not for data inconsistency issues.&lt;/li&gt;
	&lt;li&gt;To have the data files from the mongod process without it going through clean shutdown. Clean shutdown rewrites the data files in a way which may mask the original data inconsistency issue.
	&lt;ul&gt;
		&lt;li&gt;The way archival in resmoke.py handles this today is to attempt to copy the data files while the process is still running. We&apos;ve had a number of issues with this approach (esp. on Windows due to file sharing permissions) so the STM team is eager to do work in this area. The issue Louis is pointing out here is that even though the test has finished and no client will be performing writes, it is still possible for WiredTiger to take a new checkpoint while resmoke.py archival is copying the data directory. The data files gathered end up being unusable. We need to prevent new checkpoints from being taken while we&apos;re archiving data files. Killing the mongod process is Louis&apos;s suggestion for how to achieve this.&lt;/li&gt;
	&lt;/ul&gt;
	&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=kelsey.schubert&quot; class=&quot;user-hover&quot; rel=&quot;kelsey.schubert&quot;&gt;kelsey.schubert&lt;/a&gt;, the fix is very likely an epic-worthy project - it needs a scope document.&lt;/p&gt;</comment>
                            <comment id="2356125" author="thomas.schubert" created="Fri, 2 Aug 2019 15:26:38 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=max.hirschhorn&quot; class=&quot;user-hover&quot; rel=&quot;max.hirschhorn&quot;&gt;max.hirschhorn&lt;/a&gt;, could we prioritize this fix?&lt;/p&gt;</comment>
                            <comment id="2337347" author="milkie" created="Mon, 22 Jul 2019 19:45:32 +0000"  >&lt;p&gt;Or alternatively, it could copy down onto the disk some in-memory data critical to debugging a task. Writing a new checkpoint just means the history of what the last checkpoint was, however many seconds or minutes ago, will possibly be overwritten.&lt;br/&gt;
I guess I&apos;m unconvinced it would actually be detrimental to problem diagnosis.&lt;/p&gt;</comment>
                            <comment id="2337289" author="louis.williams" created="Mon, 22 Jul 2019 19:19:48 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=milkie&quot; class=&quot;user-hover&quot; rel=&quot;milkie&quot;&gt;milkie&lt;/a&gt; my understanding of why we don&apos;t do this today is that the act of writing a checkpoint could mask or overwrite data in files critical to debugging a task.&lt;/p&gt;</comment>
                            <comment id="2337220" author="milkie" created="Mon, 22 Jul 2019 18:49:40 +0000"  >&lt;p&gt;I&apos;m not sure we even need that new option &amp;#8211; why not just call fsync, wait for it to return, and then copy the files?&lt;/p&gt;</comment>
                            <comment id="2337164" author="louis.williams" created="Mon, 22 Jul 2019 18:17:32 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=max.hirschhorn&quot; class=&quot;user-hover&quot; rel=&quot;max.hirschhorn&quot;&gt;max.hirschhorn&lt;/a&gt; My proposal is only to make sure file archival results in readable and reliable data files, which are much more useful than data files that have been corrupted because they were copied during an active checkpoint. I also don&apos;t understand how the current procedure makes any guarantee that writes are journaled or in the stable checkpoint. If we want to guarantee all writes are durable before doing SIGKILL, we could potentially add an option to fSync to only call waitUntilDurable (which will flush log files) and not force a checkpoint.&lt;/p&gt;</comment>
                            <comment id="2337106" author="max.hirschhorn@10gen.com" created="Mon, 22 Jul 2019 17:59:12 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=louis.williams&quot; class=&quot;user-hover&quot; rel=&quot;louis.williams&quot;&gt;louis.williams&lt;/a&gt;, I don&apos;t see how we could run SIGKILL when the test isn&apos;t guaranteed to have waited for a journal flush when doing its writes (i.e. we aren&apos;t necessarily doing &lt;tt&gt;j=true&lt;/tt&gt;). Isn&apos;t it then possible for the corrupted data to not appear in the stable checkpoint at all and also make the data files useless?&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                            <outwardlinks description="related to">
                                        <issuelink>
            <issuekey id="908614">SERVER-43049</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                                        </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>7.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                <customfield id="customfield_12751" key="com.atlassian.jira.plugin.system.customfieldtypes:multiselect">
                        <customfieldname>Assigned Teams</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="25142"><![CDATA[Server Tooling & Methods]]></customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Mon, 22 Jul 2019 17:59:12 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        4 years, 27 weeks, 5 days ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                            <customfield id="customfield_10857" key="com.pyxis.greenhopper.jira:gh-epic-link">
                        <customfieldname>Epic Link</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>PM-1547</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>alexander.golin@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            4 years, 27 weeks, 5 days ago
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_16465" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Linked BF Score</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>0.0</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>backlog-server-stm</customfieldvalue>
            <customfieldvalue>milkie@mongodb.com</customfieldvalue>
            <customfieldvalue>kelsey.schubert@mongodb.com</customfieldvalue>
            <customfieldvalue>louis.williams@mongodb.com</customfieldvalue>
            <customfieldvalue>max.hirschhorn@mongodb.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hvgl4v:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hr5ytr:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hvg7e7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>