<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 04:35:30 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-34082] MongoDB stalls randomly</title>
                <link>https://jira.mongodb.org/browse/SERVER-34082</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;We are experiencing random MongoDB freezes on our 2 replica sets. For short intervals (about 30-60 seconds), we are unable to get a response from the DB. No CPU, disk, network, or memory spikes are observed in our Nagios monitoring.&lt;br/&gt;
MongoDB logs show that the DB accepts connections and that the client disconnects after 30 seconds.&lt;br/&gt;
On the application side, a number of MongoCursorTimeoutExceptions were thrown by the PHP driver on March 21, 2018, at 3:01:42 a.m. UTC.&lt;br/&gt;
I&apos;ve attached diagnostic.data logs for the problematic period from our 3 DB instances.&lt;br/&gt;
DB1 is the master; DB2 and DB3 are slaves. The problem started on DB2 on March 21, 2018, at 3:01:12, and 30 seconds later we observed exceptions on our PHP application servers.&lt;br/&gt;
Digging into the diagnostic logs, I found that they contain no data for the problematic period.&lt;/p&gt;

&lt;p&gt;Is this problem resolved in a later MongoDB release? We are planning an upgrade and want to make sure that the fix is already included.&lt;/p&gt;</description>
                <environment></environment>
        <key id="515584">SERVER-34082</key>
            <summary>MongoDB stalls randomly</summary>
                <type id="1" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14703&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="4">Incomplete</resolution>
                                        <assignee username="kelsey.schubert@mongodb.com">Kelsey Schubert</assignee>
                                    <reporter username="neanton">Anton Neznaenko</reporter>
                        <labels>
                            <label>SEW</label>
                    </labels>
                <created>Fri, 23 Mar 2018 10:52:41 +0000</created>
                <updated>Mon, 23 Jul 2018 09:29:12 +0000</updated>
                            <resolved>Thu, 21 Jun 2018 00:49:20 +0000</resolved>
                                    <version>3.4.6</version>
                                                    <component>WiredTiger</component>
                                        <votes>0</votes>
                                    <watches>12</watches>
                                                                                                                <comments>
                            <comment id="1926965" author="thomas.schubert" created="Thu, 21 Jun 2018 00:49:20 +0000"  >&lt;p&gt;Hi &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=neanton&quot; class=&quot;user-hover&quot; rel=&quot;neanton&quot;&gt;neanton&lt;/a&gt;,&lt;/p&gt;

&lt;p&gt;We haven&#8217;t heard back from you for some time, so I&#8217;m going to mark this ticket as resolved. Unfortunately, we do not have enough information to continue investigating this issue. If this is still a problem for you after upgrading to MongoDB 3.6, please provide additional information and we will reopen the ticket.&lt;/p&gt;

&lt;p&gt;Kind regards,&lt;br/&gt;
Kelsey&lt;/p&gt;</comment>
                            <comment id="1881141" author="thomas.schubert" created="Wed, 2 May 2018 22:09:41 +0000"  >&lt;p&gt;Hi &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=neanton&quot; class=&quot;user-hover&quot; rel=&quot;neanton&quot;&gt;neanton&lt;/a&gt;,&lt;/p&gt;

&lt;p&gt;Unfortunately, since these fixes are speculative in nature, they do not justify backport into older stable branches.&lt;/p&gt;

&lt;p&gt;Kind regards,&lt;br/&gt;
Kelsey&lt;/p&gt;</comment>
                            <comment id="1865417" author="neanton" created="Mon, 16 Apr 2018 17:39:06 +0000"  >&lt;p&gt;Hi, Kelsey,&lt;/p&gt;

&lt;p&gt;Is there any chance these speculative fixes could be backported to the 3.4 branch? We would then be able to upgrade our cluster and check whether the problem is resolved.&lt;br/&gt;
BTW, we&apos;ve seen the same strange behavior at around 1:41 UTC on the 14th of April. I&apos;m attaching diagnostic data logs in case they give you some additional info. &lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/attachment/184207/184207_ncs-mongo-hung-14.04.2018.tar.gz&quot; title=&quot;ncs-mongo-hung-14.04.2018.tar.gz attached to SERVER-34082&quot;&gt;ncs-mongo-hung-14.04.2018.tar.gz&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.mongodb.org/images/icons/link_attachment_7.gif&quot; height=&quot;7&quot; width=&quot;7&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt; &lt;/p&gt;</comment>
                            <comment id="1860649" author="thomas.schubert" created="Tue, 10 Apr 2018 19:01:55 +0000"  >&lt;p&gt;Hi &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=neanton&quot; class=&quot;user-hover&quot; rel=&quot;neanton&quot;&gt;neanton&lt;/a&gt;,&lt;/p&gt;

&lt;p&gt;After investigating this issue, we&apos;ve made a few speculative fixes which will be included in MongoDB 3.6.4. Unfortunately, we haven&apos;t been able to reproduce the issue that you&apos;re observing, and so we aren&apos;t able to confirm whether these fixes resolve the issue. Would you be able to upgrade to MongoDB 3.6.4 when it is available, and let us know if it resolves the problem?&lt;/p&gt;

&lt;p&gt;Thank you,&lt;br/&gt;
Kelsey&lt;/p&gt;</comment>
                            <comment id="1858621" author="alexander.gorrod" created="Sun, 8 Apr 2018 22:32:49 +0000"  >&lt;p&gt;Thanks for the investigation &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=keith.bostic&quot; class=&quot;user-hover&quot; rel=&quot;keith.bostic&quot;&gt;keith.bostic&lt;/a&gt;, I can answer your question about eviction worker threads:&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;One note is that during all of this, the number of eviction worker threads goes to 0 and has to ramp up again, which doesn&apos;t seem right. I don&apos;t see anything in the code that makes me think that should happen, so I&apos;m possibly misreading that.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;The statistic there is misleading - dynamic eviction worker threads are disabled in the version of MongoDB this user is running, so there will be a constant number of workers (4).&lt;/p&gt;</comment>
                            <comment id="1858240" author="keith.bostic" created="Fri, 6 Apr 2018 21:49:39 +0000"  >&lt;p&gt;I spent some more time looking at this today. As &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=bruce.lucas&quot; class=&quot;user-hover&quot; rel=&quot;bruce.lucas&quot;&gt;bruce.lucas&lt;/a&gt; noted, there&apos;s a gap right when the problem happens, so all we can really look at is the stuff around the edges.&lt;/p&gt;

&lt;p&gt; &lt;span class=&quot;image-wrap&quot; style=&quot;&quot;&gt;&lt;a id=&quot;183621_thumb&quot; href=&quot;https://jira.mongodb.org/secure/attachment/183621/183621_keith.png&quot; title=&quot;keith.png&quot; file-preview-type=&quot;image&quot; file-preview-id=&quot;183621&quot; file-preview-title=&quot;keith.png&quot;&gt;&lt;img src=&quot;https://jira.mongodb.org/secure/thumbnail/183621/_thumb_183621.png&quot; style=&quot;border: 0px solid black&quot; role=&quot;presentation&quot;/&gt;&lt;/a&gt;&lt;/span&gt; &lt;/p&gt;

&lt;p&gt;I think it&apos;s interesting that DB2 has a big spike in memory allocation/free calls and the disk goes to 100%, but everything else pretty much shuts down. Eviction/reconciliation and logging all go to zero, and eviction/reconciliation doesn&apos;t recover for a long time, even after ftdc data is coming in again. There&apos;s a checkpoint that starts inside the gap, but whatever happened started before the checkpoint started: the checkpoint might not be helping, but it didn&apos;t cause the problem. The oplog is quiescent.&lt;/p&gt;

&lt;p&gt;One note is that during all of this, the number of eviction worker threads goes to 0 and has to ramp up again, which doesn&apos;t seem right. I don&apos;t see anything in the code that makes me think that should happen, so I&apos;m possibly misreading that.&lt;/p&gt;

&lt;p&gt;Looking at the DB1 data (which didn&apos;t suffer the ftdc outage), it&apos;s pretty clear the system went from 0 to 100 in a few seconds, that is, everything was quiescent at the start, and then everything was slammed without any ramp-up, so I think it&apos;s a fair guess the system was in a state where it couldn&apos;t quickly adapt to a surge in traffic.&lt;/p&gt;

&lt;p&gt;I suspect the explanation for the surge in cursor restarts is that threads are repeatedly hitting pages that have split (why those pages are splitting right at this time, I have no idea). The design assumption was a cursor restart does enough work (searching the tree, for example), that there wasn&apos;t any reason to yield the CPU. This is also on a 16-core system, which should give us some buffering from too many threads chasing too few cores. That said, I suspect the large number of cursor restarts is because some relatively few cursors were able to build up impressive restart counts because they were hitting the same page repeatedly.&lt;/p&gt;</comment>
                            <comment id="1845057" author="keith.bostic" created="Mon, 26 Mar 2018 15:06:30 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=alexander.gorrod%40mongodb.com&quot; class=&quot;user-hover&quot; rel=&quot;alexander.gorrod@mongodb.com&quot;&gt;alexander.gorrod@mongodb.com&lt;/a&gt;, &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=michael.cahill&quot; class=&quot;user-hover&quot; rel=&quot;michael.cahill&quot;&gt;michael.cahill&lt;/a&gt;, when &lt;tt&gt;cursor_restart&lt;/tt&gt; is incremented, the restart is going to cause a search of the tree from the root, and I don&apos;t think that&apos;s likely to complete without yielding, which makes me think the cursor restart spike is a side-effect of something else.&lt;/p&gt;</comment>
                            <comment id="1843165" author="neanton" created="Fri, 23 Mar 2018 13:58:18 +0000"  >&lt;p&gt;Yep, Bruce, our app does a series of read and write operations during a single HTTP request, and some of those operations use the secondaryPreferred read preference to distribute reads to secondaries. So I would say that if something hung during a read request to db2, no further operations would run and the current HTTP request would finish with an error. We are using the default driver timeout of 30 seconds, so ALL our HTTP requests hang during this period of time.&lt;br/&gt;
Please let us know if you need any other detailed information about our setup or some additional debugging info.&lt;/p&gt;</comment>
                            <comment id="1843145" author="bruce.lucas@10gen.com" created="Fri, 23 Mar 2018 13:46:54 +0000"  >&lt;p&gt;Hi Anton,&lt;/p&gt;

&lt;p&gt;Thanks for the detailed problem report. From the metrics you uploaded, it appears to me that the problem occurred on the db2 secondary node, stalling operations there. The other nodes appeared to be unaffected, except that they weren&apos;t processing any requests during the incident. I imagine this could occur if your application depends on secondary reads from db2, so that if it stalls, all application threads stall. Is this plausible, given the way your application works?&lt;/p&gt;

&lt;p&gt;Meanwhile we have not yet identified a known issue that matches the symptoms, but we are still looking into it.&lt;/p&gt;

&lt;p&gt;Thanks,&lt;br/&gt;
Bruce&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                            <outwardlinks description="related to">
                                        <issuelink>
            <issuekey id="516663">WT-3997</issuekey>
        </issuelink>
                            </outwardlinks>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="523433">WT-4027</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="183621" name="keith.png" size="433414" author="keith.bostic@mongodb.com" created="Fri, 6 Apr 2018 21:23:12 +0000"/>
                            <attachment id="182495" name="mongo_diag.zip" size="31406520" author="neanton" created="Fri, 23 Mar 2018 10:30:20 +0000"/>
                            <attachment id="184207" name="ncs-mongo-hung-14.04.2018.tar.gz" size="41836876" author="neanton" created="Mon, 16 Apr 2018 17:39:09 +0000"/>
                            <attachment id="182528" name="stall.png" size="299680" author="bruce.lucas@mongodb.com" created="Fri, 23 Mar 2018 13:58:33 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Fri, 23 Mar 2018 13:46:54 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        5 years, 34 weeks ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>backlog-server-pm</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            5 years, 34 weeks ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                    <customfield id="customfield_10032" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Operating System</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10026"><![CDATA[ALL]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>alexander.gorrod@mongodb.com</customfieldvalue>
            <customfieldvalue>neanton</customfieldvalue>
            <customfieldvalue>bruce.lucas@mongodb.com</customfieldvalue>
            <customfieldvalue>keith.bostic@mongodb.com</customfieldvalue>
            <customfieldvalue>kelsey.schubert@mongodb.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|httdqv:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|htkoqv:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                    <customfieldvalue><![CDATA[bruce.lucas@mongodb.com]]></customfieldvalue>
        <customfieldvalue><![CDATA[kelsey.schubert@mongodb.com]]></customfieldvalue>
    

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|htszyn:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>