<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 05:38:46 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-56250] High CPU usage on Mongo server, but Mongo seems to be idle</title>
                <link>https://jira.mongodb.org/browse/SERVER-56250</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;We are running MongoDB version 4.2.13 as a replica set with a primary and two replicas. The servers have 4 CPUs and 16 GB of RAM (m5.xlarge on AWS) and are dedicated only to Mongo.&#160;&lt;br/&gt;
We are running Mongo with the default config, except that transactionLifetimeLimitSeconds is set to 900.&lt;/p&gt;


&lt;p&gt;During the load tests, we are regularly encountering situations where the primary gets stuck. We are processing messages from mq with 10 threads and with those threads we are inserting results into Mongo.&#160;&lt;br/&gt;
Load average becomes very high, around 9, and by watching mongotop and mongostat it seems Mongo isn&#8217;t performing any db operation at that time. iostat shows high values for user and idle params.&lt;/p&gt;


&lt;p&gt;We couldn&#8217;t find any hint even with the profiler turned on. We have the necessary indexes, meaning we are not seeing&#160;COLLSCAN in currentOp, which also didn&apos;t reveal anything&#160;obviously abnormal.&lt;/p&gt;

&lt;p&gt;In the attachments you can find:&lt;/p&gt;

&lt;p&gt;instance metrics, mongo metrics, and mongostat and mongotop output for the problematic period (where the instance is under load but nothing shows under mongo),&lt;br/&gt;
serverStatus during the peak (high average load and high mongo load), serverStatus after the peak (high average load and low mongo load), and stats for two of our main collections, jobHolder and jobRecord.&lt;/p&gt;

&lt;p&gt;If necessary we can provide more info or perform additional tests and post results.&lt;/p&gt;

&lt;p&gt;We are grateful for any advice on how to overcome this issue.&#160;&lt;/p&gt;</description>
                <environment></environment>
        <key id="1686015">SERVER-56250</key>
            <summary>High CPU usage on Mongo server, but Mongo seems to be idle</summary>
                <type id="1" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14703&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="13204">Community Answered</resolution>
                                        <assignee username="dmitry.agranat@mongodb.com">Dmitry Agranat</assignee>
                                    <reporter username="sasa.s.trifunovic@gmail.com">Sasa Trifunovic</reporter>
                        <labels>
                            <label>Performance</label>
                    </labels>
                <created>Wed, 21 Apr 2021 19:50:46 +0000</created>
                <updated>Fri, 27 Oct 2023 15:56:28 +0000</updated>
                            <resolved>Sun, 16 May 2021 10:02:31 +0000</resolved>
                                    <version>4.2.13</version>
                                                                        <votes>0</votes>
                                    <watches>6</watches>
                                                                                                                <comments>
                            <comment id="3756073" author="dmitry.agranat" created="Wed, 5 May 2021 11:30:54 +0000"  >&lt;p&gt;Hi &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=sasa.s.trifunovic%40gmail.com&quot; class=&quot;user-hover&quot; rel=&quot;sasa.s.trifunovic@gmail.com&quot;&gt;sasa.s.trifunovic@gmail.com&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Is it due to our test constantly hitting that eviction_dirty_trigger as soon as Mongo eviction threads clean up some of the dirty pages? Like some sort of seesaw stalemate?&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;This is precisely the case with this workload.&lt;/p&gt;</comment>
                            <comment id="3755967" author="JIRAUSER1259481" created="Wed, 5 May 2021 10:10:15 +0000"  >&lt;p&gt;Hi Dmitry,&#160;&lt;/p&gt;

&lt;p&gt;Although I understand that hitting the configured eviction_dirty_trigger (default 20%) with dirty pages will result in Mongo trying to evict them, and with that in a much higher average load, I don&apos;t understand why this average load never goes down unless we abort the load test.&#160;&lt;br/&gt;
Is it due to our test constantly hitting that eviction_dirty_trigger as soon as Mongo eviction threads clean up some of the dirty pages? Like some sort of seesaw stalemate?&lt;/p&gt;</comment>
                            <comment id="3751021" author="dmitry.agranat" created="Sun, 2 May 2021 12:42:50 +0000"  >&lt;p&gt;Hi &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=sasa.s.trifunovic%40gmail.com&quot; class=&quot;user-hover&quot; rel=&quot;sasa.s.trifunovic@gmail.com&quot;&gt;sasa.s.trifunovic@gmail.com&lt;/a&gt;, thank you for uploading the requested information. &lt;/p&gt;

&lt;p&gt;What is causing the rise in CPU utilization is eviction trying to clean up dirty pages. This is expected.&lt;/p&gt;

&lt;p&gt;Looking at the workload, you are basically overwhelming the cache by thrashing bytes in and out of it. For example, given that the dirty cache portion is configured at ~3GB (on your 15 GB RAM server) when we decide to start evicting data, and the fact that we need to evict &lt;b&gt;all of this data&lt;/b&gt; during this workload every minute, this is where the eviction process kicks in, resulting in the elevated CPU.&lt;/p&gt;

&lt;p&gt;I am not sure how much data you need to insert/update per second during this test or what the average page size is, but in general, the way to avoid this situation is to throttle the workload, scale horizontally (sharding), or add more RAM (I am not sure how much, but given that you need to flush ~3 GB during the high CPU event, I would start testing with at least 30 GB and move forward based on the results).&lt;/p&gt;

&lt;p&gt;Dima&lt;/p&gt;</comment>
                            <comment id="3746755" author="JIRAUSER1259481" created="Thu, 29 Apr 2021 15:22:02 +0000"  >&lt;p&gt;Looking at mongostat during our load tests, it becomes apparent that mongod starts stalling when it hits 20% dirty pages. According to the MongoDB documentation this is expected, but what seems very strange is that the percentage of dirty pages shown in mongostat never goes down afterwards, as if eviction never occurs.&lt;/p&gt;</comment>
                            <comment id="3745025" author="JIRAUSER1259481" created="Wed, 28 Apr 2021 19:17:31 +0000"  >&lt;p&gt;Hi Dmitry,&lt;/p&gt;

&lt;p&gt;&#160;&lt;/p&gt;

&lt;p&gt;Earlier today we also tried unordered bulk writes (we were using the default, ordered), which didn&apos;t help. Just now, in an effort to eliminate replication as the cause of this issue, we tried lowering the write concern from majority to primary only, and that yielded exactly the same results: Mongo got stuck.&lt;/p&gt;</comment>
                            <comment id="3743232" author="JIRAUSER1259481" created="Wed, 28 Apr 2021 10:11:49 +0000"  >&lt;p&gt;Hi Dmitry,&lt;/p&gt;



&lt;p&gt;Sorry for the incomplete logs. I just uploaded two files, diagnostics-April.tar.gz and mongod-logs-end-of-April.tar.gz. The first one contains diagnostics for the whole of April, while the second one contains updated mongod logs for the period between the 20th and 27th of April. Both files contain logs from all members of the replica set.&lt;/p&gt;

&lt;p&gt;Btw, we performed two additional tests, with profiling levels 1 and 2, in the hope that they may shed more light on this issue.&lt;br/&gt;
The first additional test, with profiling level set to 1, ran between&#160;2021-04-22T13:16 and&#160;2021-04-22T13:35 (UTC), and the second, with profiling level set to 2, ran between&#160;2021-04-22T14:21:36 and&#160;2021-04-22T14:34. Both tests ended with Mongo getting stuck.&lt;/p&gt;

&lt;p&gt;On the next day (2021-04-23) we bumped our primary mongo instance to&#160;m5.2xlarge with gp2 disks. Initially, our load test went ok, but on the second run Mongo got stuck again. The failed test started around&#160;15:05 UTC and was aborted around&#160;15:28.&lt;/p&gt;

&lt;p&gt;Thanks a lot for looking into this.&lt;br/&gt;
&#160;&lt;/p&gt;</comment>
                            <comment id="3737822" author="dmitry.agranat" created="Mon, 26 Apr 2021 07:28:09 +0000"  >&lt;p&gt;Hi &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=sasa.s.trifunovic%40gmail.com&quot; class=&quot;user-hover&quot; rel=&quot;sasa.s.trifunovic@gmail.com&quot;&gt;sasa.s.trifunovic@gmail.com&lt;/a&gt;, the uploaded &lt;tt&gt;diagnostic.data&lt;/tt&gt; only covers April 11-13 while the reported test was performed on April 21st. If you still have the data, could you upload it after checking that it covers the time of the reported test? In addition, it would be helpful to have the &lt;tt&gt;diagnostic.data&lt;/tt&gt; from all 3 nodes in this replica set.&lt;/p&gt;</comment>
                            <comment id="3732633" author="JIRAUSER1259481" created="Thu, 22 Apr 2021 11:41:08 +0000"  >&lt;p&gt;Hi,&lt;/p&gt;

&lt;p&gt;I just uploaded the logs (from the primary and its replicas, spanning a couple of days) and the complete diagnostics folder from the primary.&#160;&lt;br/&gt;
The test started around Wed Apr 21 15:13:00 UTC 2021. Its end is hard to determine because this time we let it run without aborting it, but I would say it lasted until Wed Apr 21 16:20:00 UTC 2021.&lt;br/&gt;
For this load test, we have disabled profiling to make sure no overhead was introduced.&#160;&lt;br/&gt;
If you want we can repeat the test with profiling turned on to whichever level you find suitable.&lt;br/&gt;
&#160;&lt;/p&gt;</comment>
                            <comment id="3731455" author="dmitry.agranat" created="Wed, 21 Apr 2021 20:17:37 +0000"  >&lt;p&gt;Hi &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=sasa.s.trifunovic%40gmail.com&quot; class=&quot;user-hover&quot; rel=&quot;sasa.s.trifunovic@gmail.com&quot;&gt;sasa.s.trifunovic@gmail.com&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Would you please archive (tar or zip) the mongod.log files from all members covering the incident and the &lt;tt&gt;$dbpath/diagnostic.data&lt;/tt&gt; directory (the contents are described &lt;a href=&quot;https://docs.mongodb.com/manual/administration/analyzing-mongodb-performance/#full-time-diagnostic-data-capture&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;here&lt;/a&gt;) and upload them to this &lt;a href=&quot;https://10gen-httpsupload.s3.amazonaws.com/upload_forms/3f73389f-eff1-416b-80c6-0fdd59714771.html&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;support uploader&lt;/a&gt; location?&lt;/p&gt;

&lt;p&gt;Files uploaded to this portal are visible only to MongoDB employees and are routinely deleted after some time.&lt;/p&gt;

&lt;p&gt;Please also post the exact start and end timestamps and timezone for the load test. In addition, please note the time when the Database Profiler was turned on.&lt;/p&gt;

&lt;p&gt;Dima&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="310942" name="15-16-21-04-2021_serverStatus_high_average_load_high_mongo_load.pdf" size="102789" author="sasa.s.trifunovic@gmail.com" created="Wed, 21 Apr 2021 19:17:42 +0000"/>
                            <attachment id="310941" name="15-20-21-04-2021_serverStatus_high_average_load_low_mongo_load.pdf" size="103877" author="sasa.s.trifunovic@gmail.com" created="Wed, 21 Apr 2021 19:17:42 +0000"/>
                            <attachment id="310945" name="instance_metrics_1.png" size="657825" author="sasa.s.trifunovic@gmail.com" created="Wed, 21 Apr 2021 18:37:55 +0000"/>
                            <attachment id="310946" name="instance_metrics_2.png" size="515423" author="sasa.s.trifunovic@gmail.com" created="Wed, 21 Apr 2021 18:37:51 +0000"/>
                            <attachment id="310940" name="jobHolders" size="10317" author="sasa.s.trifunovic@gmail.com" created="Wed, 21 Apr 2021 19:29:34 +0000"/>
                            <attachment id="310939" name="jobRecord" size="10433" author="sasa.s.trifunovic@gmail.com" created="Wed, 21 Apr 2021 19:29:34 +0000"/>
                            <attachment id="310944" name="mongo_exporter_1.png" size="539402" author="sasa.s.trifunovic@gmail.com" created="Wed, 21 Apr 2021 18:37:54 +0000"/>
                            <attachment id="310947" name="mongo_exporter_2.png" size="646382" author="sasa.s.trifunovic@gmail.com" created="Wed, 21 Apr 2021 18:37:50 +0000"/>
                            <attachment id="310943" name="mongostat.png" size="1625245" author="sasa.s.trifunovic@gmail.com" created="Wed, 21 Apr 2021 18:37:51 +0000"/>
                            <attachment id="310948" name="mongotop.png" size="564549" author="sasa.s.trifunovic@gmail.com" created="Wed, 21 Apr 2021 18:37:50 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Wed, 21 Apr 2021 20:17:37 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        2 years, 40 weeks ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>luke.bonanomi@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            2 years, 40 weeks ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                    <customfield id="customfield_10032" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Operating System</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10026"><![CDATA[ALL]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>dmitry.agranat@mongodb.com</customfieldvalue>
            <customfieldvalue>sasa.s.trifunovic@gmail.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hz5rfb:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hyqpr3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                    <customfieldvalue><![CDATA[dmitry.agranat@mongodb.com]]></customfieldvalue>
    

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hz5dof:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>