<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 04:39:32 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
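A filtered request URL can be built programmatically; a minimal sketch (the issue-XML export path shown is the conventional JIRA pattern and may differ on other instances):

```python
from urllib.parse import urlencode

# Conventional JIRA issue-XML export URL (adjust for your instance).
base = "https://jira.mongodb.org/si/jira.issueviews:issue-xml/SERVER-35339/SERVER-35339.xml"

# Repeating the 'field' key once per requested field restricts the output
# to just those fields.
query = urlencode([("field", "key"), ("field", "summary")])
url = base + "?" + query
print(url)
```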
-->
<rss version="0.92">
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-35339] Complete recovery failure after unclean shutdown</title>
                <link>https://jira.mongodb.org/browse/SERVER-35339</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;h1&gt;&lt;a name=&quot;Environment%3A&quot;&gt;&lt;/a&gt;&lt;b&gt;Environment:&lt;/b&gt;&lt;/h1&gt;
&lt;ul&gt;
	&lt;li&gt;3 server replica sets on AWS (t2.medium) running 3.6.5.&#160;&lt;/li&gt;
	&lt;li&gt;All writes employ MAJORITY write concern.&lt;/li&gt;
	&lt;li&gt;Default journaling is enabled.&lt;/li&gt;
&lt;/ul&gt;


&lt;h1&gt;&lt;a name=&quot;Expectedbehaviour%3A&quot;&gt;&lt;/a&gt;&lt;b&gt;Expected behaviour:&lt;/b&gt;&lt;/h1&gt;

&lt;p&gt;That the supported recovery methods return the instance to health.&lt;/p&gt;
&lt;h1&gt;&lt;a name=&quot;ObservedBehaviour%3A&quot;&gt;&lt;/a&gt;&lt;b&gt;Observed Behaviour:&lt;/b&gt;&lt;/h1&gt;

&lt;p&gt;After an unclean shutdown a secondary never recovers on its own, never making it past the final step in the following sample log; see log1.txt.&lt;/p&gt;

&lt;p&gt;Clearly&#160;the instance had run out of disk space at this point (100GB provisioned for a database that is normally 1.6GB). Here are the contents of the /var/mongodata folder (see log2.txt).&lt;br/&gt;
&#160;&lt;br/&gt;
So it appears the culprit is entirely the &lt;b&gt;WiredTigerLAS.wt&lt;/b&gt; file.&lt;/p&gt;

&lt;p&gt;For additional information, here is the df output at this point:&lt;/p&gt;
&lt;p/&gt;
&lt;div id=&quot;syntaxplugin&quot; class=&quot;syntaxplugin&quot; style=&quot;border: 1px dashed #bbb; border-radius: 5px !important; overflow: auto; max-height: 30em;&quot;&gt;
&lt;table cellspacing=&quot;0&quot; cellpadding=&quot;0&quot; border=&quot;0&quot; width=&quot;100%&quot; style=&quot;font-size: 1em; line-height: 1.4em !important; font-weight: normal; font-style: normal; color: black;&quot;&gt;
		&lt;tbody &gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;  margin-top: 10px;   width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;Filesystem      Size  Used Avail Use% Mounted on&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;   width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;devtmpfs        7.9G   60K  7.9G   1% /dev&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;   width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;tmpfs           7.9G     0  7.9G   0% /dev/shm&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;   width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;/dev/xvda1       20G  2.8G   17G  14% /&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;   margin-bottom: 10px;  width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;/dev/xvdi       100G  100G  140K 100% /var/mongodata&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
			&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p/&gt;

&lt;p&gt;The only option to recover this instance is to do a full resync (after deleting the contents of /var/mongodata); see log3.txt.&lt;br/&gt;
&#160;&lt;br/&gt;
The initial sync currently takes less than 60 seconds but this will obviously not be suitable once the size of the data set grows.&lt;/p&gt;</description>
                <environment></environment>
        <key id="552745">SERVER-35339</key>
            <summary>Complete recovery failure after unclean shutdown</summary>
                <type id="1" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14703&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="bruce.lucas@mongodb.com">Bruce Lucas</assignee>
                                    <reporter username="MarcF">Marc Fletcher</reporter>
                        <labels>
                            <label>SWKB</label>
                    </labels>
                <created>Fri, 1 Jun 2018 14:29:27 +0000</created>
                <updated>Thu, 27 Dec 2018 05:08:19 +0000</updated>
                            <resolved>Sat, 17 Nov 2018 04:08:30 +0000</resolved>
                                    <version>3.6.5</version>
                                                    <component>WiredTiger</component>
                                        <votes>0</votes>
                                    <watches>9</watches>
                                                                                                                <comments>
                            <comment id="1924557" author="marcf" created="Tue, 19 Jun 2018 09:16:29 +0000"  >&lt;p&gt;I&apos;ve spent quite a bit of time looking at the application triggers for these issues. Currently we have a very large monolithic mongo sharded cluster for everything (v3.0.12) and are in the process of transferring independent loads to independent mongo replica sets. One of the additional motivations for doing this transfer is to be able to better understand the load profiles of each service and optimise accordingly (if this can also help here then it&apos;s a win/win).&#160;&lt;/p&gt;

&lt;p&gt;As I mentioned in an earlier comment we saw corresponding increases in DB load in the mono cluster at the same moment this new replica set was hitting 100% CPU usage and then crashing. We indeed tracked this load increase back to a process that was occasionally creating large documents of time series data (single array containing 100K+ entries) and then in quick succession $push(ing) additional elements onto that array. By reducing the allowed time range for each bucket in the time series, and thereby considerably reducing the maximum size of this array, these load issues appear to be completely resolved. We&apos;ve had everything running for a few days now in this configuration and the CPU/memory loads are now very flat. We have also manually triggered a few ungraceful shutdowns by hard killing instances in the replica set and successful recovery has been achieved each time.&lt;/p&gt;</comment>
                            <comment id="1921144" author="bruce.lucas@10gen.com" created="Thu, 14 Jun 2018 20:22:59 +0000"  >&lt;p&gt;Hi Marc,&lt;/p&gt;

&lt;p&gt;We&apos;ve taken another look at the data and it appears that you also hit &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-34938&quot; title=&quot;Secondary slowdown or hang due to content pinned in cache by single oplog batch&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-34938&quot;&gt;&lt;del&gt;SERVER-34938&lt;/del&gt;&lt;/a&gt;. Specifically:&lt;/p&gt;

&lt;ul&gt;
	&lt;li&gt;After the server restart at 06-05 16:17, the node transitioned to secondary and attempted to catch up from the considerable lag that had built up, and during this process you hit &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-34938&quot; title=&quot;Secondary slowdown or hang due to content pinned in cache by single oplog batch&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-34938&quot;&gt;&lt;del&gt;SERVER-34938&lt;/del&gt;&lt;/a&gt;. This eventually resulted in a crash due to running out of memory.&lt;/li&gt;
&lt;/ul&gt;


&lt;ul&gt;
	&lt;li&gt;After the subsequent server restart at 06-05 16:47 the server began recovery oplog application to re-establish a locally consistent state of the db (which is distinct from using the oplog to catch up a lagged secondary), and you encountered &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-34941&quot; title=&quot;Add testing to cover cases where timestamps cause cache pressure&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-34941&quot;&gt;&lt;del&gt;SERVER-34941&lt;/del&gt;&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;I&apos;ve adjusted the linked cases to reflect that you hit both issues.&lt;/p&gt;

&lt;p&gt;Regarding the first issue, we are interested in learning more about the application triggers for this. One trigger that we know about is an application that applies a lot of small updates to large documents; does that match your application?&lt;/p&gt;

&lt;p&gt;Thanks,&lt;br/&gt;
Bruce&lt;/p&gt;</comment>
                            <comment id="1921054" author="dmitry.agranat" created="Thu, 14 Jun 2018 19:30:15 +0000"  >&lt;p&gt;Hi &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=MarcF&quot; class=&quot;user-hover&quot; rel=&quot;MarcF&quot;&gt;MarcF&lt;/a&gt;,&lt;/p&gt;

&lt;p&gt;I am going to resolve this ticket as a duplicate of &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-34941&quot; title=&quot;Add testing to cover cases where timestamps cause cache pressure&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-34941&quot;&gt;&lt;del&gt;SERVER-34941&lt;/del&gt;&lt;/a&gt;; you can watch &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-34941&quot; title=&quot;Add testing to cover cases where timestamps cause cache pressure&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-34941&quot;&gt;&lt;del&gt;SERVER-34941&lt;/del&gt;&lt;/a&gt; for updates.&lt;/p&gt;

&lt;p&gt;Thanks,&lt;br/&gt;
Dima&lt;/p&gt;
</comment>
                            <comment id="1920990" author="marcf" created="Thu, 14 Jun 2018 18:37:21 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-34941&quot; title=&quot;Add testing to cover cases where timestamps cause cache pressure&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-34941&quot;&gt;&lt;del&gt;SERVER-34941&lt;/del&gt;&lt;/a&gt;&#160;sounds like it might be related. The problem appears to be fairly critical in my case though, as it&apos;s not just &apos;running out of disk space&apos;. If the entire database is 1GB (and resyncable in less than 60 seconds) and after 100GB it still has not managed to restore itself, this sounds like an infinite loop that will never be resolved regardless of disk space.&lt;/p&gt;</comment>
                            <comment id="1916360" author="dmitry.agranat" created="Sun, 10 Jun 2018 10:37:42 +0000"  >&lt;p&gt;Hi &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=MarcF&quot; class=&quot;user-hover&quot; rel=&quot;MarcF&quot;&gt;MarcF&lt;/a&gt;,&lt;/p&gt;

&lt;p&gt;Thanks for clarifying that the increased CPU load on the 1 CPU instance was a result of the increased workload concurrency and that this is expected.&lt;/p&gt;

&lt;p&gt;In regards to issue #2:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;The inability for mongo to recover after the failover.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;As you mentioned in your initial comment, the instance ran out of disk space. We can see this in the mongod log:&lt;/p&gt;
&lt;p/&gt;
&lt;div id=&quot;syntaxplugin&quot; class=&quot;syntaxplugin&quot; style=&quot;border: 1px dashed #bbb; border-radius: 5px !important; overflow: auto; max-height: 30em;&quot;&gt;
&lt;table cellspacing=&quot;0&quot; cellpadding=&quot;0&quot; border=&quot;0&quot; width=&quot;100%&quot; style=&quot;font-size: 1em; line-height: 1.4em !important; font-weight: normal; font-style: normal; color: black;&quot;&gt;
		&lt;tbody &gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;  margin-top: 10px;   margin-bottom: 10px;  width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;2018-05-30T19:39:55.036+0000 E STORAGE  [thread1] WiredTiger error (28) [1527709195:36383][4246:0x7f5fe8569700], eviction-server: cache eviction thread error: No space left on device&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
			&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p/&gt;
&lt;p&gt;Once the instance ran out of disk space, you deleted the contents of the &lt;tt&gt;dbpath&lt;/tt&gt; and performed an initial sync, which was successful. This is expected.&lt;/p&gt;

&lt;p&gt;However, the reason you ran out of disk space is most likely &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-34941&quot; title=&quot;Add testing to cover cases where timestamps cause cache pressure&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-34941&quot;&gt;&lt;del&gt;SERVER-34941&lt;/del&gt;&lt;/a&gt; where a secondary gets stuck with cache full. One of the effects of &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-34941&quot; title=&quot;Add testing to cover cases where timestamps cause cache pressure&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-34941&quot;&gt;&lt;del&gt;SERVER-34941&lt;/del&gt;&lt;/a&gt; is lots of lookaside table activity and the growth of the &lt;tt&gt;WiredTigerLAS.wt&lt;/tt&gt; file.&lt;/p&gt;

&lt;p&gt;As you&apos;ve mentioned:&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;and because of the increased primary instance size it didn&apos;t crash&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;Increasing the memory size helps in this situation but a fix will be a part of &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-34941&quot; title=&quot;Add testing to cover cases where timestamps cause cache pressure&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-34941&quot;&gt;&lt;del&gt;SERVER-34941&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Thanks,&lt;br/&gt;
Dima&lt;/p&gt;</comment>
                            <comment id="1915344" author="marcf" created="Fri, 8 Jun 2018 16:18:49 +0000"  >&lt;p&gt;The CPU credits are never depleted so I would never expect any throttling on these instances (they only get reset when we change instance type):&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;image-wrap&quot; style=&quot;&quot;&gt;&lt;img src=&quot;https://jira.mongodb.org/secure/attachment/188993/188993_image-2018-06-08-17-14-04-212.png&quot; style=&quot;border: 0px solid black&quot; /&gt;&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;Having done some further investigation I have found some evidence to suggest the burst in load is due to an increase in the number of DB writes being made (apologies for the misleading comment that &quot;load on this cluster should typically remain constant&quot;). Referring to the event observed June 7th 07:30 UTC, there is a commensurate increase in the number of &apos;active writes&apos;:&lt;/p&gt;
&lt;p/&gt;
&lt;div id=&quot;syntaxplugin&quot; class=&quot;syntaxplugin&quot; style=&quot;border: 1px dashed #bbb; border-radius: 5px !important; overflow: auto; max-height: 30em;&quot;&gt;
&lt;table cellspacing=&quot;0&quot; cellpadding=&quot;0&quot; border=&quot;0&quot; width=&quot;100%&quot; style=&quot;font-size: 1em; line-height: 1.4em !important; font-weight: normal; font-style: normal; color: black;&quot;&gt;
		&lt;tbody &gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;  margin-top: 10px;   width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;insert query update delete getmore command dirty  used flushes vsize   res qrw  arw net_in net_out conn set repl                time&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;   width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;    *0    *0     *0     *0       0     8|0  0.0% 38.3%       0 3.36G 1.53G 0|0  1|0   783b   71.1k  116 rs0  PRI Jun  7 07:30:56.487&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;   width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;    *0    *0     *0     *0       0     8|0  0.0% 38.3%       0 3.36G 1.53G 0|0  1|0   726b   70.3k  116 rs0  PRI Jun  7 07:30:57.487&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;   width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;    *0    *0     *0     *0       0     4|0  0.0% 38.3%       0 3.36G 1.53G 0|0  1|0   547b   67.1k  116 rs0  PRI Jun  7 07:30:58.488&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;   width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;    *0    *0     *0     *0       0    14|0  0.0% 38.3%       0 3.36G 1.53G 0|0  1|0  1.34k   72.2k  116 rs0  PRI Jun  7 07:30:59.487&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;   width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;    *0    *0     *0     *0       1    11|0  0.0% 38.3%       0 3.36G 1.53G 0|0  1|0  2.68k   72.4k  116 rs0  PRI Jun  7 07:31:00.487&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;   width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;    *0    *0     *0     *0       0     3|0  0.0% 38.3%       0 3.36G 1.53G 0|0  1|0   493b   66.9k  116 rs0  PRI Jun  7 07:31:01.488&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;   width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;    *0    *0     *0     *0       0     6|0  0.0% 38.3%       0 3.36G 1.53G 0|0  1|0   610b   68.7k  116 rs0  PRI Jun  7 07:31:02.487&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;   width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;    *0    *0     *0     *0       0     8|0  0.0% 38.3%       0 3.36G 1.53G 0|0  1|0   726b   70.2k  116 rs0  PRI Jun  7 07:31:03.487&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;   width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;    *0    *0     *0     *0       0     5|0  0.0% 38.3%       0 3.36G 1.53G 0|0  1|0   609b   68.6k  116 rs0  PRI Jun  7 07:31:04.488&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;   width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;    *0    *0     *0     *0       1     7|0  0.0% 38.3%       0 3.36G 1.53G 0|0  1|0  1.75k   69.6k  116 rs0  PRI Jun  7 07:31:05.487&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;   width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;insert query update delete getmore command dirty  used flushes vsize   res qrw  arw net_in net_out conn set repl                time&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;   width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;    *0    *0     *0     *0       0     8|0  0.0% 38.3%       0 3.36G 1.53G 0|0  1|0   726b   70.3k  116 rs0  PRI Jun  7 07:31:06.487&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;   width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;    *0    *0     *0     *0       1    15|0  0.0% 38.3%       0 3.36G 1.53G 0|0  1|0  4.97k   72.7k  116 rs0  PRI Jun  7 07:31:07.496&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;   width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;    *0    *0     *0     *0       0    10|0  0.0% 38.3%       0 3.36G 1.53G 0|0  1|0  1.12k   70.1k  116 rs0  PRI Jun  7 07:31:08.488&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;   width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;    *0    *0     *0     *0       0    10|0  0.0% 38.3%       0 3.36G 1.53G 0|0  1|0   842b   71.9k  116 rs0  PRI Jun  7 07:31:09.487&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;   width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;    *0    *0     *0     *0       0     8|0  0.0% 38.3%       0 3.36G 1.53G 0|0  1|0   726b   70.2k  116 rs0  PRI Jun  7 07:31:10.487&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;   width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;    *0    *0     72     *0      11    46|0  0.7% 38.1%       0 3.80G 2.05G 0|0 2|42  47.8k    106k  121 rs0  PRI Jun  7 07:31:11.623&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;   width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;    *0    *0     25     *0      39    83|0  1.2% 38.2%       0 3.88G 2.14G 0|0 1|41  89.5k    154k  121 rs0  PRI Jun  7 07:31:12.487&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;   width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;    *0    *0     21     *0      29    75|0  1.3% 38.3%       0 3.88G 2.14G 0|0 1|42  69.0k    126k  122 rs0  PRI Jun  7 07:31:13.487&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;   width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;    *0    *0     18     *0      25    62|0  1.6% 38.3%       0 3.88G 2.14G 0|0 1|43  59.6k    116k  122 rs0  PRI Jun  7 07:31:14.488&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;   margin-bottom: 10px;  width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;    *0    *0     16     *0      20    46|0  1.8% 38.3%       0 3.89G 2.18G 0|0 1|44  46.1k    104k  122 rs0  PRI Jun  7 07:31:15.487&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
			&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p/&gt;
&lt;p&gt;Active writes go from 0 to 40-50, which possibly also matches the increase in the number of running processes.&lt;/p&gt;

&lt;p&gt;Further evidence that the increased write load is expected is that we are concurrently writing the same data to another cluster (an older one that currently does everything in our infrastructure, and one we are trying to break down into use-case-specific clusters), which is also showing increased load on the same collection at matching times.&lt;/p&gt;

&lt;p&gt;This may suggest that issue #1 referred to above is a consequence of our use case, which just leaves issue #2 as the unexpected one.&lt;/p&gt;

&lt;p&gt;With each event I continue to add further logging, so if issue #1 occurs again I will hopefully be able to show conclusively that it is expected.&lt;/p&gt;

&lt;p&gt;Kind regards,&lt;/p&gt;

&lt;p&gt;Marc&lt;/p&gt;</comment>
                            <comment id="1913580" author="dmitry.agranat" created="Thu, 7 Jun 2018 13:39:27 +0000"  >&lt;p&gt;Hi &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=MarcF&quot; class=&quot;user-hover&quot; rel=&quot;MarcF&quot;&gt;MarcF&lt;/a&gt;,&lt;/p&gt;

&lt;p&gt;Thank you for providing all the information, it was very useful.&lt;/p&gt;

&lt;p&gt;Regarding event #1 (100% CPU): based on the provided information, these events are periodic and follow a very consistent pattern: a burst (during the last event, on June 7th, this lasted ~30 minutes) of 100% user CPU utilization correlated with an increase in running system processes (from 1-2 to 50-60):&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;image-wrap&quot; style=&quot;&quot;&gt;&lt;img src=&quot;https://jira.mongodb.org/secure/attachment/188868/188868_CPU.png&quot; width=&quot;100%&quot; style=&quot;border: 0px solid black&quot; /&gt;&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;Since we do not have information about the CPU credits on your instance at this time, we cannot tell whether your sudden spike in CPU usage is the result of being throttled by Amazon, of a sudden burst of work that was previously throttled by Amazon, or of an actual increase in the workload.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;the load on this cluster should typically remain constant&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Could you provide some input regarding the bursts in the number of running processes? Are they expected, given your workload?&lt;/p&gt;

&lt;p&gt;If these bursts in the number of running processes are unexpected, could you, as an experiment, run the same workload on fixed-performance instances with more than 2 vCPUs instead of t2 burstable instances?&lt;/p&gt;
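To make the throttling hypothesis concrete, here is a rough back-of-the-envelope sketch. The t2.medium figures used (2 vCPUs, 24 credits earned per hour, a maximum balance of 576 credits, 1 credit = 1 vCPU-minute at 100%) are assumptions taken from AWS's published burstable-instance tables and should be double-checked for your instance type:

```python
# Rough model of CPU-credit drain on a burstable t2 instance. One CPU
# credit is one vCPU running at 100% for one minute. The t2.medium numbers
# used in the example call are assumed from AWS documentation and may change.

def hours_until_throttled(balance, vcpus, earn_per_hour, utilization=1.0):
    burn_per_hour = vcpus * 60 * utilization  # credits consumed per hour
    net_drain = burn_per_hour - earn_per_hour
    if net_drain <= 0:
        return float("inf")  # at or below baseline: sustainable indefinitely
    return balance / net_drain

# t2.medium with a full credit balance, pinned at 100% on both vCPUs:
print(hours_until_throttled(balance=576, vcpus=2, earn_per_hour=24))  # 6.0
```

In other words, a fully banked t2.medium pinned at 100% empties its balance in about six hours, after which AWS throttles it to the baseline — which can look exactly like a sudden, unexplained performance cliff.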

&lt;blockquote&gt;
&lt;p&gt;Further to my update I believe there are two unexpected things happening here:&lt;/p&gt;

&lt;p&gt;1) The high CPU and memory usage that causes the primary failover (the load on this cluster should typically remain constant).&lt;br/&gt;
2) The inability for mongo to recover after the failover.&lt;/p&gt;&lt;/blockquote&gt;
&lt;p&gt;We can look into the second concern once we understand the reason behind the reported CPU bursts.&lt;/p&gt;

&lt;p&gt;Thanks,&lt;br/&gt;
Dima&lt;/p&gt;
</comment>
                            <comment id="1913446" author="marcf" created="Thu, 7 Jun 2018 08:56:35 +0000"  >&lt;p&gt;Further to my update I believe there are two unexpected things happening here:&lt;/p&gt;
&lt;ol&gt;
	&lt;li&gt;The high CPU and memory usage that causes the primary failover (the load on this cluster should typically remain constant).&lt;/li&gt;
	&lt;li&gt;The inability for mongo to recover after the failover.&lt;/li&gt;
&lt;/ol&gt;


&lt;p&gt;With a view to providing further debug information, I upgraded the default primary (RS0-0) from a t2.medium to a t2.large instance last night. At the time of writing we have just experienced issue #1 again (between 07:25 and 08:35 UTC, see graph), and because of the increased primary instance size it did not crash. I have included additional information that may help determine the cause of issue #1:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;mongod.log_RS0-0_2018-06-07.zip (secure file store) - The mongod.log file just after the high CPU period.&lt;/li&gt;
	&lt;li&gt;diagnostic.data_RS0-0_2018-06-07.zip&#160;(secure file store) - The diagnostic data to match the mongod.log file.&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Replica set metrics relating to the above logs:&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;image-wrap&quot; style=&quot;&quot;&gt;&lt;img src=&quot;https://jira.mongodb.org/secure/attachment/188796/188796_image-2018-06-07-09-53-02-151.png&quot; style=&quot;border: 0px solid black&quot; /&gt;&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;EBS /var/mongodata (100GB&#160;gp2) metrics relating to the above logs (which doesn&apos;t show any associated changes with the high CPU and memory):&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;image-wrap&quot; style=&quot;&quot;&gt;&lt;img src=&quot;https://jira.mongodb.org/secure/attachment/188797/188797_image-2018-06-07-09-55-51-824.png&quot; style=&quot;border: 0px solid black&quot; /&gt;&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;Please let me know if any further information might be useful.&lt;/p&gt;

&lt;p&gt;Regards,&lt;/p&gt;

&lt;p&gt;Marc&lt;/p&gt;</comment>
                            <comment id="1911470" author="marcf" created="Tue, 5 Jun 2018 17:51:56 +0000"  >&lt;p&gt;Many thanks for the quick response. Please find the requested information as follows.&lt;/p&gt;

&lt;p&gt;For reference, all instances start off as t2.medium. EBS volumes are not initialised from snapshots, so I don&#8217;t expect prewarming to be an issue here.&lt;/p&gt;

&lt;p&gt;Here are the top level system stats across the replica set during the period of interest (starting 2018-06-03 14:00 UTC):&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;image-wrap&quot; style=&quot;&quot;&gt;&lt;a id=&quot;188607_thumb&quot; href=&quot;https://jira.mongodb.org/secure/attachment/188607/188607_top_level_replicaset_stats.JPG&quot; title=&quot;top_level_replicaset_stats.JPG&quot; file-preview-type=&quot;image&quot; file-preview-id=&quot;188607&quot; file-preview-title=&quot;top_level_replicaset_stats.JPG&quot;&gt;&lt;img src=&quot;https://jira.mongodb.org/secure/thumbnail/188607/_thumb_188607.png&quot; style=&quot;border: 0px solid black&quot; role=&quot;presentation&quot;/&gt;&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;All of the requested data came to ~500MB so have been uploaded to the secure storage:&lt;/p&gt;

&lt;p&gt;diagnostic.data_RS0-0.zip - The diagnostic data from RS0-0&lt;br/&gt;
 diagnostic.data_RS0-1.zip - The diagnostic data from RS0-1&lt;br/&gt;
 diagnostic.data_RS0-2.zip - The diagnostic data from RS0-2&lt;br/&gt;
 mongod.log_RS0-0.zip - The mongod.log file for RS0-0. The full timespan discussed in the timeline was enclosed by a single file.&lt;br/&gt;
 mongod.log_RS0-1.zip - The mongod.log file for RS0-1. The full timespan discussed in the timeline was enclosed by a single file.&lt;br/&gt;
 mongod.log_RS0-2.zip - The mongod.log files for RS0-2. The full timespan discussed in the timeline required multiple files to be enclosed in a single zip file.&lt;br/&gt;
 df_RS0-0_at_2018-06-05_17-19-UTC.log - The file contents of /var/mongodata on RS0-0 at 17:19 UTC 5th June.&lt;br/&gt;
 mongostat_RS0-0.log - The mongostat log covering the period on Sunday 3rd.&lt;br/&gt;
 mongotop_RS0-0.log - The mongotop log covering the period on Sunday 3rd.&lt;br/&gt;
 rs.status_at_2018-06-05_16-20-UTC.log - The rs.status output at 16:20 UTC 5th June.&lt;/p&gt;
&lt;h1&gt;&lt;a name=&quot;Timeline%3A&quot;&gt;&lt;/a&gt;Timeline:&lt;/h1&gt;

&lt;p&gt;All times are UTC.&lt;/p&gt;
&lt;h2&gt;&lt;a name=&quot;Sunday3rd&quot;&gt;&lt;/a&gt;Sunday 3rd&lt;/h2&gt;

&lt;p&gt;&lt;b&gt;RS0-0&lt;/b&gt;&lt;br/&gt;
 21:16 - CPU usage hits 100%&lt;br/&gt;
 21:31 - RES memory usage jumps from a steady 1.6G to 2.9G (see attached mongostat_RS0-0.log)&lt;br/&gt;
 22:20 - CPU drops to 0 as instance crashes.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;RS0-1&lt;/b&gt;&lt;br/&gt;
 22:20 - CPU bumps up to 25% around the time RS0-0 crashes, presumably as this instance takes over as primary.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;RS0-2&lt;/b&gt;&lt;br/&gt;
 CPU remains flat throughout; I suspect this is because the replica set went read-only once RS0-1 subsequently crashed, so no further writes occurred.&lt;/p&gt;
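The read-only behaviour follows from replica-set majority rules: a primary can only be elected while a strict majority of voting members is reachable. A minimal sketch of the vote counting only (real MongoDB elections also weigh member priorities, optime recency, and the votes configuration):

```python
# Minimal majority check for replica-set elections. This models only the
# headline rule that a primary needs a strict majority of voting members;
# real elections also consider priority, optime recency, and votes config.

def can_elect_primary(voting_members, reachable_members):
    majority = voting_members // 2 + 1
    return reachable_members >= majority

# 3-member set with RS0-0 and RS0-1 down: RS0-2 alone is not a majority,
# so no primary can be elected and the set rejects writes.
print(can_elect_primary(3, 1))  # False
print(can_elect_primary(3, 2))  # True
```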
&lt;h2&gt;&lt;a name=&quot;Tuesday5th&quot;&gt;&lt;/a&gt;Tuesday 5th&lt;/h2&gt;

&lt;p&gt;&lt;b&gt;16:20&lt;/b&gt;&lt;br/&gt;
 Mongo applications manually restarted on RS0-0 and RS0-1.&lt;br/&gt;
 RS0-1 becomes primary and is in sync with RS0-2 immediately. For reference, RS0-0 has a higher member priority.&lt;br/&gt;
 RS0-0 is 41.96 hours behind the primary and is not syncing, even though the oplog contains 83 hours of data.&lt;br/&gt;
 See attached &lt;em&gt;rs.status_at_2018-06-05_16-20-UTC.log&lt;/em&gt; attachment taken from RS0-1.&lt;/p&gt;
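The 41.96-hour figure can be read straight off rs.status(): each member reports an optimeDate, and lag is the difference from the primary's. A sketch of that calculation, using illustrative member documents rather than the attached output:

```python
# Sketch: compute per-member replication lag from an rs.status()-shaped
# document. The member entries below are illustrative stand-ins, not the
# attached rs.status output.
from datetime import datetime, timezone

def lag_hours(status):
    primary = next(m for m in status["members"] if m["stateStr"] == "PRIMARY")
    return {m["name"]:
            (primary["optimeDate"] - m["optimeDate"]).total_seconds() / 3600
            for m in status["members"]}

status = {"members": [
    {"name": "RS0-1", "stateStr": "PRIMARY",
     "optimeDate": datetime(2018, 6, 5, 16, 20, tzinfo=timezone.utc)},
    {"name": "RS0-0", "stateStr": "SECONDARY",
     "optimeDate": datetime(2018, 6, 3, 22, 22, tzinfo=timezone.utc)},
]}
print(round(lag_hours(status)["RS0-0"], 2))  # 41.97
```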

&lt;p&gt;&lt;b&gt;16:33&lt;/b&gt;&lt;br/&gt;
 RS0-0 crashes due to a memory allocation error, see logs.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;16:47&lt;/b&gt;&lt;br/&gt;
 For the purposes of recreating the issue I initially described with the high disk space usage, I increased RS0-0 to a t2.xlarge instance.&lt;br/&gt;
 RS0-0 /var/mongodata size is 7.3G&lt;/p&gt;

&lt;p&gt;&lt;b&gt;16:58&lt;/b&gt;&lt;br/&gt;
 RS0-0 /var/mongodata size is 16.0G&lt;/p&gt;

&lt;p&gt;&lt;b&gt;17:02&lt;/b&gt;&lt;br/&gt;
 RS0-0 /var/mongodata is 24.0G&lt;/p&gt;

&lt;p&gt;&lt;b&gt;17:14&lt;/b&gt;&lt;br/&gt;
 RS0-0 /var/mongodata is 32G and continuing to grow (see attached df_RS0-0_at_2018-06-05_17-19-UTC.log for file sizes)&lt;br/&gt;
 I attempted to shut down the mongod process on RS0-0; after waiting 4 minutes I hard-killed it using kill -9.&lt;br/&gt;
 Deleted all contents of /var/mongodata (after taking a copy of the diagnostic data) and restarted the instance as t2.medium so that it performs an initial sync.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;17:23&lt;/b&gt;&lt;br/&gt;
 RS0-0 is restarted as t2.medium and begins an initial sync.&lt;/p&gt;

&lt;p&gt;&lt;b&gt;17:25&lt;/b&gt;&lt;br/&gt;
 RS0-0 has completed the initial sync and become primary (due to the rs config priorities).&lt;/p&gt;</comment>
                            <comment id="1908072" author="bruce.lucas@10gen.com" created="Fri, 1 Jun 2018 15:16:25 +0000"  >&lt;p&gt;Hi Marc,&lt;/p&gt;

&lt;p&gt;In order for us to diagnose this, can you please&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;archive and upload the contents of diagnostic.data&lt;/li&gt;
	&lt;li&gt;compress and upload the complete mongod log files&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;from each node of a replica set that&apos;s recently experienced this issue. You can either attach to this ticket if the files are less than 150 MB, or upload to &lt;a href=&quot;https://10gen-httpsupload.s3.amazonaws.com/upload_forms/97603b63-291b-400d-ba45-88c80f5927f6.html&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;this secure private portal&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Also, please provide a timeline of the incident so we are sure to look at the right place in the data.&lt;/p&gt;

&lt;p&gt;Thanks,&lt;br/&gt;
 Bruce&lt;/p&gt;</comment>
                            <comment id="1908005" author="marcf" created="Fri, 1 Jun 2018 14:30:52 +0000"  >&lt;p&gt;Apologies, but log1 should be split on the noformat tags and the following comment:&lt;/p&gt;


&lt;p/&gt;
&lt;div id=&quot;syntaxplugin&quot; class=&quot;syntaxplugin&quot; style=&quot;border: 1px dashed #bbb; border-radius: 5px !important; overflow: auto; max-height: 30em;&quot;&gt;
&lt;table cellspacing=&quot;0&quot; cellpadding=&quot;0&quot; border=&quot;0&quot; width=&quot;100%&quot; style=&quot;font-size: 1em; line-height: 1.4em !important; font-weight: normal; font-style: normal; color: black;&quot;&gt;
		&lt;tbody &gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;  margin-top: 10px;   margin-bottom: 10px;  width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;I left the secondary in this state for a few hours to see if it would eventually get there as I was observing high CPU and disk usage. After&#160;about two hours the following additional information was output in the log:&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
			&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p/&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10010">
                    <name>Duplicate</name>
                                            <outwardlinks description="duplicates">
                                        <issuelink>
            <issuekey id="542268">SERVER-34938</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="542299">SERVER-34941</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="585014">SERVER-36495</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="188868" name="CPU.png" size="122087" author="dmitry.agranat@mongodb.com" created="Thu, 7 Jun 2018 20:00:37 +0000"/>
                            <attachment id="188796" name="image-2018-06-07-09-53-02-151.png" size="156369" author="MarcF" created="Thu, 7 Jun 2018 08:53:03 +0000"/>
                            <attachment id="188797" name="image-2018-06-07-09-55-51-824.png" size="127949" author="MarcF" created="Thu, 7 Jun 2018 08:55:52 +0000"/>
                            <attachment id="188993" name="image-2018-06-08-17-14-04-212.png" size="50394" author="MarcF" created="Fri, 8 Jun 2018 16:14:05 +0000"/>
                            <attachment id="188364" name="log1.txt" size="13891" author="MarcF" created="Fri, 1 Jun 2018 14:26:00 +0000"/>
                            <attachment id="188363" name="log2.txt" size="3599" author="MarcF" created="Fri, 1 Jun 2018 14:28:36 +0000"/>
                            <attachment id="188362" name="log3.txt" size="44611" author="MarcF" created="Fri, 1 Jun 2018 14:29:19 +0000"/>
                            <attachment id="188607" name="top_level_replicaset_stats.JPG" size="71925" author="MarcF" created="Tue, 5 Jun 2018 17:49:34 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>11.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Fri, 1 Jun 2018 15:16:25 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        5 years, 34 weeks, 1 day ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>backlog-server-pm</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            5 years, 34 weeks, 1 day ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                    <customfield id="customfield_10032" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Operating System</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10026"><![CDATA[ALL]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>bruce.lucas@mongodb.com</customfieldvalue>
            <customfieldvalue>dmitry.agranat@mongodb.com</customfieldvalue>
            <customfieldvalue>MarcF</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|htzj9r:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|htqh5j:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10750" key="com.atlassian.jira.plugin.system.customfieldtypes:textarea">
                        <customfieldname>Steps To Reproduce</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>&lt;p&gt;Under provision the secondary so that it eventually runs out of memory and crashes.&lt;/p&gt;</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                    <customfieldvalue><![CDATA[bruce.lucas@mongodb.com]]></customfieldvalue>
        <customfieldvalue><![CDATA[dmitry.agranat@mongodb.com]]></customfieldvalue>
    

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|htz5j3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>