<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 04:02:08 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-23027] Unrecovering replication delay and crashing of server</title>
                <link>https://jira.mongodb.org/browse/SERVER-23027</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;We&apos;ve been having a set of intermittent issues while performing some upgrades to our cluster. I have not reproduced it, so I am not filing this as a bug yet.&lt;/p&gt;

&lt;p&gt;We&apos;re a) converting a standalone mongo instance into a replica set in phases, b) upgrading to bigger AWS instances with higher disk IOPS, and c) using mongo 3.2.3 for the new instances (the initial standalone instance is at 3.0.8).&lt;/p&gt;

&lt;p&gt;There are 5 instances in total which include the old primary, 3 new secondaries and 1 arbiter. They are all running on WiredTiger.&lt;/p&gt;

&lt;p&gt;There are some properties of the cluster that are worth noting.&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;There is no compression being used.&lt;/li&gt;
	&lt;li&gt;There are about 1100 collections in one of the databases.&lt;/li&gt;
	&lt;li&gt;The old primary has a higher priority than the others - in order to try and ensure it remains the primary until all the clients are phased over.&lt;/li&gt;
	&lt;li&gt;The oplog on the PRIMARY instance is configured to be 40GB - increased ~10x from the initial value.&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;We&apos;re noticing that SECONDARIES are getting into a state of increasing replication delay. &lt;span class=&quot;error&quot;&gt;&amp;#91;see server-status-slow file for logs around this time&amp;#93;&lt;/span&gt; After several hours of replication delay, one of the secondaries simply crashed. Around this time, we were performing fairly heavy writes on the PRIMARY. The disk read &quot;IOPS&quot; on the primary, as reported by AWS, was 1000 IOPS, with the max being 1500 IOPS. Writes were at ~500 IOPS.&lt;/p&gt;

&lt;p&gt;In one case (ip-10-0-0-233), we &quot;fixed&quot; the replication delay by restarting the server. The replication delay immediately dropped to 0. &lt;span class=&quot;error&quot;&gt;&amp;#91;see replication-delay-drop image&amp;#93;&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;On another secondary, restarting did not fix the replication delay; it was not able to find a server from which it could replicate safely. The log contained:&lt;br/&gt;
&quot;2016-03-09T05:29:31.103+0000 W REPL     &lt;span class=&quot;error&quot;&gt;&amp;#91;rsBackgroundSync&amp;#93;&lt;/span&gt; we are too stale to use ip-10-0-0-233:27017 as a sync source&lt;br/&gt;
2016-03-09T05:29:31.103+0000 E REPL     &lt;span class=&quot;error&quot;&gt;&amp;#91;rsBackgroundSync&amp;#93;&lt;/span&gt; too stale to catch up &amp;#8211; entering maintenance mode&quot;&lt;/p&gt;

&lt;p&gt;We were never able to recover the crashed secondary. Every restart of the server resulted in it crashing again with a message that looked like this:&lt;br/&gt;
&quot;Assertion: 10334:BSONObj size: 17646640 (0x10D4430) is invalid. Size must be between 0 and 16793600(16MB) First element: id: 301015268469&quot;&lt;/p&gt;

&lt;p&gt;This is impeding important operational tasks we need to do, so we&apos;d really like some insight as to what could have caused this.&lt;/p&gt;

&lt;p&gt;Let me know if there is any other information I can provide that would be useful. &lt;/p&gt;

&lt;p&gt;Unfortunately, I don&apos;t have the logs for the crashed mongo instance. I can attach logs for the other instance. That said, the same issue happened a few days ago on another instance, and if necessary I might be able to dig those logs up.&lt;/p&gt;</description>
                <environment></environment>
        <key id="271220">SERVER-23027</key>
            <summary>Unrecovering replication delay and crashing of server</summary>
                <type id="6" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14720&amp;avatarType=issuetype">Question</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="4">Incomplete</resolution>
                                        <assignee username="kelsey.schubert@mongodb.com">Kelsey Schubert</assignee>
                                    <reporter username="varun@x.ai">Varun Vijayaraghavan</reporter>
                        <labels>
                    </labels>
                <created>Wed, 9 Mar 2016 19:36:57 +0000</created>
                <updated>Fri, 6 May 2016 18:28:13 +0000</updated>
                            <resolved>Fri, 6 May 2016 18:28:13 +0000</resolved>
                                    <version>3.2.3</version>
                                                    <component>Replication</component>
                                        <votes>0</votes>
                                    <watches>6</watches>
                                                                                                                <comments>
                            <comment id="1257999" author="thomas.schubert" created="Fri, 6 May 2016 18:28:13 +0000"  >&lt;p&gt;Hi &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=varun%40x.ai&quot; class=&quot;user-hover&quot; rel=&quot;varun@x.ai&quot;&gt;varun@x.ai&lt;/a&gt;, &lt;/p&gt;

&lt;p&gt;We haven&#8217;t heard back from you for some time, so I&#8217;m going to mark this ticket as resolved. If this is still an issue for you, please provide additional information and we will reopen the ticket.&lt;/p&gt;

&lt;p&gt;Regards,&lt;br/&gt;
Thomas&lt;/p&gt;</comment>
                            <comment id="1232025" author="thomas.schubert" created="Mon, 11 Apr 2016 15:48:06 +0000"  >&lt;p&gt;Hi &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=varun%40x.ai&quot; class=&quot;user-hover&quot; rel=&quot;varun@x.ai&quot;&gt;varun@x.ai&lt;/a&gt;, &lt;/p&gt;

&lt;p&gt;We still need additional information to diagnose the problem. If this is still an issue for you, can you please upload the diagnostic.data and logs for the affected nodes?&lt;/p&gt;

&lt;p&gt;Thank you,&lt;br/&gt;
Thomas&lt;/p&gt;</comment>
                            <comment id="1208449" author="thomas.schubert" created="Fri, 18 Mar 2016 20:53:20 +0000"  >&lt;p&gt;Hi &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=varun%40x.ai&quot; class=&quot;user-hover&quot; rel=&quot;varun@x.ai&quot;&gt;varun@x.ai&lt;/a&gt;,&lt;/p&gt;

&lt;p&gt;Sorry for the delay getting back to you. The behavior that you are describing is actually the result of two different issues.&lt;/p&gt;

&lt;p&gt;First, let us discuss the secondary which has the error message:&lt;/p&gt;

&lt;p/&gt;
&lt;div id=&quot;syntaxplugin&quot; class=&quot;syntaxplugin&quot; style=&quot;border: 1px dashed #bbb; border-radius: 5px !important; overflow: auto; max-height: 30em;&quot;&gt;
&lt;table cellspacing=&quot;0&quot; cellpadding=&quot;0&quot; border=&quot;0&quot; width=&quot;100%&quot; style=&quot;font-size: 1em; line-height: 1.4em !important; font-weight: normal; font-style: normal; color: black;&quot;&gt;
		&lt;tbody &gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;  margin-top: 10px;   margin-bottom: 10px;  width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;&quot;Assertion: 10334:BSONObj size: 17646640 (0x10D4430) is invalid. Size must be between 0 and 16793600(16MB) First element: id: 301015268469&quot;&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
			&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p/&gt;

&lt;p&gt;This error message indicates that a document on the secondary has suffered disk corruption. Determining the exact cause of this corruption is generally not worthwhile. However, if data corruption issues persist, I would recommend a thorough integrity check of the affected node&apos;s disk drives.&lt;/p&gt;

&lt;p&gt;To address this issue, please execute a &lt;a href=&quot;https://docs.mongodb.org/manual/tutorial/resync-replica-set-member/#automatically-sync-a-member&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;clean resync&lt;/a&gt; on the affected node.&lt;/p&gt;

&lt;p&gt;Second, I would like to discuss the growing replication delay until server restart. To continue to investigate this issue, please answer the following questions:&lt;/p&gt;

&lt;ol&gt;
	&lt;li&gt;Can you please upload the contents of the &lt;tt&gt;diagnostic.data&lt;/tt&gt; directory as well as the logs for the affected node when it is experiencing this issue?&lt;/li&gt;
	&lt;li&gt;Which &lt;a href=&quot;https://docs.mongodb.org/manual/reference/replica-configuration/#rsconf.protocolVersion&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;replication protocol&lt;/a&gt; are you using?&lt;/li&gt;
&lt;/ol&gt;


&lt;p&gt;Please also consider upgrading to 3.2.4 which contains fixes including &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-22276&quot; title=&quot;implement &amp;quot;j&amp;quot; flag in write concern apply to secondary as well as primary&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-22276&quot;&gt;&lt;del&gt;SERVER-22276&lt;/del&gt;&lt;/a&gt;. These fixes may improve the behavior that you are observing.&lt;/p&gt;

&lt;p&gt;Kind regards,&lt;br/&gt;
Thomas&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="112813" name="replication-delay-drop.png" size="15326" author="varun@x.ai" created="Wed, 9 Mar 2016 19:36:57 +0000"/>
                            <attachment id="112812" name="server-status-slow.txt" size="6429" author="varun@x.ai" created="Wed, 9 Mar 2016 19:36:57 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>3.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Wed, 9 Mar 2016 20:34:11 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        7 years, 40 weeks, 5 days ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>kelsey.schubert@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            7 years, 40 weeks, 5 days ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>kelsey.schubert@mongodb.com</customfieldvalue>
            <customfieldvalue>varun@x.ai</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hrke7z:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hsj8bz:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                    <customfieldvalue><![CDATA[kelsey.schubert@mongodb.com]]></customfieldvalue>
    

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hrpav3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>