<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 04:06:20 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-24429] Initial sync from a WiredTiger instance locks the server</title>
                <link>https://jira.mongodb.org/browse/SERVER-24429</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;We&apos;re currently busy migrating all our servers to WiredTiger on MongoDB 3.0.12, and are running into performance / lock issues with the initial sync from a WT server.&lt;/p&gt;

&lt;p&gt;Our setup:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;A replicaset with 4 members (1 hidden, 1 with no votes).&lt;/li&gt;
	&lt;li&gt;All running MongoDB 3.0.12.&lt;/li&gt;
	&lt;li&gt;Database size is over 1TB (MMAP), around 300GB with WT.&lt;/li&gt;
	&lt;li&gt;No sharding.&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;We&apos;ve found that performing the initial sync from a MMAP server to a new WT server completes without issue. However, the initial sync from a WT server to another WT server gives significant performance issues.&lt;/p&gt;

&lt;p&gt;The issue seems to be specifically when it&apos;s doing a sync on large collections (around 100GB in size). After a while the server being synced from becomes completely unresponsive for multiple hours. During this time the replication lag on the server builds up if it&apos;s a secondary (it doesn&apos;t seem to be replicating at all anymore), most queries to it are completely unresponsive, and it&apos;s often not even possible to log into the mongo shell. It seems like the initial sync query is holding a global lock, and not yielding for a very long time. During this time there is also essentially no network traffic on the server.&lt;/p&gt;

&lt;p&gt;Since most of our application does not actively use the secondaries (which we are performing the initial sync from), it does not affect the majority of our system. However, there are a few queries that we do run on our secondaries, which are affected by this.&lt;/p&gt;

&lt;p&gt;When doing the initial sync from a MMAP server, we do not experience these issues at all.&lt;/p&gt;

&lt;p&gt;We have not tested with MongoDB 3.2.x yet.&lt;/p&gt;

</description>
                <environment></environment>
        <key id="291980">SERVER-24429</key>
            <summary>Initial sync from a WiredTiger instance locks the server</summary>
                <type id="1" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14703&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="9">Done</resolution>
                                        <assignee username="ramon.fernandez@mongodb.com">Ramon Fernandez Marina</assignee>
                                    <reporter username="ralf.kistner@gmail.com">Ralf Kistner</reporter>
                        <labels>
                    </labels>
                <created>Tue, 7 Jun 2016 09:14:47 +0000</created>
                <updated>Sat, 15 Oct 2016 02:03:41 +0000</updated>
                            <resolved>Sat, 15 Oct 2016 02:03:41 +0000</resolved>
                                    <version>3.0.12</version>
                                                    <component>WiredTiger</component>
                                        <votes>0</votes>
                                    <watches>8</watches>
                                                                                                                <comments>
                            <comment id="1409375" author="ramon.fernandez" created="Sat, 15 Oct 2016 02:03:31 +0000"  >&lt;p&gt;Thanks for the update &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=ralf.kistner%40gmail.com&quot; class=&quot;user-hover&quot; rel=&quot;ralf.kistner@gmail.com&quot;&gt;ralf.kistner@gmail.com&lt;/a&gt;, glad to hear that 3.2.10 solves your issue.&lt;/p&gt;

&lt;p&gt;The team has been working hard chasing down eviction-related issues in recent versions, and I&apos;m happy to report those have been fixed in 3.2.10 &amp;#8211; you can &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-26055?focusedCommentId=1394968&amp;amp;page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-1394968&quot; class=&quot;external-link&quot; rel=&quot;nofollow&quot;&gt;read the details in this comment&lt;/a&gt;. Sorry your setup was affected, and thanks for your patience while we investigated and fixed these.&lt;/p&gt;

&lt;p&gt;I&apos;m going to close this ticket now. If you find any other issues please feel free to open a new SERVER ticket.&lt;/p&gt;

&lt;p&gt;Thanks,&lt;br/&gt;
Ram&#243;n.&lt;/p&gt;
</comment>
                            <comment id="1399095" author="ralf.kistner@gmail.com" created="Mon, 3 Oct 2016 19:05:32 +0000"  >&lt;p&gt;Seems to be related to &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-26055&quot; class=&quot;external-link&quot; rel=&quot;nofollow&quot;&gt;https://jira.mongodb.org/browse/SERVER-26055&lt;/a&gt;. The issue was in general caused by queries filling up the cache, which always happened when replicating, as well as some other cases. We had the same (or worse) issues with 3.2.9, but 3.2.10 seems to resolve it.&lt;/p&gt;</comment>
                            <comment id="1287744" author="ralf.kistner@gmail.com" created="Wed, 8 Jun 2016 13:16:12 +0000"  >&lt;p&gt;We managed to sync successfully from both primary and secondary MMAPv1 nodes as source for the initial sync. We&apos;re also having trouble with both primary and secondary WT nodes as source.&lt;/p&gt;

&lt;p&gt;We also have three other similar replicasets in other datacenters that did not show the issue. The main differences are in the volume of data (the others have less), and the speed of the disks (the others have SSDs, while the replicaset with issues is still using spinning disks).&lt;/p&gt;

&lt;p&gt;We&apos;ll need to do some work before we are able to upgrade to MongoDB 3.2 (mostly lots of testing to make sure everything is compatible), so we&apos;re not able to do that immediately, but I&apos;ll let you know when we do.&lt;/p&gt;</comment>
                            <comment id="1287734" author="ramon.fernandez" created="Wed, 8 Jun 2016 13:10:02 +0000"  >&lt;p&gt;Thanks for the additional information. Can you specify &lt;em&gt;what kind of node (primary, secondary) was the MMAPv1 node you successfully synced from&lt;/em&gt;? I&apos;d like to use that as a control in a reproduction attempt...&lt;/p&gt;

&lt;p&gt;Please note that, in my experience, sometimes an issue is triggered by a specific data distribution, so even if everything &quot;looks right&quot; in my reproducer I may not trigger the same behavior. Since you appear to be stuck because of this issue I&apos;d recommend you consider the following way forward:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;Upgrade a node to MongoDB 3.2.7 with WT&lt;/li&gt;
	&lt;li&gt;Sync that node from an MMAPv1 node&lt;/li&gt;
	&lt;li&gt;Upgrade a second node to MongoDB 3.2.7 and sync it from the WT node&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;There was a lot of work done in the WiredTiger engine for 3.2, and we&apos;ve seen much more stable and performant behavior so far. However if the problem persists, these 3.2 nodes will have recorded diagnostic data that will make troubleshooting a lot easier. If you&apos;d like to explore this route please let us know how it goes.&lt;/p&gt;

&lt;p&gt;Thanks,&lt;br/&gt;
Ram&#243;n.&lt;/p&gt;
</comment>
                            <comment id="1287726" author="ralf.kistner@gmail.com" created="Wed, 8 Jun 2016 12:51:39 +0000"  >&lt;p&gt;The &lt;b&gt;source&lt;/b&gt; node is mostly unresponsive, and shows the replication lag (the source is a secondary, and it doesn&apos;t seem to be able to replicate from the primary anymore). No issues on the sink node. The logs I attached are all from the source node.&lt;/p&gt;

&lt;p&gt;It&apos;s been unresponsive for the last 30 hours or so, but we&apos;re waiting a little longer to see if the replication can complete.&lt;/p&gt;
</comment>
                            <comment id="1287705" author="ramon.fernandez" created="Wed, 8 Jun 2016 12:28:50 +0000"  >&lt;p&gt;Thanks for the logs &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=ralf.kistner%40gmail.com&quot; class=&quot;user-hover&quot; rel=&quot;ralf.kistner@gmail.com&quot;&gt;ralf.kistner@gmail.com&lt;/a&gt;. Unfortunately they don&apos;t contain any useful information, since they don&apos;t include data from the beginning of an initial sync.&lt;/p&gt;

&lt;p&gt;I want to confirm that the issue you&apos;re seeing is that, when a WT node initial syncs from another WT node:&lt;/p&gt;
&lt;ul&gt;
	&lt;li&gt;the &lt;b&gt;source&lt;/b&gt; node for the sync becomes unresponsive&lt;/li&gt;
	&lt;li&gt;the &lt;b&gt;sink&lt;/b&gt; node shows increased replication lag&lt;/li&gt;
&lt;/ul&gt;


&lt;p&gt;Is this correct?&lt;/p&gt;

&lt;p&gt;But when the source node for the initial sync is running MMAPv1 this problem doesn&apos;t appear. The question is, is that MMAPv1 node a primary or also a secondary? I&apos;m trying to see if the issue is related to the storage engine or to the state of the source node in the replica set.&lt;/p&gt;

&lt;p&gt;Thanks,&lt;br/&gt;
Ram&#243;n.&lt;/p&gt;</comment>
                            <comment id="1286636" author="ralf.kistner@gmail.com" created="Tue, 7 Jun 2016 16:08:46 +0000"  >&lt;p&gt;Thanks for the response.&lt;/p&gt;

&lt;p&gt;I attached the logs for the last 10 minutes or so. The server seems to be locked for the last 8 hours, so I don&apos;t have any stats from before that.&lt;/p&gt;

&lt;p&gt;I also included the output of currentOp(). These are a few hundred ops, all of which seem to be waiting for the lock for hours.&lt;/p&gt;

&lt;p&gt;I also noticed that I am able to log in with the mongo shell (and get these stats) using our monitoring user, but not our &quot;root&quot; user.&lt;/p&gt;</comment>
                            <comment id="1286234" author="ramon.fernandez" created="Tue, 7 Jun 2016 11:26:18 +0000"  >&lt;p&gt;Thanks for your report &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=ralf.kistner%40gmail.com&quot; class=&quot;user-hover&quot; rel=&quot;ralf.kistner@gmail.com&quot;&gt;ralf.kistner@gmail.com&lt;/a&gt;; the behavior you describe is not expected and hasn&apos;t been reported before, so we&apos;ll need to gather more information to investigate. There are two options:&lt;/p&gt;
&lt;ol&gt;
	&lt;li&gt;Collect diagnostic information on the WT node that locks up, while the problem is happening; this can be done by running the following two lines from a bash shell (assuming you&apos;re using Linux):
&lt;p/&gt;
&lt;div id=&quot;syntaxplugin&quot; class=&quot;syntaxplugin&quot; style=&quot;border: 1px dashed #bbb; border-radius: 5px !important; overflow: auto; max-height: 30em;&quot;&gt;
&lt;table cellspacing=&quot;0&quot; cellpadding=&quot;0&quot; border=&quot;0&quot; width=&quot;100%&quot; style=&quot;font-size: 1em; line-height: 1.4em !important; font-weight: normal; font-style: normal; color: black;&quot;&gt;
		&lt;tbody &gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;  margin-top: 10px;   width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;mongo --eval &quot;while(true) {print(JSON.stringify(db.serverStatus({tcmalloc:1}))); sleep(1000)}&quot; &amp;gt;ss.log &amp;amp;&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
				&lt;tr id=&quot;syntaxplugin_code_and_gutter&quot;&gt;
						&lt;td  style=&quot; line-height: 1.4em !important; padding: 0em; vertical-align: top;&quot;&gt;
					&lt;pre style=&quot;font-size: 1em; margin: 0 10px;   margin-bottom: 10px;  width: auto; padding: 0;&quot;&gt;&lt;span style=&quot;color: black; font-family: &apos;Consolas&apos;, &apos;Bitstream Vera Sans Mono&apos;, &apos;Courier New&apos;, Courier, monospace !important;&quot;&gt;iostat -k -t -x 1 &amp;gt;iostat.log &amp;amp;&lt;/span&gt;&lt;/pre&gt;
			&lt;/td&gt;
		&lt;/tr&gt;
			&lt;/tbody&gt;
&lt;/table&gt;
&lt;/div&gt;
&lt;p/&gt;
&lt;p&gt;You&apos;d need to start the data collection process on the node used as a source for initial sync, then run the initial sync from another node, and continue to collect this information for 5-10 minutes after the source node becomes blocked. This will create two files, &lt;tt&gt;iostat.log&lt;/tt&gt; and &lt;tt&gt;ss.log&lt;/tt&gt;, that you&apos;d need to upload here.&lt;/p&gt;&lt;/li&gt;
	&lt;li&gt;Upgrade to 3.2.6 (3.2.7 is expected before noon EDT if you can wait) and try again. In MongoDB 3.2 the diagnostic data is collected automatically, so if the problem persists you&apos;d need to upload the contents of the &lt;tt&gt;diagnostic.data&lt;/tt&gt; directory inside the &lt;tt&gt;dbpath&lt;/tt&gt; of the affected node.&lt;/li&gt;
&lt;/ol&gt;


&lt;p&gt;In the meantime, if you have logs for the two nodes involved in the initial sync please upload them and we&apos;ll start looking for clues.&lt;/p&gt;

&lt;p&gt;Thanks,&lt;br/&gt;
Ram&#243;n.&lt;/p&gt;
</comment>
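<!--
Illustrative sketch, not part of the original ticket: the serverStatus loop suggested in the comment above writes one JSON document per line to ss.log, and the thread later attributes the stall to queries filling the WiredTiger cache. Assuming that file layout, the Python below estimates how full the cache was at each sample. The helper names (as_number, cache_fill_ratios) are hypothetical; the handling of {"floatApprox": n} wrappers and of non-JSON banner lines is an assumption about legacy mongo shell output, and the two cache statistic names are assumed to be present in the wiredTiger.cache section of serverStatus.

```python
import json


def as_number(value):
    """Return a plain number; assumes the legacy shell may wrap 64-bit integers as {"floatApprox": n}."""
    if isinstance(value, dict) and "floatApprox" in value:
        return value["floatApprox"]
    return value


def cache_fill_ratios(lines):
    """Yield (localTime, used / configured) for each line that parses as a serverStatus document."""
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip shell banner lines and blank lines
        try:
            doc = json.loads(line)
        except ValueError:
            continue  # skip truncated or otherwise malformed lines
        cache = doc.get("wiredTiger", {}).get("cache", {})
        used = as_number(cache.get("bytes currently in the cache"))
        limit = as_number(cache.get("maximum bytes configured"))
        if used is None or not limit:
            continue  # statistics missing, for example on a non-WiredTiger node
        yield doc.get("localTime"), used / limit
```

One possible use: iterate over cache_fill_ratios(open("ss.log")) and print each pair; ratios that stay close to 1.0 while the node is unresponsive would be consistent with the cache pressure described in the thread.
-->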
                    </comments>
                    <attachments>
                            <attachment id="125163" name="logs.zip" size="96770" author="ralf.kistner@gmail.com" created="Tue, 7 Jun 2016 16:05:34 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>8.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Tue, 7 Jun 2016 11:26:18 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        7 years, 17 weeks, 5 days ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>ramon.fernandez@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            7 years, 17 weeks, 5 days ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                    <customfield id="customfield_10032" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Operating System</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10026"><![CDATA[ALL]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>ralf.kistner@gmail.com</customfieldvalue>
            <customfieldvalue>ramon.fernandez@mongodb.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hrk6dj:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hsm7fb:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hrlhvb:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>