<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 04:51:50 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-39371] Mongo cluster got stuck at &quot;could not find member to sync from&quot;</title>
                <link>https://jira.mongodb.org/browse/SERVER-39371</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;I have a mongo cluster with 5 nodes. The primary somehow &quot;half-died&quot;. From outside, the cluster did not have a primary. Inside the cluster, some nodes could still talk to it for some time and later reported that the connection failed. After a secondary failed to connect to the primary, it tried to choose another member to sync from and could not. The cluster stayed stuck there until an engineer got involved and forced a delete/promote.&lt;/p&gt;

&lt;p&gt;&lt;font color=&quot;#4c9aff&quot;&gt;Version&lt;/font&gt;: 3.4&lt;/p&gt;

&lt;p&gt;&lt;font color=&quot;#4c9aff&quot;&gt;Cluster size:&lt;/font&gt; 5 nodes&lt;/p&gt;

&lt;p&gt;&lt;font color=&quot;#4c9aff&quot;&gt;mongo config&lt;/font&gt;:&lt;/p&gt;

&lt;p&gt;systemLog:&lt;br/&gt;
 destination: file&lt;br/&gt;
 logAppend: true&lt;br/&gt;
 path: /var/log/mongodb/mongod.log&lt;br/&gt;
 quiet: true&lt;/p&gt;

&lt;p&gt;net:&lt;br/&gt;
 http:&lt;br/&gt;
 enabled: true&lt;/p&gt;

&lt;p&gt;# how the process runs&lt;br/&gt;
processManagement:&lt;br/&gt;
 fork: true # fork and run in background&lt;br/&gt;
 pidFilePath: /var/run/mongodb/mongod.pid # location of pidfile&lt;/p&gt;


&lt;p&gt;storage:&lt;br/&gt;
 engine: &apos;wiredTiger&apos;&lt;br/&gt;
 dbPath: /mnt/mongodb&lt;/p&gt;

&lt;p&gt;replication:&lt;br/&gt;
 replSetName: snapshot-catalog-mongo&lt;br/&gt;
 oplogSizeMB: 24000&lt;/p&gt;

&lt;p&gt;operationProfiling:&lt;br/&gt;
 slowOpThresholdMs: 10000&lt;br/&gt;
 mode: off&lt;/p&gt;

&lt;p&gt;&lt;font color=&quot;#4c9aff&quot;&gt;Run mongo daemon as a service. The service file is configured as follows&lt;/font&gt;:&lt;/p&gt;

&lt;p&gt;&amp;#91;Unit&amp;#93;&lt;br/&gt;
Description=MongoDB Database Server&lt;br/&gt;
After=network.target&lt;br/&gt;
Documentation=&lt;a href=&quot;https://docs.mongodb.org/manual&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://docs.mongodb.org/manual&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&amp;#91;Service&amp;#93;&lt;br/&gt;
User=pure&lt;br/&gt;
Group=pure&lt;br/&gt;
Environment=&quot;OPTIONS=-f /etc/mongodb.conf&quot;&lt;br/&gt;
ExecStartPre=/opt/pure/setup-mongo.sh&lt;br/&gt;
ExecStart=/usr/bin/mongod $OPTIONS&lt;br/&gt;
PermissionsStartOnly=true&lt;br/&gt;
PIDFile=/var/run/mongodb/mongod.pid&lt;br/&gt;
Type=forking&lt;/p&gt;
&lt;p&gt;# file size&lt;br/&gt;
LimitFSIZE=infinity&lt;br/&gt;
# cpu time&lt;br/&gt;
LimitCPU=infinity&lt;br/&gt;
# virtual memory size&lt;br/&gt;
LimitAS=infinity&lt;br/&gt;
# open files&lt;br/&gt;
LimitNOFILE=64000&lt;br/&gt;
# processes/threads&lt;br/&gt;
LimitNPROC=64000&lt;br/&gt;
# locked memory&lt;br/&gt;
LimitMEMLOCK=infinity&lt;br/&gt;
# total threads (user+kernel)&lt;br/&gt;
TasksMax=infinity&lt;br/&gt;
TasksAccounting=false&lt;br/&gt;
# Recommended limits for mongod as specified in&lt;br/&gt;
# &lt;a href=&quot;http://docs.mongodb.org/manual/reference/ulimit/#recommended-settings&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://docs.mongodb.org/manual/reference/ulimit/#recommended-settings&lt;/a&gt;&lt;br/&gt;
Restart=on-failure&lt;br/&gt;
RestartSec=5&lt;/p&gt;


&lt;p&gt;&amp;#91;Install&amp;#93;&lt;br/&gt;
WantedBy=multi-user.target&lt;/p&gt;

&lt;p&gt;Timeline:&lt;/p&gt;

&lt;p&gt;1. &amp;#91;2019-01-16 ~1am&amp;#93; The primary node goes down. None of the secondary nodes realize this and they all stay secondary, so the cluster cannot take any traffic.&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;image-wrap&quot; style=&quot;&quot;&gt;&lt;img src=&quot;https://wiki.purestorage.com/download/attachments/74200408/image2019-2-4_9-5-14.png?version=1&amp;amp;modificationDate=1549299914941&amp;amp;api=v2&quot; height=&quot;250&quot; style=&quot;border: 0px solid black&quot; /&gt;&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;From outside, the cluster lost its primary at 00:55 AM.&lt;/p&gt;

&lt;p&gt;Replication Lag between 1-2 am&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;image-wrap&quot; style=&quot;&quot;&gt;&lt;img src=&quot;https://wiki.purestorage.com/download/attachments/74200408/image2019-2-4_10-10-55.png?version=1&amp;amp;modificationDate=1549303855465&amp;amp;api=v2&quot; height=&quot;250&quot; style=&quot;border: 0px solid black&quot; /&gt;&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;time window between 1-2&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;image-wrap&quot; style=&quot;&quot;&gt;&lt;img src=&quot;https://wiki.purestorage.com/download/attachments/74200408/image2019-2-4_10-59-52.png?version=1&amp;amp;modificationDate=1549306792166&amp;amp;api=v2&quot; height=&quot;250&quot; style=&quot;border: 0px solid black&quot; /&gt;&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;However, inside the mongo cluster, some nodes were still in sync. The last node to lose its connection to the primary did so at 01:20 AM.&lt;/p&gt;

&lt;p&gt;&#160;&lt;/p&gt;
&lt;h1&gt;&lt;a name=&quot;&quot;&gt;&lt;/a&gt;&lt;b&gt;&lt;font color=&quot;#FF0000&quot;&gt;Primary node&apos;s last log before crashing&lt;/font&gt;&lt;/b&gt;&lt;/h1&gt;
&lt;p&gt;Last log before crashing&lt;br/&gt;
2019-01-16T08:50:20.412+0000 I NETWORK &amp;#91;conn391702&amp;#93; received client metadata from 10.132.205.155:50350 conn391702: { driver: { name: &quot;mongo-java-driver&quot;, version: &quot;unknown&quot; }, os: { type: &quot;Linux&quot;, name: &quot;Linux&quot;, architecture: &quot;amd64&quot;, version: &quot;4.15.0-1021-aws&quot; }, platform: &quot;Java/Oracle Corporation/1.8.0_191-8u191-b12-0ubuntu0.18.04.1-b12&quot; }&lt;br/&gt;
2019-01-16T08:50:25.410+0000 I NETWORK &amp;#91;conn391703&amp;#93; received client metadata from 10.132.211.83:54270 conn391703: { driver: { name: &quot;PyMongo&quot;, version: &quot;3.5.1&quot; }, os: { type: &quot;Linux&quot;, name: &quot;Ubuntu 16.04 xenial&quot;, architecture: &quot;x86_64&quot;, version: &quot;4.4.0-1072-aws&quot; }, platform: &quot;CPython 2.7.14.final.0&quot; }&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;image-wrap&quot; style=&quot;&quot;&gt;&lt;img src=&quot;https://wiki.purestorage.com/download/attachments/74200408/image2019-2-4_9-0-50.png?version=1&amp;amp;modificationDate=1549299650894&amp;amp;api=v2&quot; height=&quot;400&quot; style=&quot;border: 0px solid black&quot; /&gt;&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;The secondary nodes&apos; logs are attached below.&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/attachment/208021/208021_log.pdf&quot; title=&quot;log.pdf attached to SERVER-39371&quot;&gt;log.pdf&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.mongodb.org/images/icons/link_attachment_7.gif&quot; height=&quot;7&quot; width=&quot;7&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt;&lt;br/&gt;
&#160;&lt;br/&gt;
&#160;&lt;/p&gt;


&lt;p&gt;Question&lt;/p&gt;

&lt;p&gt;1.&#160;Why did the primary node crash?&lt;/p&gt;

&lt;p&gt;It only printed a lot of binary codes &amp;lt;0x00&amp;gt; and that&apos;s it. What happened?&lt;/p&gt;

&lt;p&gt;2. Why can the secondary nodes not elect a new primary?&lt;/p&gt;

&lt;p&gt;The secondaries keep saying &quot;could not find a member to sync from&quot;. Is the cluster in a weird state where the primary can still talk to the secondaries but the secondaries cannot talk to the primary? That would mean the secondaries still know the primary is alive and so do not elect a new primary, yet they cannot sync data from it.&lt;/p&gt;</description>
                <environment></environment>
        <key id="683543">SERVER-39371</key>
            <summary>Mongo cluster got stuck at &quot;could not find member to sync from&quot;</summary>
                <type id="6" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14720&amp;avatarType=issuetype">Question</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="9">Done</resolution>
                                        <assignee username="eric.sedor@mongodb.com">Eric Sedor</assignee>
                                    <reporter username="Wu">Hongkai Wu [X]</reporter>
                        <labels>
                    </labels>
                <created>Mon, 4 Feb 2019 19:37:25 +0000</created>
                <updated>Fri, 8 Feb 2019 20:52:51 +0000</updated>
                            <resolved>Fri, 8 Feb 2019 20:52:51 +0000</resolved>
                                                                    <component>Replication</component>
                                        <votes>0</votes>
                                    <watches>6</watches>
                                                                                                                <comments>
                            <comment id="2143004" author="eric.sedor" created="Fri, 8 Feb 2019 20:52:29 +0000"  >&lt;p&gt;You are most welcome &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=Wu&quot; class=&quot;user-hover&quot; rel=&quot;Wu&quot;&gt;Wu&lt;/a&gt;. If this happens again, any information about what happened with the disk will be helpful!&lt;/p&gt;</comment>
                            <comment id="2141290" author="wu" created="Thu, 7 Feb 2019 17:28:20 +0000"  >&lt;p&gt;Interesting. I cannot find any related information in syslog or dmesg. I am happy to know this is some issue related to writing to nvme0n1. If I hit the same issue again, I will give you more information. Is there anything else we can dig into? If not, I think we can close the ticket!&lt;/p&gt;

&lt;p&gt;Thanks again for your help!&lt;/p&gt;</comment>
                            <comment id="2139974" author="eric.sedor" created="Wed, 6 Feb 2019 17:56:18 +0000"  >&lt;p&gt;We are a little surprised to hear that given that write metrics out of the WiredTiger cache seem to correspond to writes to nvme1n1.&lt;/p&gt;

&lt;p&gt;However, what we are able to say is that it looks like the trouble corresponds to issues with nvme0n1. Can you look into the syslog and dmesg for the time period to see if anything stands out?&lt;/p&gt;
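
&lt;p&gt;A minimal sketch of how such a syslog check could be scripted. The function name &lt;tt&gt;lines_mentioning_device&lt;/tt&gt; is hypothetical, and the code assumes the classic syslog prefix (for example &quot;Jan 16 08:50:20&quot;) and compares only the time of day, so it suits a log that covers a single day. Lines in any other format are skipped rather than guessed at.&lt;/p&gt;

```python
from datetime import datetime, time


def lines_mentioning_device(lines, device, start, end):
    """Return syslog-style lines that mention `device` between two times of day."""
    hits = []
    for line in lines:
        if device not in line:
            continue
        try:
            # Assumed layout: "Mon DD HH:MM:SS ..." so the time sits at columns 7..14.
            stamp = datetime.strptime(line[7:15], "%H:%M:%S").time()
        except ValueError:
            # Not a classic syslog line; skip it instead of guessing a timestamp.
            continue
        if stamp >= start and end >= stamp:
            hits.append(line.rstrip("\n"))
    return hits


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python filter_syslog.py /var/log/syslog
    # The 08:45 to 09:00 window is an assumption built around the last primary log lines.
    with open(sys.argv[1], errors="replace") as handle:
        for hit in lines_mentioning_device(handle, "nvme0n1", time(8, 45), time(9, 0)):
            print(hit)
```

&lt;p&gt;The same filter can be pointed at a saved copy of the dmesg output if it was captured with human-readable timestamps in that layout; otherwise the parsing step would need to be adapted.&lt;/p&gt;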

&lt;p&gt; &lt;span class=&quot;image-wrap&quot; style=&quot;&quot;&gt;&lt;img src=&quot;https://jira.mongodb.org/secure/attachment/208291/208291_Screen+Shot+2019-02-06+at+9.59.07+AM.png&quot; width=&quot;100%&quot; style=&quot;border: 0px solid black&quot; /&gt;&lt;/span&gt; &lt;/p&gt;</comment>
                            <comment id="2139233" author="wu" created="Wed, 6 Feb 2019 06:50:08 +0000"  >&lt;p&gt;Thanks for taking a look, Eric. All the data files, index, etc are on that volume. We set the storage&apos;s dbpath to /mnt.&#160;nvme0n1 is mounted at /mnt.&lt;/p&gt;</comment>
                            <comment id="2139061" author="eric.sedor" created="Wed, 6 Feb 2019 00:01:05 +0000"  >&lt;p&gt;Thank you Kyle! Can you let us know what is on nvme0n1? We&apos;re curious because our initial look at the diagnostic.data suggests significant utilization of that volume.&lt;/p&gt;</comment>
                            <comment id="2137654" author="wu" created="Tue, 5 Feb 2019 01:41:16 +0000"  >&lt;p&gt;&lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/attachment/208064/208064_diagnostic.data.10.132.203.84.zip&quot; title=&quot;diagnostic.data.10.132.203.84.zip attached to SERVER-39371&quot;&gt;diagnostic.data.10.132.203.84.zip&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.mongodb.org/images/icons/link_attachment_7.gif&quot; height=&quot;7&quot; width=&quot;7&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;&lt;span class=&quot;nobr&quot;&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/attachment/208065/208065_mongod.10.132.203.84.log.zip&quot; title=&quot;mongod.10.132.203.84.log.zip attached to SERVER-39371&quot;&gt;mongod.10.132.203.84.log.zip&lt;sup&gt;&lt;img class=&quot;rendericon&quot; src=&quot;https://jira.mongodb.org/images/icons/link_attachment_7.gif&quot; height=&quot;7&quot; width=&quot;7&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/sup&gt;&lt;/a&gt;&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;&lt;sup&gt;Hi @Eric, thanks for your reply. Unfortunately, I lost all of that day&apos;s diagnostic.data on the secondary nodes, but I did find diagnostic.data on the crashed primary node. I attached it and the mongod log above.&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;&lt;sup&gt;Kyle&lt;/sup&gt;&lt;/p&gt;</comment>
                            <comment id="2137226" author="eric.sedor" created="Mon, 4 Feb 2019 20:13:37 +0000"  >&lt;p&gt;Hello and thanks for the detail so far.&lt;/p&gt;

&lt;p&gt;Would you please archive (tar or zip) the &lt;tt&gt;$dbpath/diagnostic.data&lt;/tt&gt; directories for all 5 nodes and attach them to this ticket?&lt;/p&gt;

&lt;p&gt;As well, can you provide the logs of the crash from the Primary?&lt;/p&gt;
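
&lt;p&gt;For anyone scripting the archive step, here is a minimal sketch using Python&apos;s standard &lt;tt&gt;tarfile&lt;/tt&gt; module. The function name &lt;tt&gt;archive_diagnostic_data&lt;/tt&gt; and the output file name are hypothetical; the only input taken from this ticket is that the directory lives at &lt;tt&gt;$dbpath/diagnostic.data&lt;/tt&gt;.&lt;/p&gt;

```python
import tarfile
from pathlib import Path


def archive_diagnostic_data(dbpath, out_file):
    """Pack dbpath/diagnostic.data into a gzip-compressed tar archive."""
    source = Path(dbpath) / "diagnostic.data"
    if not source.is_dir():
        raise FileNotFoundError(f"no diagnostic.data directory under {dbpath}")
    with tarfile.open(out_file, "w:gz") as tar:
        # arcname keeps the archive rooted at "diagnostic.data" instead of the full path.
        tar.add(str(source), arcname="diagnostic.data")
    return Path(out_file)


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python archive_diag.py /mnt/mongodb diagnostic.data.tar.gz
    print(archive_diagnostic_data(sys.argv[1], sys.argv[2]))
```

&lt;p&gt;Run it once per node, passing that node&apos;s dbPath, and attach the resulting archives.&lt;/p&gt;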

&lt;p&gt;Thank you!&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="208291" name="Screen Shot 2019-02-06 at 9.59.07 AM.png" size="133568" author="eric.sedor@mongodb.com" created="Wed, 6 Feb 2019 17:59:31 +0000"/>
                            <attachment id="208064" name="diagnostic.data.10.132.203.84.zip" size="38941072" author="Wu" created="Tue, 5 Feb 2019 01:39:00 +0000"/>
                            <attachment id="208021" name="log.pdf" size="146638" author="Wu" created="Mon, 4 Feb 2019 19:37:23 +0000"/>
                            <attachment id="208065" name="mongod.10.132.203.84.log.zip" size="8758744" author="Wu" created="Tue, 5 Feb 2019 01:39:52 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>7.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Mon, 4 Feb 2019 20:13:37 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        5 years, 5 days ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>eric.sedor@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            5 years, 5 days ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>eric.sedor@mongodb.com</customfieldvalue>
            <customfieldvalue>Wu</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hul6vj:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hub1p3:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                    <customfieldvalue><![CDATA[eric.sedor@mongodb.com]]></customfieldvalue>
    

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hukt4v:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>