<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Wed Feb 07 21:40:34 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[CSHARP-1748] Not catching certain error scenarios from replicaset members</title>
                <link>https://jira.mongodb.org/browse/CSHARP-1748</link>
                <project id="10041" key="CSHARP">C# Driver</project>
                    <description>&lt;p&gt;We had an issue today where, in a 3 member replicaset, one of the secondaries became angry and started erroring out on simple connectivity, taking our entire site down in a 503 scenario.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-25663&quot; title=&quot;Odd connection timeouts and rejections when replicaset secondary is lagged&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-25663&quot;&gt;&lt;del&gt;SERVER-25663&lt;/del&gt;&lt;/a&gt; is the related issue. We are unable to reproduce, because the scenario in which the secondary got into its mess is uncertain.&lt;/p&gt;

&lt;p&gt;We are sure, however, that there must be some unhandled exception coming back in the client connectivity, causing pure failure of the site. It may be an unexpected network error.&lt;/p&gt;

&lt;p&gt;Simply shutting down the angry secondary immediately fixed the issue. When the angry secondary came back up, it synced and all remained well.&lt;/p&gt;</description>
                <environment></environment>
        <key id="309730">CSHARP-1748</key>
            <summary>Not catching certain error scenarios from replicaset members</summary>
                <type id="3" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14718&amp;avatarType=issuetype">Task</type>
                                            <priority id="2" iconUrl="https://jira.mongodb.org/images/icons/priorities/critical.svg">Critical - P2</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="5">Cannot Reproduce</resolution>
                                        <assignee username="robert@mongodb.com">Robert Stam</assignee>
                                    <reporter username="sallgeud">Chad Kreimendahl</reporter>
                        <labels>
                            <label>question</label>
                    </labels>
                <created>Wed, 17 Aug 2016 17:33:30 +0000</created>
                <updated>Fri, 5 Apr 2019 13:59:07 +0000</updated>
                            <resolved>Thu, 11 Jan 2018 13:59:02 +0000</resolved>
                                    <version>2.2.4</version>
                                                    <component>Connectivity</component>
                                        <votes>0</votes>
                                    <watches>4</watches>
                                                                                                                <comments>
                            <comment id="1771327" author="rstam" created="Thu, 11 Jan 2018 13:59:02 +0000"  >&lt;p&gt;From the information provided it appears that this was a server issue and not a driver issue.&lt;/p&gt;</comment>
                            <comment id="1361214" author="sallgeud" created="Thu, 18 Aug 2016 18:12:43 +0000"  >&lt;p&gt;We did quite a bit more advanced research on this issue and believe it may be a case in which C# cannot effectively handle the problem.  The major issue here is obviously a problem with &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-25663&quot; title=&quot;Odd connection timeouts and rejections when replicaset secondary is lagged&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-25663&quot;&gt;&lt;del&gt;SERVER-25663&lt;/del&gt;&lt;/a&gt;.  Based on what I&apos;ll describe below, I&apos;m not currently sure it was something that could be handled.  (I&apos;ll add this to &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-25663&quot; title=&quot;Odd connection timeouts and rejections when replicaset secondary is lagged&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-25663&quot;&gt;&lt;del&gt;SERVER-25663&lt;/del&gt;&lt;/a&gt; as well)&lt;/p&gt;

&lt;p&gt;It appears that the slowness started approximately 20 minutes into a &quot;mongodump&quot; backup being performed on the secondary in question. When these mongodump processes run, they eat up every available ounce of memory, sometimes forcing mongod to use swap (vm.swappiness=1 because swap is bad but OOM is worse). Based on observations, there is either some form of memory leak in mongodump, or some highly unnecessary usage of memory. &lt;/p&gt;

&lt;p&gt;In this low memory situation, with some data in mongod swapping, we get enormously long queries. Finds that normally take 2-10ms begin to take between 5 and 100 seconds. It was in this scenario where the problem began. The SEND_ERROR we were seeing is likely the client side nuking the connection because it took too long.&lt;/p&gt;</comment>
                            <comment id="1360977" author="sallgeud" created="Thu, 18 Aug 2016 15:25:59 +0000"  >&lt;p&gt;Questions answered... follow up comment has some new findings&lt;/p&gt;

&lt;p&gt;1. Yes, reads are always happening from numerous processes, all the time. (as are writes... in this case, there was a specific batch job running against 1 collection &lt;span class=&quot;error&quot;&gt;&amp;#91;of thousands&amp;#93;&lt;/span&gt;)&lt;br/&gt;
2. 95% PrimaryPreferred, 5% SecondaryPreferred (by volume)&lt;br/&gt;
3. 1 Primary, 3 Secondaries. 1 of the secondaries is &quot;hidden&quot; and in our disaster recovery data center&lt;br/&gt;
4. Our problem was that we didn&apos;t get any stack traces, because any queries were resulting in abnormally long responses (see next comment). We use a technology that attempts to log all exceptions to Mongo, given that mongo has been our most reliable data store to this point. It also emails errors to a distro list.  All the emails are general w3wp failures.&lt;/p&gt;</comment>
                            <comment id="1360136" author="craiggwilson" created="Wed, 17 Aug 2016 20:32:32 +0000"  >&lt;p&gt;Thanks. Another couple questions:&lt;/p&gt;

&lt;p&gt;1. Was your site doing reads while the massive set of updates mentioned in &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-25663&quot; title=&quot;Odd connection timeouts and rejections when replicaset secondary is lagged&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-25663&quot;&gt;&lt;del&gt;SERVER-25663&lt;/del&gt;&lt;/a&gt; was happening from another process?&lt;br/&gt;
2. What read-preference does your site use? &lt;br/&gt;
3. What is the configuration of your replica set? i.e., Do you have 1 primary, 1 secondary, and an arbiter? Two secondaries? etc...&lt;br/&gt;
4. What makes you think the driver wasn&apos;t catching exceptions? Could you provide some of the exceptions and stack traces you caught so we can see them?&lt;/p&gt;

&lt;p&gt;Thanks.&lt;/p&gt;</comment>
                            <comment id="1360121" author="sallgeud" created="Wed, 17 Aug 2016 20:23:29 +0000"  >&lt;p&gt;Without restarting the app or doing anything other than shutting down the service on the bad secondary, it worked within 5 seconds.  We did have circumstances when things were just absurdly slow and timed out. However, we eventually began to get 503 errors, even though we&apos;re catching all exceptions with Application_Error code. &lt;/p&gt;</comment>
                            <comment id="1360071" author="craiggwilson" created="Wed, 17 Aug 2016 19:49:18 +0000"  >&lt;p&gt;That is certainly odd, and without being able to reproduce, that&apos;s going to make it that much harder to find.  In the low level connection code, we catch every exception. That being said, client code is still required to catch exceptions, which I assume you are already doing. &lt;/p&gt;

&lt;p&gt;You stated this: &quot;Simply shutting down the angry secondary immediately fixed the issue.&quot;.  Do you mean that without restarting the app, or doing anything at all, shutting down the secondary caused your app to begin working again? If so, when it wasn&apos;t working, were you seeing exceptions, or were things taking a long time? How did you know something was wrong?&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                                                <inwardlinks description="is related to">
                                        <issuelink>
            <issuekey id="309727">SERVER-25663</issuekey>
        </issuelink>
                            </inwardlinks>
                                    </issuelinktype>
                    </issuelinks>
                <attachments>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hr9ua7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            </customfields>
    </item>
</channel>
</rss>