<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 03:36:36 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-14993] Mongos does not abort non-responsive update operations on primary change</title>
                <link>https://jira.mongodb.org/browse/SERVER-14993</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;Hi &lt;br/&gt;
During some lab testing to measure failover response time and behaviour, we created different scenarios:&lt;br/&gt;
1) Graceful failover (replSetStepDown)&lt;br/&gt;
2) Process aborted (killed externally in this simulation by task manager)&lt;br/&gt;
3) System outage (removing network cable)&lt;/p&gt;

&lt;p&gt;We were using a two-node replica set with an extra arbiter and a mongos on top of them (servers were started with --shardsvr and --replSet). The mongos was running on a Linux box, while the mongod processes were running on Windows, all using 2.6.4.&lt;/p&gt;

&lt;p&gt;In scenarios 1 and 2, updates were aborted with errors, and started working once the new primary was elected.&lt;/p&gt;

&lt;p&gt;In the third scenario, the client gets stuck in the update command (using the latest stable version of the C# driver). We could work around the issue by setting the socketTimeoutMS option in the connection string, but in principle the preferred behaviour is the one shown in scenarios 1 and 2.&lt;/p&gt;

&lt;p&gt;The test has been repeated several times to avoid mistakes.&lt;/p&gt;


&lt;p&gt;As a side note, the times were 3, 16 and 23 seconds for the three scenarios, respectively.&lt;/p&gt;</description>
                <environment></environment>
        <key id="154158">SERVER-14993</key>
            <summary>Mongos does not abort non-responsive update operations on primary change</summary>
                <type id="1" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14703&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="9">Done</resolution>
                                        <assignee username="backlog-server-sharding">[DO NOT USE] Backlog - Sharding Team</assignee>
                                    <reporter username="jlpedrosa@gmail.com">Jose Luis Pedrosa</reporter>
                        <labels>
                    </labels>
                <created>Thu, 21 Aug 2014 22:15:16 +0000</created>
                <updated>Tue, 6 Dec 2022 05:02:28 +0000</updated>
                            <resolved>Thu, 13 Jun 2019 15:46:14 +0000</resolved>
                                    <version>2.6.4</version>
                                                    <component>Sharding</component>
                                        <votes>0</votes>
                                    <watches>8</watches>
                                                                                                                <comments>
                            <comment id="708292" author="jlpedrosa@gmail.com" created="Sat, 30 Aug 2014 17:09:28 +0000"  >&lt;p&gt;Hi Greg&lt;/p&gt;

&lt;p&gt;I think the case you mention is pretty extreme, as the response time for a connection (even over the internet and thousands of kilometers) should be at most a few hundred milliseconds, while the failover takes a few seconds in the best-case scenario, in the same VLAN with sub-millisecond latencies.&lt;/p&gt;

&lt;p&gt;If the operation has already completed, it means that the primary at that moment had not even detected that it can&apos;t communicate with the other members (otherwise the election process would kick in), which would happen at least heartbeatTimeoutSecs later. Also, the op could not have been replicated to the other nodes, so it would be rolled back.&lt;/p&gt;

&lt;p&gt;Also, if we think about the consequences of aborting existing CRUD operations on a new primary election, the application side would treat a successful operation as failed (an operation that, as I said, I understand will be rolled back by replication once communications are re-established). I think that possible issue is much better than leaving connections hanging indefinitely.&lt;/p&gt;

&lt;p&gt;In my opinion, it should not be &quot;abort non-responsive&quot;; it should be &quot;abort CRUD operations when a new primary is elected in the replica set&quot;.&lt;/p&gt;

&lt;p&gt;BR&lt;/p&gt;
</comment>
                            <comment id="707972" author="greg_10gen" created="Fri, 29 Aug 2014 18:53:54 +0000"  >&lt;p&gt;&amp;gt; as even if they would reach the server it would not be primary anymore and should not be processed&lt;br/&gt;
This, I think, is the core issue - the problem is that there may be write results from &lt;b&gt;already completed&lt;/b&gt; writes in-flight from the now-&quot;bad&quot; node.  If the latency from mongos to the &quot;bad&quot; primary node is high, and the latency to the new &quot;good&quot; primary is low, this could be a significant amount of data - mongos latency to replica sets is often much higher and more variable than between replica set members.&lt;/p&gt;

&lt;p&gt;I think this all really boils down to &quot;is the replica set timeout generally a good proxy for mongos to replica set timeout, or do we need additional configuration&quot;?&lt;/p&gt;</comment>
                            <comment id="707842" author="jlpedrosa@gmail.com" created="Fri, 29 Aug 2014 17:29:15 +0000"  >&lt;p&gt;@Greg, I&apos;m not 100% sure I am following you.&lt;br/&gt;
I think the DBA should find the right value for the timeout (heartbeatTimeoutSecs: &amp;lt;int&amp;gt;) for marking a given node as dead, depending on the installation. As you mention, cross-data-center traffic over a VPN is not the same as 10 Gbit Ethernet on a local LAN.&lt;/p&gt;

&lt;p&gt;I think that all queries (without any special write concern) and inserts/updates/deletes should be aborted when the new primary is elected (not when it&apos;s marked as unhealthy), as even if they would reach the server it would not be primary anymore and should not be processed. I agree that some commands may not be aborted, but I don&apos;t think that applies to CRUD ops.&lt;/p&gt;

&lt;p&gt;Do you agree?&lt;/p&gt;</comment>
                            <comment id="707796" author="greg_10gen" created="Fri, 29 Aug 2014 16:53:34 +0000"  >&lt;p&gt;Better handling of non-responsive nodes is definitely something we&apos;re looking to improve.&lt;/p&gt;

&lt;p&gt;It&apos;s not clear that aborting a write when a new primary is detected is always the correct behavior, however.  A sporadic network problem or unexpected load which may be delaying your response can often also trigger replica set failovers - especially across data centers, etc.  Without timing information there&apos;s (currently) no very general way to guess whether the operation has been lost or whether the replica set was simply faster to reconfigure and tell mongos than the write response arriving.&lt;/p&gt;

&lt;p&gt;Short-term, I think operation-level timeouts would best address this - in these tests it seems your writes are expected to complete and fail quickly with little network delay (because your network is assumed fast), and that information can be passed to mongos or a driver: &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-13622&quot; title=&quot;Time limit for bulk operations&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-13622&quot;&gt;&lt;del&gt;SERVER-13622&lt;/del&gt;&lt;/a&gt;.  Longer term we&apos;re looking at various protocol changes.&lt;/p&gt;</comment>
                            <comment id="707705" author="schwerin" created="Fri, 29 Aug 2014 15:37:37 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=jlpedrosa&quot; class=&quot;user-hover&quot; rel=&quot;jlpedrosa&quot;&gt;jlpedrosa&lt;/a&gt;, someone on the sharding team will take care of transforming this ticket into the described feature request and triaging it.  Thanks.&lt;/p&gt;</comment>
                            <comment id="707699" author="jlpedrosa@gmail.com" created="Fri, 29 Aug 2014 15:23:59 +0000"  >&lt;p&gt;Ok,&lt;/p&gt;

&lt;p&gt;So we are aligned. Do you want me to open another ticket with your suggested name, or do we leave this one open? Do you need anything from me?&lt;/p&gt;

&lt;p&gt;Thanks for the quick answer.&lt;/p&gt;</comment>
                            <comment id="707682" author="schwerin" created="Fri, 29 Aug 2014 14:59:03 +0000"  >&lt;p&gt;Yeah, that&apos;s because when a host disappears from the network, other host OSes can&apos;t distinguish between slow network traffic and failure.  As a result, the OS never informs the MongoS that anything&apos;s wrong, because it doesn&apos;t know itself.  In the other forms of failure you simulated, the MongoD (test 1) or the OS on node D (test 2) know what&apos;s going on, and are able to transmit a message.  In case 1, it&apos;s the stepDown command.  In test 2, the OS on node D sends an explicit hang-up message to the MongoS&apos;s OS to end the TCP connection for the process that it knows died.&lt;/p&gt;</comment>
                            <comment id="707681" author="jlpedrosa@gmail.com" created="Fri, 29 Aug 2014 14:53:58 +0000"  >&lt;p&gt;Hi Andy&lt;/p&gt;

&lt;p&gt;Thanks! Yes, the in-flight connections are the ones that get stuck (as I said, just tuning the timeout fixes the issue; in fact, I keep using the same db and collection objects and they keep working). In this case I don&apos;t think the driver is involved, as I&apos;m behind a mongos: the client does not have any TCP connection to the replica set.&lt;/p&gt;

&lt;p&gt;When we kill the process and the TCP socket is reset by the OS running mongod, mongos is smart enough to abort the operations in progress, but not when the failover does not involve a hard reset of the connections.&lt;/p&gt;

&lt;p&gt;Thanks again.&lt;/p&gt;

&lt;p&gt;Rgds&lt;/p&gt;

&lt;p&gt;JL&lt;/p&gt;</comment>
                            <comment id="707667" author="schwerin" created="Fri, 29 Aug 2014 14:44:01 +0000"  >&lt;p&gt;So the problem is that in Scenario 3, MongoS cannot distinguish between Box D being slow and it being down, until the connection timeout expires.  This is in some ways a TCP issue, but perhaps MongoS could more proactively monitor the replica set to detect the selection of the new primary.  The write(s) in progress when the network cable is removed would be abandoned by the MongoS when it realizes that node D is no longer primary (keep in mind that MongoS doesn&apos;t yet know that node D is down, because the TCP timeout hasn&apos;t expired).&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=greg_10gen&quot; class=&quot;user-hover&quot; rel=&quot;greg_10gen&quot;&gt;greg_10gen&lt;/a&gt;, I think this boils down to a feature request to have MongoS (and perhaps also drivers that talk to replica sets) be more proactive in monitoring for primary changes and perhaps to quickly report as failures writes that are blocked on a replica set node that is now believed not to be primary.  Something like the following: &quot;When MongoS detects that a shard has elected a new primary, it should abort write operations blocked on the old primary.&quot;&lt;/p&gt;

&lt;p&gt;Separately, &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=jlpedrosa&quot; class=&quot;user-hover&quot; rel=&quot;jlpedrosa&quot;&gt;jlpedrosa&lt;/a&gt;, if when the client is stuck in its update you fire up a &lt;em&gt;new&lt;/em&gt; client and connect to the same mongos to perform an update, does it also get stuck, or does it successfully contact the new primary?  I think the problem you&apos;re experiencing may only affect operations that are in flight when the cable to node D is unplugged.&lt;/p&gt;</comment>
                            <comment id="707551" author="jlpedrosa@gmail.com" created="Fri, 29 Aug 2014 11:48:16 +0000"  >&lt;p&gt;Hi Andy&lt;br/&gt;
Please find attached the schema with the machines and processes running on each box. In scenario 3, the unplugged cable (physically, not logically, unplugged) is the one labeled BOX D eth0.&lt;/p&gt;

&lt;p&gt;In the failovers we ensured that the mongod in BOX D was the primary.&lt;/p&gt;

&lt;p&gt;Best regards&lt;/p&gt;</comment>
                            <comment id="707408" author="schwerin" created="Fri, 29 Aug 2014 02:23:43 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=jlpedrosa&quot; class=&quot;user-hover&quot; rel=&quot;jlpedrosa&quot;&gt;jlpedrosa&lt;/a&gt;, could you clarify which network link or node is disabled in scenario 3?&lt;/p&gt;</comment>
                    </comments>
                    <attachments>
                            <attachment id="51033" name="failoverTestHostSchema.png" size="7836" author="jlpedrosa@gmail.com" created="Fri, 29 Aug 2014 11:48:16 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>11.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                <customfield id="customfield_12751" key="com.atlassian.jira.plugin.system.customfieldtypes:multiselect">
                        <customfieldname>Assigned Teams</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="25141"><![CDATA[Sharding]]></customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Fri, 29 Aug 2014 02:23:43 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        9 years, 24 weeks, 4 days ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>alexander.golin@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            9 years, 24 weeks, 4 days ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                    <customfield id="customfield_10032" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Operating System</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10026"><![CDATA[ALL]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>backlog-server-sharding</customfieldvalue>
            <customfieldvalue>schwerin@mongodb.com</customfieldvalue>
            <customfieldvalue>greg_10gen</customfieldvalue>
            <customfieldvalue>jlpedrosa@gmail.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hrlpe7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hrfzyf:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>133855</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hsgphz:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>