<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 03:21:52 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-9946] Rollback may not happen to primary when resumed from sleep state</title>
                <link>https://jira.mongodb.org/browse/SERVER-9946</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;A replica set primary is stopped with Ctrl-Z. After it is resumed, it completes one final write operation, returns to the replica set claiming to be primary and claiming to have a more recent state of the oplog. This causes the current (real) primary to yield and to roll back everything it has written in the meantime.&lt;/p&gt;

&lt;p&gt;Proposed fix: The real primary node should be able to tell that it was elected with majority support and that writes accepted after that election are valid, so the returning primary should be rejected.&lt;/p&gt;</description>
                <environment>Ubuntu 12.04 on a Dell XPS13 laptop</environment>
        <key id="79292">SERVER-9946</key>
            <summary>Rollback may not happen to primary when resumed from sleep state</summary>
                <type id="1" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14703&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="3">Duplicate</resolution>
                                        <assignee username="-1">Unassigned</assignee>
                                    <reporter username="henrik.ingo@mongodb.com">Henrik Ingo</reporter>
                        <labels>
                    </labels>
                <created>Mon, 17 Jun 2013 12:55:37 +0000</created>
                <updated>Wed, 10 Dec 2014 23:05:27 +0000</updated>
                            <resolved>Fri, 6 Dec 2013 23:50:35 +0000</resolved>
                                    <version>2.4.4</version>
                                                    <component>Replication</component>
                                        <votes>0</votes>
                                    <watches>7</watches>
                <comments>
                            <comment id="467375" author="eliot" created="Fri, 6 Dec 2013 23:50:35 +0000"  >&lt;p&gt;Same issue as &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-10768&quot; title=&quot;add proper support for SIGSTOP and SIGCONT (currently, on replica set primary can cause data loss)&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-10768&quot;&gt;&lt;del&gt;SERVER-10768&lt;/del&gt;&lt;/a&gt;&lt;/p&gt;</comment>
                            <comment id="361959" author="henrik.ingo@10gen.com" created="Mon, 17 Jun 2013 18:01:56 +0000"  >&lt;p&gt;Here&apos;s another log file with a slightly different sequence of events that still results in significant rollbacks.&lt;/p&gt;

&lt;p&gt;These are the same steps as above, except that node 1 (27001) was given priority 2.&lt;/p&gt;

&lt;p&gt;When it wakes up, it actually learns that node 2 is the new primary before sending any heartbeats itself. Still, a stray insert has found its way into its oplog and, perhaps because of its higher priority, it now claims the primary role, again forcing node 2 (27002) to roll back everything it committed while primary. (I can see 2 minutes of data missing with db.collectionname.find().)&lt;/p&gt;</comment>
                            <comment id="361915" author="henrik.ingo@10gen.com" created="Mon, 17 Jun 2013 16:52:11 +0000"  >&lt;p&gt;I wanted to simulate a case where the primary stops responding without triggering a close of the TCP/IP socket. Suspending the process was one way to do that; alternatively I could put it behind a firewall that silently drops packets, and in real life this could of course happen with any kind of network interruption.&lt;/p&gt;

&lt;p&gt;@Scott: To reproduce this, the primary must of course be stopped at exactly the moment when it has received an insert but not yet written it to the oplog. After resuming, it must then write to the oplog before it sends out its first post-resume heartbeat, so that this first heartbeat includes the new, advanced oplog state / optime. So yes, it&apos;s very much a race, but I hope the attached log makes clear what happened (maybe I was lucky).&lt;/p&gt;


&lt;p&gt;@Eric: I think the real bug is not about whether the primary was stopped with Ctrl-Z or by something else. The problem is that the new primary (27002) is willing to roll back an arbitrarily long history of operations simply because a single node enters the replica set and erroneously claims that it should do so. At least from what I see here, there&apos;s nothing that would prevent losing hours of data due to a glitch like this:&lt;/p&gt;
&lt;ul class=&quot;alternate&quot; type=&quot;square&quot;&gt;
	&lt;li&gt;node 1 is primary at time T0&lt;/li&gt;
	&lt;li&gt;node 1 is lost&lt;/li&gt;
	&lt;li&gt;node 2 assumes primary at time T1&lt;/li&gt;
	&lt;li&gt;node 2 runs as primary for hours, committing millions of operations to oplog&lt;/li&gt;
	&lt;li&gt;node 1 wakes up, commits a single operation with an optime higher than anything on node 2&lt;/li&gt;
	&lt;li&gt;node 1 re-joins the replica set and sends heartbeats to everyone else, claiming to be primary and to have the most up-to-date oplog&lt;/li&gt;
	&lt;li&gt;(note that the above 2 points could be due to any bug or malfunction; the point is that node 2 needs to be empowered to protect against these invalid claims)&lt;/li&gt;
	&lt;li&gt;the election protocol favors the most recent oplog, so node 1 wins and node 2 has to roll back everything that happened for as long as it was primary&lt;/li&gt;
&lt;/ul&gt;
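
&lt;p&gt;The sequence above can be sketched as a toy model. This is only an illustration under simplifying assumptions: optimes are plain integers, only two nodes are modelled, and the names Node, pick_primary_by_optime and lost_on_rollback are hypothetical, not part of the server code. It shows how a rule that simply favors the most recent optime lets a single stray write on node 1 discard everything node 2 committed while primary.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    oplog: list = field(default_factory=list)  # optimes of committed operations

    def latest_optime(self):
        return max(self.oplog, default=0)


def pick_primary_by_optime(nodes):
    """Toy rule: the node reporting the most recent optime wins."""
    return max(nodes, key=lambda n: n.latest_optime())


def lost_on_rollback(loser, winner):
    """Operations present on the loser but absent from the winner's oplog."""
    return sorted(set(loser.oplog) - set(winner.oplog))


# Both nodes share operations 1..10, written while node 1 was primary.
shared = list(range(1, 11))
# Node 2 is elected at T1 = 11 and commits operations 12..1000.
node2 = Node("node2", shared + list(range(12, 1001)))
# Node 1 wakes up and commits one stray operation with optime 1001.
node1 = Node("node1", shared + [1001])

winner = pick_primary_by_optime([node1, node2])
print(winner.name, len(lost_on_rollback(node2, winner)))  # prints "node1 989"
```

&lt;p&gt;In this model one late write on node 1 is enough to invalidate all 989 operations that node 2 accepted after its election.&lt;/p&gt;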


&lt;p&gt;What should happen is that node 2 records the fact that it won an election at time T1. Since node 1 claims to have committed operations with optime &amp;gt; T1, node 2 should reject node 1&apos;s claim to be primary. Node 2 and node 1 then need to compare the times at which they were elected primary: node 1 will submit T0 and node 2 will submit T1. Since T1 &amp;gt; T0, it follows that anything node 2 has committed after T1 is valid and anything node 1 has committed after T1 is invalid. Hence node 1 needs to step down and roll back, and node 2 may continue as primary.&lt;/p&gt;
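
&lt;p&gt;A minimal sketch of this comparison, assuming each node that claims to be primary reports the time of the election it won alongside its latest optime. The PrimaryClaim type and both function names are hypothetical and times are plain integers; this illustrates the proposal and is not how the server implements elections.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PrimaryClaim:
    node: str
    elected_at: int     # time at which the node won its election (T0, T1, ...)
    latest_optime: int  # most recent oplog entry the node reports


def resolve_primary(claims):
    """Keep the claim backed by the most recent election, ignoring optime."""
    return max(claims, key=lambda c: c.elected_at)


def must_step_down(claims):
    """Every claimant other than the winner steps down and rolls back."""
    winner = resolve_primary(claims)
    return [c.node for c in claims if c.node != winner.node]


# Node 1 was elected at T0 = 0 and reports a stray write at optime 1001.
# Node 2 was elected at T1 = 11 and has committed up to optime 1000.
claims = [PrimaryClaim("node1", 0, 1001), PrimaryClaim("node2", 11, 1000)]
print(resolve_primary(claims).node)  # prints "node2"
print(must_step_down(claims))        # prints "['node1']"
```

&lt;p&gt;Because the decision is keyed on the election time rather than on the optime, node 1&apos;s higher optime no longer matters and only its single stray write is rolled back.&lt;/p&gt;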

&lt;p&gt;Note that instead of comparing T0 and T1 as a separate step, it would be more natural to add that information to the current heartbeat ping sent by the primary, so that the comparison can be done immediately as part of the existing algorithm.&lt;/p&gt;

&lt;p&gt;Note that with our election protocol it is never possible (except for short overlapping times of less than 2 seconds, but even this seems to be guarded against) for more than 1 node to actually win majority support in an election, so referring back to election results is a good way to decide between competing claims for being in primary state at a given point in time.&lt;/p&gt;</comment>
                            <comment id="361716" author="milkie" created="Mon, 17 Jun 2013 13:10:11 +0000"  >&lt;p&gt;To help me prioritize this, what&apos;s the use case here?  I believe suspending the process is simulating something that might happen in real life, but I would like to know specifics.&lt;/p&gt;</comment>
                            <comment id="361712" author="scotthernandez" created="Mon, 17 Jun 2013 13:03:03 +0000"  >&lt;p&gt;In a reproduction I found it went the other way every time.&lt;/p&gt;

&lt;p&gt;I&apos;m not sure that what you are suggesting is good behavior, or even possible to determine conclusively.&lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                            <outwardlinks description="related to">
                                                        </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="27984" name="rollbacks-when-old-primary-comes-back-take2.log" size="67897" author="henrik.ingo@mongodb.com" created="Mon, 17 Jun 2013 18:01:56 +0000"/>
                            <attachment id="27973" name="rollbacks-when-old-primary-comes-back.log" size="68623" author="henrik.ingo@mongodb.com" created="Mon, 17 Jun 2013 12:55:37 +0000"/>
                            <attachment id="27974" name="rstest.sh" size="230" author="henrik.ingo@mongodb.com" created="Mon, 17 Jun 2013 12:55:37 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>5.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Mon, 17 Jun 2013 13:03:03 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        10 years, 10 weeks, 5 days ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>true</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10056" key="com.atlassian.jira.toolkit:lastupdaterorcommenter">
                        <customfieldname>Last commenter</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>ramon.fernandez@mongodb.com</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            10 years, 10 weeks, 5 days ago
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10000" key="com.atlassian.jira.plugin.system.customfieldtypes:radiobuttons">
                        <customfieldname>Old_Backport</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10000"><![CDATA[No]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10032" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Operating System</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10026"><![CDATA[ALL]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>eliot</customfieldvalue>
                            <customfieldvalue>milkie@mongodb.com</customfieldvalue>
                            <customfieldvalue>henrik.ingo@mongodb.com</customfieldvalue>
                            <customfieldvalue>scotthernandez</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hrmq1b:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hrr6sn:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>72462</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10750" key="com.atlassian.jira.plugin.system.customfieldtypes:textarea">
                        <customfieldname>Steps To Reproduce</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>&lt;p&gt;1) A script is inserting one document per second. (The documents happen to contain output of unix &lt;b&gt;date&lt;/b&gt; command.) See rstest.sh.&lt;/p&gt;

&lt;p&gt;2) The primary node (27001 in the log) is running in foreground in the shell and is stopped with Ctrl-Z.&lt;/p&gt;

&lt;p&gt;3) Another node (27002) becomes primary as designed.&lt;/p&gt;

&lt;p&gt;4) More inserts are directed to the new primary (via the mongos node).&lt;/p&gt;

&lt;p&gt;5) Inserts are stopped. No writes are happening anywhere.&lt;/p&gt;

&lt;p&gt;6) A few minutes later the original primary continues (type &lt;b&gt;fg&lt;/b&gt; in the shell). It now:&lt;/p&gt;

&lt;p&gt;7) proceeds to write to the oplog one pending write that it was handling when stopped (this is probably not reproducible 100% of the time, only with some probability. You can remove the &lt;b&gt;sleep 1&lt;/b&gt; from &lt;b&gt;rstest.sh&lt;/b&gt; to increase the odds.)&lt;/p&gt;

&lt;p&gt;8) the oplog write gets the current time as its timestamp, not the time when the insert was sent from the client and this node was still primary.&lt;/p&gt;

&lt;p&gt;9) the old primary sends heartbeats to other members claiming to be primary and also claiming to have the most recent OpTime in its oplog.&lt;/p&gt;

&lt;p&gt;10) the new/current primary freaks out, yields, and rolls back all the inserts it had written in (4) above. It then joins the old primary as a secondary.&lt;/p&gt;

&lt;p&gt;Note: in the syslog there are also repeated calls to FindCommonPoint for the next 2 minutes or so, complaining about a 10-second gap in the oplogs. I didn&apos;t troubleshoot what that is about, but there may be another bug lurking somewhere.&lt;/p&gt;</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hs3iq7:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>