<!-- 
RSS generated by JIRA (9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66) at Thu Feb 08 03:46:09 UTC 2024

It is possible to restrict the fields that are returned in this document by specifying the 'field' parameter in your request.
For example, to request only the issue key and summary append 'field=key&field=summary' to the URL of your request.
-->
<rss version="0.92" >
<channel>
    <title>MongoDB Jira</title>
    <link>https://jira.mongodb.org</link>
    <description>This file is an XML representation of an issue</description>
    <language>en-us</language>
    <build-info>
        <version>9.7.1</version>
        <build-number>970001</build-number>
        <build-date>13-04-2023</build-date>
    </build-info>


<item>
            <title>[SERVER-17975] Stale reads with WriteConcern Majority and ReadPreference Primary</title>
                <link>https://jira.mongodb.org/browse/SERVER-17975</link>
                <project id="10000" key="SERVER">Core Server</project>
                    <description>&lt;p&gt;Hello, everyone! Hope you&apos;re having a terrific week. &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.mongodb.org/images/icons/emoticons/smile.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/p&gt;

&lt;p&gt;I think I may have found a thing!&lt;/p&gt;

&lt;p&gt;In Jepsen tests involving a mix of reads, writes, and compare-and-set against a single document, MongoDB appears to allow stale reads, even when writes use WriteConcern.MAJORITY, when network partitions cause a leader election. This holds for both plain find-by-id lookups and for queries explicitly passing ReadPreference.primary().&lt;/p&gt;

&lt;p&gt;Here&apos;s how we execute read, write, and compare-and-set operations against a register:&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/aphyr/jepsen/blob/72697c09eff26fdb1afb7491256c873f03404307/mongodb/src/mongodb/document_cas.clj#L55-L81&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/aphyr/jepsen/blob/72697c09eff26fdb1afb7491256c873f03404307/mongodb/src/mongodb/document_cas.clj#L55-L81&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And this is the schedule for failures: a 60-second on, 60-second off pattern of network partitions cutting the network cleanly into a randomly selected 3-node majority component and a 2-node minority component.&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;https://github.com/aphyr/jepsen/blob/72697c09eff26fdb1afb7491256c873f03404307/mongodb/src/mongodb/core.clj#L377-L391&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/aphyr/jepsen/blob/72697c09eff26fdb1afb7491256c873f03404307/mongodb/src/mongodb/core.clj#L377-L391&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This particular test is a bit finicky--it&apos;s easy to get knossos locked into a really slow verification cycle, or to have trouble triggering the bug. Wish I had a more reliable test for you! &lt;/p&gt;

&lt;p&gt;Attached, linearizability.txt shows the linearizability analysis from Knossos for a test run with a relatively simple failure mode. In this test, MongoDB returns the value &quot;0&quot; for the document, even though the only possible values for the document at that time were 1, 2, 3, or 4. The value 0 was the proper state at some time close to the partition&apos;s beginning, but successful reads just after the partition was fully established indicated that at least one of the indeterminate (:info) CaS operations changing the value away from 0 had to have executed.&lt;/p&gt;

&lt;p&gt;You can see this visually in the attached image, where I&apos;ve drawn the acknowledged (:ok) operations as green and indeterminate (:info) operations as yellow bars; omitting :fail ops which are known to have not taken place. Time moves from left to right; each process is a numbered horizontal track. The value &lt;b&gt;must&lt;/b&gt; be zero just prior to the partition, but in order to read 4 and 3 we &lt;b&gt;must&lt;/b&gt; execute process 1&apos;s CAS from 0-&amp;gt;4; all possible paths from that point on cannot result in a value of 0 in time for process 5&apos;s final read.&lt;/p&gt;

&lt;p&gt;Since the MongoDB docs for Read Preferences (&lt;a href=&quot;http://docs.mongodb.org/manual/core/read-preference/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://docs.mongodb.org/manual/core/read-preference/&lt;/a&gt;) say &quot;reading from the primary guarantees that read operations reflect the latest version of a document&quot;, I suspect this behavior conflicts with Mongo&apos;s intended behavior.&lt;/p&gt;

&lt;p&gt;There is good news! If you &lt;b&gt;remove&lt;/b&gt; all read operations from the mix, performing only CaS and writes, single-register ops with WriteConcern MAJORITY &lt;b&gt;do&lt;/b&gt; appear to be linearizable! Or, at least, I haven&apos;t devised an aggressive enough test to expose any faults yet. &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.mongodb.org/images/icons/emoticons/smile.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/p&gt;

&lt;p&gt;This suggests to me that MongoDB might make the same mistake that Etcd and Consul did with respect to consistent reads: assuming that a node which believes it is currently a primary can safely service a read request without confirming with a quorum of secondaries that it is &lt;b&gt;still&lt;/b&gt; the primary. If this is so, you might refer to &lt;a href=&quot;https://github.com/coreos/etcd/issues/741&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://github.com/coreos/etcd/issues/741&lt;/a&gt; and &lt;a href=&quot;https://gist.github.com/armon/11059431&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://gist.github.com/armon/11059431&lt;/a&gt; for more context on why this behavior is not consistent.&lt;/p&gt;

&lt;p&gt;If this is the case, I think you can recover linearizable reads by computing the return value for the query, then verifying with a majority of nodes that no leadership transitions have happened since the start of the query, and &lt;b&gt;then&lt;/b&gt; sending the result back to the client--preventing a logically &quot;old&quot; primary from servicing reads.&lt;/p&gt;

&lt;p&gt;Let me know if there&apos;s anything else I can help with! &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.mongodb.org/images/icons/emoticons/smile.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/p&gt;</description>
                <environment></environment>
        <key id="195242">SERVER-17975</key>
            <summary>Stale reads with WriteConcern Majority and ReadPreference Primary</summary>
                <type id="1" iconUrl="https://jira.mongodb.org/secure/viewavatar?size=xsmall&amp;avatarId=14703&amp;avatarType=issuetype">Bug</type>
                                            <priority id="3" iconUrl="https://jira.mongodb.org/images/icons/priorities/major.svg">Major - P3</priority>
                        <status id="6" iconUrl="https://jira.mongodb.org/images/icons/statuses/closed.png" description="The issue is considered finished, the resolution is correct. Issues which are closed can be reopened.">Closed</status>
                    <statusCategory id="3" key="done" colorName="success"/>
                                    <resolution id="9">Done</resolution>
                                        <assignee username="schwerin@mongodb.com">Andy Schwerin</assignee>
                                    <reporter username="aphyr">Kyle Kingsbury</reporter>
                        <labels>
                    </labels>
                <created>Fri, 10 Apr 2015 06:43:32 +0000</created>
                <updated>Fri, 2 Apr 2021 20:24:30 +0000</updated>
                            <resolved>Mon, 7 Nov 2016 15:16:59 +0000</resolved>
                                    <version>2.6.7</version>
                                    <fixVersion>3.4.0-rc3</fixVersion>
                                    <component>Replication</component>
                                        <votes>15</votes>
                                    <watches>116</watches>
                                                                                                                <comments>
                            <comment id="1427911" author="schwerin" created="Mon, 7 Nov 2016 15:16:59 +0000"  >&lt;p&gt;We have completed implementation of a new &quot;linearizable&quot; read concern under &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-18285&quot; title=&quot;Support linearizable reads on replica set primaries&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-18285&quot;&gt;&lt;del&gt;SERVER-18285&lt;/del&gt;&lt;/a&gt;, and have undertaken some documentation updates under &lt;a href=&quot;https://jira.mongodb.org/browse/DOCS-8298&quot; title=&quot;Linearizable Read&quot; class=&quot;issue-link&quot; data-issue-key=&quot;DOCS-8298&quot;&gt;&lt;del&gt;DOCS-8298&lt;/del&gt;&lt;/a&gt;. As such, I&apos;m resolving this ticket as &quot;fixed&quot; for MongoDB 3.4.0-rc3. The code is actually present and enabled in 3.4.0-rc2, for those interested in further testing. Our own testing included, among other things, integrating &lt;a href=&quot;http://jepsen.io/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;jepsen&lt;/a&gt; tests into our &lt;a href=&quot;https://evergreen.mongodb.com&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;continuous integration system&lt;/a&gt;. That work was done in &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-24509&quot; title=&quot;Jepsen tests for Linearizable reads&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-24509&quot;&gt;&lt;del&gt;SERVER-24509&lt;/del&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Thanks for your report and follow-up assistance, &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=aphyr&quot; class=&quot;user-hover&quot; rel=&quot;aphyr&quot;&gt;aphyr&lt;/a&gt;.&lt;/p&gt;</comment>
                            <comment id="1282869" author="schwerin" created="Thu, 2 Jun 2016 20:23:29 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=Marqin&quot; class=&quot;user-hover&quot; rel=&quot;Marqin&quot;&gt;Marqin&lt;/a&gt;, in the meantime, for single-document reads, if you have write privileges on the collection containing the document, you can use a findAndModify that performs a no-op update to avoid stale reads in cases where that is an operational requirement.&lt;/p&gt;

&lt;p&gt;This documentation &lt;a href=&quot;https://docs.mongodb.com/v3.2/tutorial/perform-findAndModify-quorum-reads/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;suggests one approach&lt;/a&gt;, though it&apos;s not necessary to do a write that actually changes the document.&lt;/p&gt;</comment>
                            <comment id="1282252" author="ramon.fernandez" created="Thu, 2 Jun 2016 13:13:53 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=Marqin&quot; class=&quot;user-hover&quot; rel=&quot;Marqin&quot;&gt;Marqin&lt;/a&gt;, the &quot;3.3 Desired&quot; fixVersion indicates that we&apos;re aiming to address this ticket in the current development cycle. Feel free to watch the ticket for updates.&lt;/p&gt;

&lt;p&gt;Regards,&lt;br/&gt;
Ram&#243;n.&lt;/p&gt;</comment>
                            <comment id="1282162" author="marqin" created="Thu, 2 Jun 2016 10:23:47 +0000"  >&lt;p&gt;What&apos;s the current state of this bug?&lt;/p&gt;</comment>
                            <comment id="947359" author="carstenklein@yahoo.de" created="Mon, 22 Jun 2015 20:07:18 +0000"  >&lt;p&gt;Andy Schwerin, here, you definitely lost me &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.mongodb.org/images/icons/emoticons/biggrin.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/p&gt;

&lt;p&gt;What I meant was that, prior to reading or writing from the primary, there should be a third instance that would validate that primary before it is used, even if it needed to do multiple RPCs to the list of provable primaries and also wait for a specific amount of time before the data got replicated across all machines or at least to the one the client is connected to.&lt;/p&gt;

&lt;p&gt;Ultimately causing the reading or writing client to fail if the primary could not be validated in a timely fashion.&lt;/p&gt;

&lt;p&gt;Which, I guess, is basically what the option is all about... except for the failing part, of course.&lt;/p&gt;</comment>
                            <comment id="934074" author="schwerin" created="Mon, 8 Jun 2015 16:17:52 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=carstenklein%40yahoo.de&quot; class=&quot;user-hover&quot; rel=&quot;carstenklein@yahoo.de&quot;&gt;carstenklein@yahoo.de&lt;/a&gt;, if I understand Galera&apos;s model correctly, &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-18285&quot; title=&quot;Support linearizable reads on replica set primaries&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-18285&quot;&gt;&lt;del&gt;SERVER-18285&lt;/del&gt;&lt;/a&gt; should provide the equivalent behavior to setting &lt;a href=&quot;http://galeracluster.com/documentation-webpages/mysqlwsrepoptions.html#wsrep-sync-wait&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;wsrep_sync_wait&lt;/a&gt; = 1, while using it in conjunction with w:majority writes starting in MongoDB 3.2 ought to provide wsrep_sync_wait=3 or possibly 7.&lt;/p&gt;

&lt;p&gt;There are some differences, as individual replica sets in MongoDB only elect a single primary (write master) at a time, but I believe the effect is similar.&lt;/p&gt;</comment>
                            <comment id="932826" author="carstenklein@yahoo.de" created="Fri, 5 Jun 2015 17:23:07 +0000"  >&lt;p&gt;Hm, looking at MariaDB Galera, it uses both a proxy and an additional arbitrator for handling both fail over and for making sure that updates and presumably also reads are valid.&lt;br/&gt;
Would it not be possible to implement a similar scheme, minus the proxy of course, in MongoDB to get rid of this once and for all?&lt;/p&gt;

&lt;p&gt;As I see it, each MongoDB replica acts as an arbitrator. The same goes for MariaDB Galera; however, there they also integrated an additional independent arbitrator that does not hold a replication set, just the transaction log.&lt;/p&gt;</comment>
                            <comment id="895515" author="schwerin" created="Wed, 22 Apr 2015 20:34:52 +0000"  >&lt;p&gt;You cannot implement this feature with timing tricks.  Even if everything else is going great, the OS scheduler can screw you pretty easily on a heavily loaded system, and just fail to schedule the step-down work on the old primary.  We see this in our test harnesses sometimes, in tests that wait for failover to complete.&lt;/p&gt;</comment>
                            <comment id="895475" author="aphyr" created="Wed, 22 Apr 2015 20:10:49 +0000"  >&lt;p&gt;&amp;gt; I&apos;m not sure that can ever happen. Even if it could happen, it would be easy to tweak the step-down and election sequences so that step-down is guaranteed to happen faster than election of a new primary.&lt;/p&gt;

&lt;p&gt;The network is not synchronous, clocks drift, nodes pause, etc. Fixing a race condition via a timeout is an easy workaround, but I think you&apos;ll find (like Consul) that it&apos;s a probabilistic hack at best.&lt;/p&gt;</comment>
                            <comment id="895469" author="henrik.ingo@10gen.com" created="Wed, 22 Apr 2015 20:08:01 +0000"  >&lt;p&gt;I was thinking about this today, and I&apos;m still wondering whether stale reads are at all possible in MongoDB? Even today with 2.6/3.0?&lt;/p&gt;

&lt;p&gt;The kind of stale read that Kyle describes can happen if there are 2 primaries existing at the same time: the old primary about to step down, and the newly elected primary. Even if it&apos;s unlikely, in theory a client process could flip flop between the primaries so that it reads: new primary, old primary, new primary. However, this can only happen if the old primary steps down later than the new primary is elected. (Using ReadPreference = PRIMARY is of course assumed here.) &lt;/p&gt;

&lt;p&gt;I&apos;m not sure that can ever happen. Even if it could happen, it would be easy to tweak the step-down and election sequences so that step-down is guaranteed to happen faster than election of a new primary.&lt;/p&gt;

&lt;p&gt;This would be a more performant and easier solution than using findAndModify+getLastError or any other solution depending on doing roundtrips via the oplog.&lt;/p&gt;

&lt;p&gt;(Note that there have of course been a couple bugs reported where a replica set had 2 primaries even for long times, but those were bugs, not part of the intended failover protocol.)&lt;/p&gt;</comment>
                            <comment id="894586" author="mdcallag" created="Wed, 22 Apr 2015 03:18:24 +0000"  >&lt;p&gt;&lt;a href=&quot;https://jira.mongodb.org/browse/DOCS-5185&quot; class=&quot;external-link&quot; rel=&quot;nofollow&quot;&gt;https://jira.mongodb.org/browse/DOCS-5185&lt;/a&gt; is related to this. Commits can be visible on the master before a slave receives the oplog entry. Therefore visible commits can be rolled back regardless of the write concern. I think the manual and online training should be updated to explain that.&lt;/p&gt;

&lt;p&gt;I have also been hoping that eventually we will get something like lossless semisync replication in MongoDB as the Majority write concern is similar to semisync. Maybe this will serve as motivation.&lt;/p&gt;</comment>
                            <comment id="894574" author="aphyr" created="Wed, 22 Apr 2015 02:39:26 +0000"  >&lt;p&gt;&amp;gt; in short I propose to break off the documentation request into a separate ticket and to use this ticket as the handle for scheduling the feature request.&lt;/p&gt;

&lt;p&gt;Sounds good to me! Aligning the docs to the current behavior can be done right away. I tried to make a reasonable survey of the Mongo consistency docs in the Jepsen post here: &lt;a href=&quot;https://aphyr.com/posts/322-call-me-maybe-mongodb-stale-reads&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;https://aphyr.com/posts/322-call-me-maybe-mongodb-stale-reads&lt;/a&gt;. The post also suggests some example anomalies that might be helpful for users trying to reason about whether they can tolerate dirty/stale reads.&lt;/p&gt;

&lt;p&gt;&amp;gt; &quot;coupling reads to oplog acknowledgement&quot; pretty much degrades to converting reads to read-modify-writes in periods of low write volume.&lt;/p&gt;

&lt;p&gt;The workaround I describe in the post is to just do a findAndModify from the current state to itself. Experiments suggest this will do the trick, but if Mongo&apos;s smart enough to optimize that CaS away this won&apos;t help, haha. I say &quot;couple&quot;, though, because you don&apos;t actually need to write anything to the oplog. You can actually piggyback the read state onto the existing oplog without inserting any new ops by simply blocking long enough for some &lt;b&gt;other&lt;/b&gt; operation to be replicated--thereby verifying the primary is still current. Or you can inject a heartbeat event every so often. Oh, and you can also batch read ops, which should improve performance as well. The Consul and Raft discussions I linked to talk about both tactics. &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.mongodb.org/images/icons/emoticons/smile.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/p&gt;

&lt;p&gt;&amp;gt; If &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-18022&quot; title=&quot;Support &amp;quot;read committed&amp;quot; isolation level where &amp;quot;committed&amp;quot; means confirmed by the voting majority of a replica set&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-18022&quot;&gt;&lt;del&gt;SERVER-18022&lt;/del&gt;&lt;/a&gt; were resolved all the reads in the diagram would have returned 0, because none of the writes had committed.&lt;/p&gt;

&lt;p&gt;Ah, yeah, you&apos;re assuming all of these operations take place against the minority primary. That may be the case for this &lt;b&gt;particular&lt;/b&gt; history, but in general, writes can occur on &lt;b&gt;either&lt;/b&gt; side of the partition, leading to stale reads--the reads could see 0, then 1, then 0, then 1, or any other pattern, depending on which primary clients are talking to and when they make their request.&lt;/p&gt;</comment>
                            <comment id="894554" author="mdcallag" created="Wed, 22 Apr 2015 01:34:06 +0000"  >&lt;p&gt;Still catching up on things so perhaps my comments are not relevant but...&lt;/p&gt;

&lt;p&gt;In 2.6 (mmapv1 only) changes on the master are visible:&lt;br/&gt;
1) before journal sync is done&lt;br/&gt;
2) before replicas might have received or ack&apos;d the change&lt;/p&gt;

&lt;p&gt;In 3.0 with WiredTiger and RocksDB changes on the master are not visible until after their redo log sync has been done. I assume that #1 continues to be true for mmapv1.&lt;/p&gt;

&lt;p&gt;In 3.0 I assume that #2 is still a problem.&lt;/p&gt;

&lt;p&gt;I wrote about this in:&lt;br/&gt;
&lt;a href=&quot;https://jira.mongodb.org/browse/DOCS-2908&quot; class=&quot;external-link&quot; rel=&quot;nofollow&quot;&gt;https://jira.mongodb.org/browse/DOCS-2908&lt;/a&gt;&lt;br/&gt;
&lt;a href=&quot;http://smalldatum.blogspot.com/2014/03/when-does-mongodb-make-transaction.html&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://smalldatum.blogspot.com/2014/03/when-does-mongodb-make-transaction.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We also experienced this in MySQL land with semi-sync replication and solved the problem with lossless semisync replication. See the post by Yoshi for more details but the property we provide is that commits are not visible on the master until the commit log has been archived on at least one other replica or log-only replica.&lt;br/&gt;
&lt;a href=&quot;http://yoshinorimatsunobu.blogspot.com/2014/04/semi-synchronous-replication-at-facebook.html&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://yoshinorimatsunobu.blogspot.com/2014/04/semi-synchronous-replication-at-facebook.html&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It will take me a while to get through all of the details in this case but in the end I hope we can describe the MongoDB behavior in a few sentences.&lt;/p&gt;</comment>
                            <comment id="894547" author="schwerin" created="Wed, 22 Apr 2015 01:20:37 +0000"  >&lt;p&gt;Assigning to me for scheduling.&lt;/p&gt;</comment>
                            <comment id="894250" author="schwerin" created="Tue, 21 Apr 2015 19:43:33 +0000"  >&lt;p&gt;In my reading of this ticket, there are two actions requested.  One is to schedule the ticket&apos;s suggestion for a feature to support treating a single document as a linearizable concurrent object &lt;em&gt;without&lt;/em&gt; forcing the client to convert reads to read-modify-write operations.  The other is to correct the MongoDB documentation about read consistency, emphasizing the conditions in which stale reads may occur with read preference &quot;primary&quot; in current and prior versions of MongoDB.  Please read below for details, but in short I propose to break off the documentation request into a separate ticket and to use this ticket as the handle for scheduling the feature request.&lt;/p&gt;

&lt;p&gt;Regarding the possible linearizable schedule for the reads, let me try to clarify.  Your diagram indicates that by the end of the period in question, none of the writes have finished being confirmed by the replication system.  If &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-18022&quot; title=&quot;Support &amp;quot;read committed&amp;quot; isolation level where &amp;quot;committed&amp;quot; means confirmed by the voting majority of a replica set&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-18022&quot;&gt;&lt;del&gt;SERVER-18022&lt;/del&gt;&lt;/a&gt; were resolved all the reads in the diagram would have returned 0, because none of the writes had committed.  As such, there would be a legal linearizable schedule, whose prefix includes the completion of all the read operations, and whose suffix includes the completion of all the write operations.  I think in this case that the fact that the writes had in fact started is not relevant.  In this case, write operation completion means replication to a majority of voting nodes and confirmation of that fact to the primary that accepted the write from the client.&lt;/p&gt;

&lt;p&gt;Anyhow, the behavior you &lt;em&gt;did&lt;/em&gt; observe certainly doesn&apos;t have a linearizable schedule.  As you point out, even with &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-18022&quot; title=&quot;Support &amp;quot;read committed&amp;quot; isolation level where &amp;quot;committed&amp;quot; means confirmed by the voting majority of a replica set&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-18022&quot;&gt;&lt;del&gt;SERVER-18022&lt;/del&gt;&lt;/a&gt; you don&apos;t get reads into a linear schedule for free.  The problem is that there is a period of time during a network partition when two nodes may believe themselves to be primary.  As soon as those two nodes communicate, one will step down. If the partition lasts long enough, the node in the minority partition will step down, but there is an inevitable window for stale reads when a client reads from a primary that will inevitably step down.  As an aside, improvements to the consensus protocol can be used to bring that period down to a few network roundtrip periods (hundreds of milliseconds), and that is the subject of the somewhat ill-described &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-12385&quot; title=&quot;election algorithm modifications&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-12385&quot;&gt;&lt;del&gt;SERVER-12385&lt;/del&gt;&lt;/a&gt;. &lt;/p&gt;

&lt;p&gt;You suggested (approximately) transforming reads into atomic read-modify-writes in order to achieve linearizable reads.  You didn&apos;t propose it exactly that way, and your description leaves more room for optimization, but &quot;coupling reads to oplog acknowledgement&quot; pretty much degrades to converting reads to read-modify-writes in periods of low write volume.  The behavior can be achieved today, albeit somewhat clumsily and only with some client drivers, by using the &quot;findAndModify&quot; command to issue your reads and then issuing a getLastError command to wait for write concern satisfaction.  Your findAndModify command will need to make some change to the document being read, such as incrementing an otherwise ignored field, in order to force an entry into the oplog, and you cannot observe the value until the getLastError command returns successfully, indicating that your read-modify-write replicated successfully.&lt;/p&gt;

&lt;p&gt;Finally, as you indicated above, there is a clear documentation issue.  The documentation you reference needs to be updated.  As mentioned, there&apos;s an active DOCS ticket for part of that, &lt;a href=&quot;https://jira.mongodb.org/browse/DOCS-5141&quot; title=&quot;can you document more behavior for read isolation&quot; class=&quot;issue-link&quot; data-issue-key=&quot;DOCS-5141&quot;&gt;&lt;del&gt;DOCS-5141&lt;/del&gt;&lt;/a&gt;, and I&apos;m now convinced that we&apos;ll need a separate one to review all of our read-preference documentation.  Improving the documentation will be tricky because, while linearizable distributed objects are often convenient, they come at a comparatively high cost in terms of communication overhead.  Since users&apos; needs can be frequently satisfied with more relaxed consistency models, the updated documentation will need to help developers weigh the probability of a stale read with its impact on their application.&lt;/p&gt;</comment>
                            <comment id="892980" author="aphyr" created="Mon, 20 Apr 2015 17:20:39 +0000"  >&lt;p&gt;In what possible sense is this &quot;working as designed&quot;? The MongoDB documentation repeats the terms &quot;immediate consistency&quot; and &quot;latest version&quot; over and over again.&lt;/p&gt;

&lt;p&gt;Here&apos;s the MongoDB chief architect claiming Mongo provides &quot;Immediate Consistency&quot; in a 2012 talk: &lt;a href=&quot;http://www.slideshare.net/mongodb/mongodb-basic-concepts-15674838&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://www.slideshare.net/mongodb/mongodb-basic-concepts-15674838&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here&apos;s the Read preference documentation claiming Mongo ReadPreference=primary &quot;guarantees that read operations reflect the latest version of a document&quot;: &lt;a href=&quot;http://docs.mongodb.org/manual/core/read-preference/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://docs.mongodb.org/manual/core/read-preference/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The MongoDB FAQ says &quot;MongoDB is consistent by default: reads and writes are issued to the primary member of a replica set&quot;: &lt;a href=&quot;http://www.mongodb.com/faq#consistency&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://www.mongodb.com/faq#consistency&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And the Architecture Guide repeats the theme that only non-primary ReadPreferences can see stale data: &lt;a href=&quot;http://s3.amazonaws.com/info-mongodb-com/MongoDB_Architecture_Guide.pdf&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://s3.amazonaws.com/info-mongodb-com/MongoDB_Architecture_Guide.pdf&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;What Mongo &lt;b&gt;actually&lt;/b&gt; does is allow stale reads: it is possible to execute a WriteConcern=MAJORITY write of a new value, wait for it to return successfully, perform a read with ReadPreference=PRIMARY, and not see the value you just wrote.&lt;/p&gt;</comment>
                            <comment id="892970" author="aphyr" created="Mon, 20 Apr 2015 17:11:18 +0000"  >&lt;p&gt;To elaborate...&lt;/p&gt;

&lt;p&gt;&amp;gt; Further, there&apos;s a linearizable schedule in that case, I believe. It&apos;s been a while since I read Herlihy&apos;s paper, but if I have this right, with &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-18022&quot; title=&quot;Support &amp;quot;read committed&amp;quot; isolation level where &amp;quot;committed&amp;quot; means confirmed by the voting majority of a replica set&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-18022&quot;&gt;&lt;del&gt;SERVER-18022&lt;/del&gt;&lt;/a&gt; and committed single-document reads, a legal schedule would have been to process all of the reads in some sequence, and then the writes of threads 0, 3, 4, 1 and 2&lt;/p&gt;

&lt;p&gt;I don&apos;t understand what you mean--it doesn&apos;t make sense for a register to read 0, 4, 3, and 0 again without any writes taking place.&lt;/p&gt;

&lt;p&gt;&amp;gt; On the other hand, even if an application does that the response might be delayed during transport, during which time a more-current value might appear. Sticking to the single-document case, for the moment, if a thread communicates with other threads only through MongoDB, so long as it never sees an older value of a document after seeing a newer value of a document, and so long as it does only committed reads, what would staleness even mean?&lt;/p&gt;

&lt;p&gt;The property you&apos;re describing is sequential consistency: all processes see operations in the same order, but do not agree on when they happen. Sequentially consistent systems allow arbitrarily stale reads: it is legal, for instance, for a new process to see &lt;b&gt;no&lt;/b&gt; documents, which leads to confusing anomalies like, say, submitting a comment, refreshing the page, and seeing nothing there. I think you would be hard-pressed to find users who have no side-channels between processes, and I also think most of your user base would interpret &quot;latest version&quot; to mean &quot;a state between the invocation and completion times of my read operation&quot;, not &quot;some state logically subsequent to my previous operation and temporally prior to the completion of my read operation.&quot;&lt;/p&gt;

&lt;p&gt;&amp;gt; Now, if the threads communicate directly with each other&lt;/p&gt;

&lt;p&gt;They can and they will communicate--I have never talked to a Mongo user who did not send data from MongoDB to a human being. This is why linearizability is a useful invariant: you know that if you post a status update, receive an HTTP 200 response, call up your friend, and ask them to look, they&apos;ll see your post.&lt;/p&gt;

&lt;p&gt;You can ask people to embed causality tokens in all their operations, but a.) you have to train users how to propagate and merge causality tokens correctly, b.) this does nothing for fresh processes, and c.) this is not what most people mean when they say &quot;immediate&quot; or &quot;latest version&quot;, haha. &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.mongodb.org/images/icons/emoticons/smile.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/p&gt;</comment>
                            <comment id="892967" author="milkie" created="Mon, 20 Apr 2015 17:09:18 +0000"  >&lt;p&gt;&lt;b&gt;EDIT&lt;/b&gt; This ticket was re-opened on April 21.&lt;br/&gt;
~~~~~&lt;br/&gt;
Kyle, I&apos;m going to switch this to &quot;Works as Designed&quot;, as you&apos;re correct that there are many more facets to this topic than just a simple duplication of one work ticket.  &lt;br/&gt;
I&apos;m still uncertain what you mean by &quot;staleness&quot; in this context, as highlighted by Andy in his response above.&lt;/p&gt;</comment>
                            <comment id="892921" author="aphyr" created="Mon, 20 Apr 2015 16:35:37 +0000"  >&lt;p&gt;Maybe I should have been more explicit: this is not a duplicate of &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-18022&quot; title=&quot;Support &amp;quot;read committed&amp;quot; isolation level where &amp;quot;committed&amp;quot; means confirmed by the voting majority of a replica set&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-18022&quot;&gt;&lt;del&gt;SERVER-18022&lt;/del&gt;&lt;/a&gt;. Read-committed does not prevent stale reads.&lt;/p&gt;</comment>
                            <comment id="892620" author="milkie" created="Mon, 20 Apr 2015 12:41:44 +0000"  >&lt;p&gt;I&apos;m closing this as a duplicate of the read-committed ticket, but please feel free to reopen for further discussion.&lt;/p&gt;</comment>
                            <comment id="881786" author="schwerin" created="Wed, 15 Apr 2015 15:27:08 +0000"  >&lt;p&gt;From my interpretation of Kyle&apos;s diagram, if &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-18022&quot; title=&quot;Support &amp;quot;read committed&amp;quot; isolation level where &amp;quot;committed&amp;quot; means confirmed by the voting majority of a replica set&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-18022&quot;&gt;&lt;del&gt;SERVER-18022&lt;/del&gt;&lt;/a&gt; were resolved and the test threads were doing single-document committed reads, thread 1 would not have observed 4 then 3, but 0 and then 0 again, since neither thread 0 nor thread 2 has completed its write. Similarly, thread 5 would continue to read 0.  Those values aren&apos;t stale &amp;#8211; they would represent the most recent committed value.  Further, there&apos;s a linearizable schedule in that case, I believe.  It&apos;s been a while since I read Herlihy&apos;s paper, but if I have this right, with &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-18022&quot; title=&quot;Support &amp;quot;read committed&amp;quot; isolation level where &amp;quot;committed&amp;quot; means confirmed by the voting majority of a replica set&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-18022&quot;&gt;&lt;del&gt;SERVER-18022&lt;/del&gt;&lt;/a&gt; and committed single-document reads, a legal schedule would have been to process all of the reads in some sequence, and then the writes of threads 0, 3, 4, 1 and 2 in that order.  Note, unlike &lt;a href=&quot;https://jira.mongodb.org/secure/ViewProfile.jspa?name=asya&quot; class=&quot;user-hover&quot; rel=&quot;asya&quot;&gt;asya&lt;/a&gt;, I primarily consulted the diagram, ticket description and prior comments.&lt;/p&gt;

&lt;p&gt;As for Kyle&apos;s point about needing reads to be coupled to oplog acknowledgement to prevent stale reads, I&apos;m of two minds.  On the one hand, an application can convert reads into atomic read-modify-write operations today using the findAndModify and getLastError commands in MongoDB in order to tie the reads into the oplog acknowledgement system (NB: I don&apos;t think most drivers support this today).  On the other hand, even if an application does that, the response might be delayed during transport, during which time a more-current value might appear.  Sticking to the single-document case, for the moment, if a thread communicates with other threads only through MongoDB, so long as it never sees an older value of a document after seeing a newer value of a document, and so long as it does only committed reads, what would staleness even mean?&lt;/p&gt;

&lt;p&gt;Now, if the threads communicate directly with each other, the story gets more complicated and &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-18022&quot; title=&quot;Support &amp;quot;read committed&amp;quot; isolation level where &amp;quot;committed&amp;quot; means confirmed by the voting majority of a replica set&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-18022&quot;&gt;&lt;del&gt;SERVER-18022&lt;/del&gt;&lt;/a&gt; may not be sufficient by itself. In that case, allowing the client threads to pass some kind of logical clock token when they communicate with each other would suffice to prevent a causal ordering violation during periods when one node erroneously believes itself to still be primary. That token could be a combination of the monotonically increasing election term id and the highest committed oplog timestamp on the node when the read completed.  If the causally second observer saw an earlier (term, optime) pair than the causally first observer, it would know to reject the read.&lt;/p&gt;

&lt;p&gt;That solution depends on resolution of &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-12385&quot; title=&quot;election algorithm modifications&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-12385&quot;&gt;&lt;del&gt;SERVER-12385&lt;/del&gt;&lt;/a&gt; (adding term ids to the oplog is part of that work) and &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-16570&quot; title=&quot;writeConcerns could be erroneously satisfied after a rollback occurs&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-16570&quot;&gt;&lt;del&gt;SERVER-16570&lt;/del&gt;&lt;/a&gt; (involving write concern satisfaction after rollback), which we&apos;re planning to do during the development of version 3.2.  It is worth noting that even that solution is insufficient for managing some causal relationships when communication is through multiple documents.  I don&apos;t believe we make promises about those causal relationships, today.&lt;/p&gt;

&lt;p&gt;In the meantime, we will work to improve the documentation around this behavior in current versions of MongoDB. As always, please respond if you have questions or comments.&lt;/p&gt;</comment>
                            <comment id="881375" author="aphyr" created="Wed, 15 Apr 2015 00:53:27 +0000"  >&lt;p&gt;(perhaps I should also mention, in case anyone comes along and thinks this is subsumed by &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-18022&quot; title=&quot;Support &amp;quot;read committed&amp;quot; isolation level where &amp;quot;committed&amp;quot; means confirmed by the voting majority of a replica set&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-18022&quot;&gt;&lt;del&gt;SERVER-18022&lt;/del&gt;&lt;/a&gt;, that fixing &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-18022&quot; title=&quot;Support &amp;quot;read committed&amp;quot; isolation level where &amp;quot;committed&amp;quot; means confirmed by the voting majority of a replica set&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-18022&quot;&gt;&lt;del&gt;SERVER-18022&lt;/del&gt;&lt;/a&gt; does not necessarily resolve the problem of stale reads)&lt;/p&gt;</comment>
                            <comment id="881314" author="aphyr" created="Tue, 14 Apr 2015 23:18:55 +0000"  >&lt;p&gt;The existence of this behavior actually implies &lt;b&gt;both&lt;/b&gt; anomalies are present in MongoDB, but I&apos;m phrasing it conservatively. &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.mongodb.org/images/icons/emoticons/wink.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/p&gt;

&lt;p&gt;Why? A dirty read from an isolated primary can be trivially converted to a stale read if the write to the isolated primary doesn&apos;t affect the outcome of the read (or if the write doesn&apos;t take place at all). I think there are two problems to fix here--supporting read-committed isolation will prevent dirty reads, but still allows stale reads. You &lt;b&gt;also&lt;/b&gt; have to couple reads to oplog acknowledgement in some way to prevent stale read transactions.&lt;/p&gt;

&lt;p&gt;I&apos;ve attached a sketch (journal-84.png) to illustrate--all you have to do is execute the write on the new primary instead of the old to convert a dirty read to a stale one. Either way, you&apos;re not reading &quot;the most recent state.&quot;&lt;/p&gt;

&lt;p&gt;Note that you don&apos;t &lt;b&gt;have&lt;/b&gt; to go full read-committed to fix this anomaly: you can prevent stale and dirty reads for single documents without supporting RC for multi-doc operations (just a difference in lock granularity), so if you want to support reading the latest version, you can have it in both read-uncommitted and read-committed modes. &lt;img class=&quot;emoticon&quot; src=&quot;https://jira.mongodb.org/images/icons/emoticons/smile.png&quot; height=&quot;16&quot; width=&quot;16&quot; align=&quot;absmiddle&quot; alt=&quot;&quot; border=&quot;0&quot;/&gt;&lt;/p&gt;

&lt;p&gt;The read isolation docs (&lt;a href=&quot;http://docs.mongodb.org/manual/core/write-concern/#read-isolation&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://docs.mongodb.org/manual/core/write-concern/#read-isolation&lt;/a&gt;) are technically correct, I think, but sorta misleading: &quot;For all inserts and updates, MongoDB modifies each document in isolation: clients never see documents in intermediate states&quot; kinda suggests that the read uncommitted problem refers to multiple-document updates&#8212;which is also true&#8212;but it doesn&apos;t mention that even read operations on a single document may see invalid states that are not causally connected to the final history.&lt;/p&gt;

&lt;p&gt;The read preference docs (&lt;a href=&quot;http://docs.mongodb.org/manual/core/read-preference/&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;http://docs.mongodb.org/manual/core/read-preference/&lt;/a&gt;) make some pretty explicit claims that Mongo supports linearizable reads, saying &quot;Reading from the primary guarantees that read operations reflect the latest version of a document&quot;, and &quot;All read preference modes except primary may return stale data&quot;.&lt;/p&gt;

&lt;p&gt;With this in mind, it might be a good idea to let users know all read modes may return stale data, and that the difference in ReadPreference just changes the probabilities. For instance, &quot;Ensure that your application can tolerate stale data if you choose to use a non-primary mode,&quot; could read &quot;Always ensure that your application can tolerate stale data.&quot;&lt;/p&gt;</comment>
                            <comment id="881311" author="aphyr" created="Tue, 14 Apr 2015 23:15:23 +0000"  >&lt;p&gt;Sketch illustrating that stale reads are a degenerate case of dirty reads.&lt;/p&gt;</comment>
                            <comment id="880337" author="asya" created="Tue, 14 Apr 2015 01:50:03 +0000"  >&lt;p&gt;Hello  Kyle,&lt;/p&gt;

&lt;p&gt;As you and Knossos have discovered, it is not possible to do fully linearized single-document reads with the current version of MongoDB.&lt;/p&gt;

&lt;p&gt;I believe that what your test framework is not taking into account is that reading from a primary does not guarantee that the read data will survive a network partition.   This is because MongoDB read isolation semantics are similar to &quot;read uncommitted&quot; in a traditional database system when you take into account the full replica set.&lt;/p&gt;

&lt;p&gt;As the &lt;a href=&quot;http://docs.mongodb.org/manual/core/write-concern/#read-isolation&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;docs&lt;/a&gt; mention, data written with majority writeConcern &lt;em&gt;that has been acknowledged&lt;/em&gt; will survive any replica set event that allows a new primary to be elected. However, after the write is made on the primary, but before it has successfully replicated to a majority of the cluster, it is visible to any other connection that&apos;s reading from the primary.&lt;/p&gt;

&lt;p&gt;This allows the following sequence of events:&lt;/p&gt;

&lt;div class=&apos;table-wrap&apos;&gt;
&lt;table class=&apos;confluenceTable&apos;&gt;&lt;tbody&gt;
&lt;tr&gt;
&lt;th class=&apos;confluenceTh&apos;&gt; T1 &lt;/th&gt;
&lt;td class=&apos;confluenceTd&apos;&gt; network partition happens &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&apos;confluenceTh&apos;&gt; T2 &lt;/th&gt;
&lt;td class=&apos;confluenceTd&apos;&gt; write A happens, waits for write concern &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&apos;confluenceTh&apos;&gt; T3 &lt;/th&gt;
&lt;td class=&apos;confluenceTd&apos;&gt; read of A happens on the primary &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&apos;confluenceTh&apos;&gt; T4 &lt;/th&gt;
&lt;td class=&apos;confluenceTd&apos;&gt; primary steps down due to not seeing majority &lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;th class=&apos;confluenceTh&apos;&gt; T5 &lt;/th&gt;
&lt;td class=&apos;confluenceTd&apos;&gt; new primary is elected &lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;&lt;/table&gt;
&lt;/div&gt;


&lt;p&gt;When write A has not propagated to the majority of the replica set, it may not be present on the newly elected primary (in fact, if write A has replicated to none of the secondaries, it is guaranteed to be absent from the newly elected primary).&lt;/p&gt;

&lt;p&gt;I believe such a sequence of events was observed in your case: the majority write concern was not yet satisfied, and the unacknowledged data had been written on the primary and were visible to other connections (process 1 in your case), but the value was not present on the newly elected primary (which is the node that process 5 finally successfully read from).  The phenomena your tests are observing are not stale reads (of value 0) but rather uncommitted reads, namely the reads &quot;1 read 4&quot; and &quot;1 read 3&quot;, which happened on the &quot;old&quot; primary. Those writes were neither acknowledged nor replicated to the majority partition, and &lt;a href=&quot;http://docs.mongodb.org/manual/core/replica-set-rollbacks/#avoid-replica-set-rollbacks&quot; class=&quot;external-link&quot; target=&quot;_blank&quot; rel=&quot;nofollow noopener&quot;&gt;they will be (correctly) rolled back&lt;/a&gt; when the partition is removed.&lt;/p&gt;

&lt;p&gt;Currently, there is a Documentation task, &lt;a href=&quot;https://jira.mongodb.org/browse/DOCS-5141&quot; title=&quot;can you document more behavior for read isolation&quot; class=&quot;issue-link&quot; data-issue-key=&quot;DOCS-5141&quot;&gt;&lt;del&gt;DOCS-5141&lt;/del&gt;&lt;/a&gt;, to clarify read isolation semantics further.   In addition, in &lt;a href=&quot;https://jira.mongodb.org/browse/SERVER-18022&quot; title=&quot;Support &amp;quot;read committed&amp;quot; isolation level where &amp;quot;committed&amp;quot; means confirmed by the voting majority of a replica set&quot; class=&quot;issue-link&quot; data-issue-key=&quot;SERVER-18022&quot;&gt;&lt;del&gt;SERVER-18022&lt;/del&gt;&lt;/a&gt; we are working on support for read-committed isolation, which will enable your test to perform linearizable reads correctly &amp;#8211; I&apos;ll mark this ticket as a duplicate of that work so they will be linked together in JIRA.&lt;/p&gt;

&lt;p&gt;Thanks for the detailed report, and let me know if you have any questions.&lt;br/&gt;
Asya &lt;/p&gt;</comment>
                    </comments>
                <issuelinks>
                            <issuelinktype id="10011">
                    <name>Depends</name>
                                            <outwardlinks description="depends on">
                                        <issuelink>
            <issuekey id="201397">SERVER-18285</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                            <issuelinktype id="10012">
                    <name>Related</name>
                                            <outwardlinks description="related to">
                                        <issuelink>
            <issuekey id="193152">DOCS-5141</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="199572">DOCS-5259</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="201360">DOCS-5324</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="201361">DOCS-5325</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="203400">CXX-597</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="201363">DOCS-5326</issuekey>
        </issuelink>
            <issuelink>
            <issuekey id="202279">DRIVERS-228</issuekey>
        </issuelink>
                            </outwardlinks>
                                                        </issuelinktype>
                    </issuelinks>
                <attachments>
                            <attachment id="70330" name="CCNSOQ6UwAEAvsO.jpg" size="25918" author="aphyr" created="Fri, 10 Apr 2015 06:43:32 +0000"/>
                            <attachment id="70845" name="Journal - 84.png" size="782659" author="aphyr" created="Tue, 14 Apr 2015 23:15:23 +0000"/>
                            <attachment id="70332" name="history.edn" size="313964" author="aphyr" created="Fri, 10 Apr 2015 06:43:32 +0000"/>
                            <attachment id="70331" name="linearizability.txt" size="49370" author="aphyr" created="Fri, 10 Apr 2015 06:43:32 +0000"/>
                    </attachments>
                <subtasks>
                    </subtasks>
                <customfields>
                                                <customfield id="customfield_10050" key="com.atlassian.jira.toolkit:comments">
                        <customfieldname># Replies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>25.0</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_10011" key="com.atlassian.jira.plugin.system.customfieldtypes:radiobuttons">
                        <customfieldname>Backwards Compatibility</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10011"><![CDATA[Minor Change]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                    <customfield id="customfield_13552" key="com.go2group.jira.plugin.crm:crm_generic_field">
                        <customfieldname>Case</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[[5002K00000uDNNsQAO]]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                            <customfield id="customfield_10055" key="com.atlassian.jira.ext.charting:firstresponsedate">
                        <customfieldname>Date of 1st Reply</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Mon, 13 Apr 2015 17:22:24 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10052" key="com.atlassian.jira.toolkit:dayslastcommented">
                        <customfieldname>Days since reply</customfieldname>
                        <customfieldvalues>
                                        7 years, 14 weeks, 2 days ago
    
                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_18254" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Dependencies</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue><![CDATA[<s><a href='https://jira.mongodb.org/browse/SERVER-18285'>SERVER-18285</a></s>]]></customfieldvalue>


                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_15850" key="com.atlassian.jira.plugins.jira-development-integration-plugin:devsummary">
                        <customfieldname>Development</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                    <customfield id="customfield_14262" key="com.atlassian.jira.plugin.system.customfieldtypes:datepicker">
                        <customfieldname>End date</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Mon, 7 Nov 2016 23:59:59 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_10857" key="com.pyxis.greenhopper.jira:gh-epic-link">
                        <customfieldname>Epic Link</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>PM-151</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                <customfield id="customfield_10057" key="com.atlassian.jira.toolkit:lastusercommented">
                        <customfieldname>Last comment by Customer</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>false</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                                                        <customfield id="customfield_11151" key="com.atlassian.jira.toolkit:LastCommentDate">
                        <customfieldname>Last public comment date</customfieldname>
                        <customfieldvalues>
                            7 years, 14 weeks, 2 days ago
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                    <customfield id="customfield_10032" key="com.atlassian.jira.plugin.system.customfieldtypes:select">
                        <customfieldname>Operating System</customfieldname>
                        <customfieldvalues>
                                <customfieldvalue key="10026"><![CDATA[ALL]]></customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_10051" key="com.atlassian.jira.toolkit:participants">
                        <customfieldname>Participants</customfieldname>
                        <customfieldvalues>
                                        <customfieldvalue>schwerin@mongodb.com</customfieldvalue>
            <customfieldvalue>asya.kamsky@mongodb.com</customfieldvalue>
            <customfieldvalue>carstenklein@yahoo.de</customfieldvalue>
            <customfieldvalue>milkie@mongodb.com</customfieldvalue>
            <customfieldvalue>henrik.ingo@mongodb.com</customfieldvalue>
            <customfieldvalue>Marqin</customfieldvalue>
            <customfieldvalue>aphyr</customfieldvalue>
            <customfieldvalue>mdcallag</customfieldvalue>
            <customfieldvalue>ramon.fernandez@mongodb.com</customfieldvalue>
    
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                        <customfield id="customfield_14254" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Product Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hrl8nr:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                <customfield id="customfield_12550" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>Rank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>2|hre2lj:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10558" key="com.pyxis.greenhopper.jira:gh-global-rank">
                        <customfieldname>Rank (Obsolete)</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>9223372036854775807</customfieldvalue>
                        </customfieldvalues>
                    </customfield>
                                                                                            <customfield id="customfield_23361" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Requested By</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                <customfield id="customfield_14261" key="com.atlassian.jira.plugin.system.customfieldtypes:datepicker">
                        <customfieldname>Start date</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>Fri, 10 Apr 2015 00:00:00 +0000</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                <customfield id="customfield_10750" key="com.atlassian.jira.plugin.system.customfieldtypes:textarea">
                        <customfieldname>Steps To Reproduce</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>&lt;p&gt;Clone jepsen, check out commit 72697c09eff26fdb1afb7491256c873f03404307, cd mongodb, and run `lein test`. Might need to run `lein install` in the jepsen/jepsen directory first.&lt;/p&gt;</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                    <customfield id="customfield_10053" key="com.atlassian.jira.ext.charting:timeinstatus">
                        <customfieldname>Time In Status</customfieldname>
                        <customfieldvalues>
                            
                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                                                                                                        <customfield id="customfield_22870" key="com.onresolve.jira.groovy.groovyrunner:scripted-field">
                        <customfieldname>Triagers</customfieldname>
                        <customfieldvalues>
                                

                        </customfieldvalues>
                    </customfield>
                                                                                                                                                                                                                                                                                                                                                                                    <customfield id="customfield_14350" key="com.pyxis.greenhopper.jira:gh-lexo-rank">
                        <customfieldname>serverRank</customfieldname>
                        <customfieldvalues>
                            <customfieldvalue>1|hrq4zr:</customfieldvalue>

                        </customfieldvalues>
                    </customfield>
                                    </customfields>
    </item>
</channel>
</rss>