[SERVER-41769] The committedSnapshot should not be greater than allCommitted when EMRC=false Created: 14/Jun/19  Updated: 29/Oct/23  Resolved: 16/Jul/19

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: 4.3.1
Fix Version/s: 4.2.0-rc3, 4.3.1

Type: Bug Priority: Major - P3
Reporter: Jason Chan Assignee: Jason Chan
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Duplicate
duplicates SERVER-42225 timestamped_reads_wait_for_prepare_op... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.2
Sprint: Repl 2019-07-01, Repl 2019-07-15, Repl 2019-07-29
Participants:
Linked BF Score: 20

 Description   

Currently we start a transaction that performs modifications to documents with snapshot read concern.

With snapshot read concern, we set the readSource to be the kAllCommittedSnapshot. This can cause build failures in the case where in our evergreen runs, an HMAC key insertion right before we prepopulate our test collection. In this case, since we prepopulate the collection while the WT transaction for the HMAC key insertion is not yet committed, the documents we insert into the test collection never make it into the all committed snapshot. This will cause inconsistencies in our test results. We should pin the committedSnapshot to always be <= allCommitted and also audit all uses of transactions with snapshot read concern to make sure any previous writes outside the transaction that are depended on by the transaction should be using majority write concern.



 Comments   
Comment by Githook User [ 17/Jul/19 ]

Author:

{'name': 'Jason Chan', 'username': 'jasonjhchan', 'email': 'jason.chan@10gen.com'}

Message: SERVER-41769 audit uses of write concern majority before transactions with snapshot read concern

(cherry picked from commit 5d376f806460dee5a121483e33c0eeeb8c3791d7)
Branch: v4.2
https://github.com/mongodb/mongo/commit/eba3ad3f7263ff98293f312403e373ebc9311939

Comment by Githook User [ 17/Jul/19 ]

Author:

{'name': 'Jason Chan', 'email': 'jason.chan@10gen.com', 'username': 'jasonjhchan'}

Message: SERVER-41769 Pin _currentCommittedSnapshot to be <= the allCommitted timestamp when EMRC=false

(cherry picked from commit 795b99ef36ea9448c862db6fd56ff6321bf66c9e)
Branch: v4.2
https://github.com/mongodb/mongo/commit/f659c78d8d07a69a6b935a75af47722dc780bcb9

Comment by Githook User [ 16/Jul/19 ]

Author:

{'name': 'Jason Chan', 'username': 'jasonjhchan', 'email': 'jason.chan@10gen.com'}

Message: SERVER-41769 audit uses of write concern majority before transactions with snapshot read concern
Branch: master
https://github.com/mongodb/mongo/commit/5d376f806460dee5a121483e33c0eeeb8c3791d7

Comment by Githook User [ 16/Jul/19 ]

Author:

{'name': 'Jason Chan', 'username': 'jasonjhchan', 'email': 'jason.chan@10gen.com'}

Message: SERVER-41769 Pin _currentCommittedSnapshot to be <= the allCommitted timestamp when EMRC=false
Branch: master
https://github.com/mongodb/mongo/commit/795b99ef36ea9448c862db6fd56ff6321bf66c9e

Comment by Jason Chan [ 27/Jun/19 ]

Yes, I've changed the ticket title and description. I will audit the usages of transactions with snapshot read concern to make sure any writes that are depended on by the transaction will be using majority write concern.

Comment by Judah Schvimer [ 27/Jun/19 ]

That makes sense. Should we also check that we are using w:majority everywhere we need to be?

Comment by Jason Chan [ 27/Jun/19 ]

judah.schvimer
I think william.schultz's suggestion to pin the committed snapshot to be <= allCommitted when EMRC=false makes the most sense. This will fix the difference in behaviors between single-node and multi-node replica sets when EMRC false and using snapshot read concern. This will also fix any currently flakey tests and make true the assumption that transactions should always be able to read previously majority committed writes.

Some alternative solutions that I didn't think were as good:
1. We currently have a [uses_snapshot_read_concern] jstest tag that we can blacklist from the enable majority read concern false build variants. However, this doesn't fix the difference in behaviors of single-node vs. multiple-node replica sets when EMRC=false.
2. We can ban EMRC=false entirely from single-node replica sets but this is a larger change that seems a bit much for a bug that we think will affect a very small portion of our customers.

Comment by Judah Schvimer [ 27/Jun/19 ]

jason.chan, what do you propose to do to fix this ticket?

Comment by Jason Chan [ 27/Jun/19 ]

I created a quick repro of the bug here (should only be run when enableMajorityReadConcern is false). The issue seems to be as described. We set the committed snapshot to be the lastCommittedOpTime. The lastCommittedOpTime gets set to the durableOpTime which ends up being the same as the lastApplied. This allows the insert with majority write concern to complete successfully but the transaction will continue to read at the allCommittedTimestamp with snapshot read concern.

Comment by William Schultz (Inactive) [ 27/Jun/19 ]

Yes, forcing all tests that want RYOW behavior to contain their ops within a single session might be additionally tricky to enforce across our base of tests.

Comment by Judah Schvimer [ 27/Jun/19 ]

Causal consistency would not be sufficient since many tests expect sessions to read writes done outside of their session. Causal is a session property.

Comment by William Schultz (Inactive) [ 27/Jun/19 ]

Also, to Judah's above point, using causal consistency in our tests certainly would be a valid alternative to guarantee read-your-writes behavior for transactions, but we feared that this would be harder for people to remember when writing new tests. Arguably, any test that truly wants read-your-writes behavior should be using causal consistency, but we don't strictly adhere to that in practice.

Comment by William Schultz (Inactive) [ 27/Jun/19 ]

jason.chan is going to investigate this a bit more to come up with a concrete repro of the theory above, but after talking with him a bit more we came to a few conclusions on this issue:

  1. If the above theory is correct, there is a behavioral divergence between 1 node replica sets and multi-node replica sets when EMRC=false, which is that w:majority writes may not be visible to subsequent "snapshot" read concern transactions. I am not convinced we should be very worried about this divergence from a user perspective since I presume that the number of real world deployments using transactions on 1 node replica sets with EMRC=false is quite small.
  2. Even if we are not worried about this behavior from a user perspective, it may be worth doing something about it for the sake of our test infrastructure. We predict that the inability to read previous majority committed writes in a transaction could cause confusion and more flaky test failures in the future. If, in the server, we constrain the committed snapshot to be <= allCommitted when EMRC=false, that should create behavioral parity between 1 and multi-node replica sets when EMRC=false.
  3. As Jason and I talked about this issue, we recalled that we only care about providing the EMRC=false option when a replica set can contain an arbiter, for the reasons outlined in PM-1191. A 1 node replica set can never have an arbiter, and so we raised the possibility of banning EMRC=false entirely on 1 node replica sets. This would be a more significant change, but is another alternative we considered.
Comment by William Schultz (Inactive) [ 24/Jun/19 ]

There may be an issue here that is specific to enableMajorityReadConcern=false. When EMRC=false, we set the committed snapshot to the lastCommittedOpTime, instead of the stable timestamp. This would mean that, on 1 node replica sets, it is not restricted to being behind the allCommitted, and so if we advance the commit point immediately to lastApplied on 1 node replica sets, then a w:majority write would not wait for the allCommitted to move beyond the optime of the write. With EMRC=true, though, if the committed snapshot is restrained to be behind the allCommitted, then waiting for majority write concern would also wait for the allCommitted to reach the optime of the write. Presumably we could write a simple test to demonstrate this behavior. If that is the case, then we may care more about making the semantics of transactions consistent between both settings of the enableMajorityReadConcern parameter.

Comment by Judah Schvimer [ 21/Jun/19 ]

We currently have a bit of a problem that our transaction testing of different read concerns is fairly arbitrary. I don't want to accidentally lose snapshot read concern coverage by removing all occurrences of it. Using causal consistency on the given session is a more canonical way of getting read your writes.

Comment by Jason Chan [ 20/Jun/19 ]

The same issue happened in BF-13486 with a different jstest. We should look into a solution to the general problem rather than fixing these tests on a case-by-case basis.

Generated at Thu Feb 08 04:58:37 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.