[SERVER-63971] Change server parameter to default to read-your-writes behavior after 2PC transaction
Created: 24/Feb/22  Updated: 29/Oct/23  Resolved: 01/Jun/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 5.0.10, 6.0.0-rc9, 6.1.0-rc0

Type: Task
Priority: Major - P3
Reporter: Esha Maharishi (Inactive)
Assignee: Randolph Tan
Resolution: Fixed
Votes: 0
Labels: sharding-nyc-subteam1
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Documented
is documented by DOCS-15386 [Server] Change server parameter to d... Closed
Problem/Incident
Related
related to SERVER-60947 concurrency_sharded_replication_multi... Closed
related to SERVER-64515 Remove claims of prepare behavior fro... Open
is related to SERVER-63815 Add section on probable-read-you-writ... Backlog
is related to SERVER-37364 Coordinator should return the decisio... Closed
is related to SERVER-47130 Reduce w:majority commits from 4 to 2... Backlog
Backwards Compatibility: Fully Compatible
Backport Requested:
v6.0, v5.0
Sprint: Sharding NYC 2022-05-30, Sharding NYC 2022-06-13
Participants:
Case:
Linked BF Score: 42
Story Points: 3

 Description   

Currently, the coordinateCommitReturnImmediatelyAfterPersistingDecision server parameter added in SERVER-37364 defaults to true, meaning a user may not get read-your-writes behavior even when using readPreference primary in the following cases:

  • User does a 2PC transaction (with a write, and within a session)
  • User then tries to read the write it just made, using a read that is
    • (1) outside a session, or
    • (2) in a different session, or
    • (3) outside a transaction in the same session without causal consistency.

If (4) the user did the read in a new transaction in the same session (regardless of causal consistency), the read is guaranteed to return the write, because the new transaction would block until the earlier transaction had committed on the transaction participant.

If (5) the user did the read outside a transaction in the same session but with causal consistency, I think the read would return the write, because the read's afterClusterTime would be >= the operationTime returned for the earlier transaction's commitTransaction >= the transaction's commitTimestamp >= the prepareTimestamp on any transaction participant. I think the storage engine would not allow reading a document that's in prepare at a timestamp >= the document's prepareTimestamp.
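
The five cases above can be summarized as a small decision table. The following is only a sketch of the reasoning in this description, not server code: the Read type, its field names, and the result strings are hypothetical, and it models the behavior with coordinateCommitReturnImmediatelyAfterPersistingDecision set to true and readPreference primary. Case (5) is reported as "expected" rather than "guaranteed" because the description only argues that it should hold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    """Hypothetical description of a read issued after a 2PC transaction."""
    in_session: bool           # the read is issued inside some session
    same_session: bool         # that session is the one that ran the transaction
    in_transaction: bool       # the read runs inside a new transaction
    causally_consistent: bool  # the session has causal consistency enabled


def read_your_writes(read: Read) -> str:
    """Classify a read per cases (1)-(5), assuming the parameter is true."""
    if not read.in_session:
        return "not guaranteed"  # case (1): outside a session
    if not read.same_session:
        return "not guaranteed"  # case (2): a different session
    if read.in_transaction:
        # case (4): the new transaction blocks until the earlier one
        # has committed on the transaction participant
        return "guaranteed"
    if read.causally_consistent:
        # case (5): afterClusterTime >= operationTime >= commitTimestamp
        # >= prepareTimestamp, so the read should not see the old value
        return "expected"
    return "not guaranteed"      # case (3): same session, no causal consistency


# Example: a plain read in the same session without causal consistency.
print(read_your_writes(Read(True, True, False, False)))
```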

Since we in general try to preserve read-your-writes behavior when using readPreference primary, we may want to make coordinateCommitReturnImmediatelyAfterPersistingDecision default to false. Preserving read-your-writes is also a sensible default in Serverless.

If we make this change, we may want to improve documentation for coordinateCommitReturnImmediatelyAfterPersistingDecision for users who want to set it to true.
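
For users who do want the earlier-returning behavior, a minimal sketch of opting back in through the generic setParameter admin command might look like the following. The helper names are hypothetical, the client argument is assumed to be a pymongo MongoClient connected directly to a shard's mongod, and the sketch assumes the parameter can be changed at runtime; none of this comes from the ticket itself.

```python
PARAM = "coordinateCommitReturnImmediatelyAfterPersistingDecision"


def build_set_parameter_command(return_immediately: bool) -> dict:
    """Build the setParameter command document.

    The command name must be the first key; Python dicts keep insertion order.
    """
    return {"setParameter": 1, PARAM: return_immediately}


def apply_to_shard(client, return_immediately: bool):
    """Run the command against the admin database of one shard.

    `client` is assumed to behave like pymongo.MongoClient (hypothetical usage).
    """
    return client.admin.command(build_set_parameter_command(return_immediately))


# Example: the document that would be sent to opt back in to the optimization.
print(build_set_parameter_command(True))
```

Setting the value to true trades the read-your-writes cases listed above for a commitTransaction that returns as soon as the decision is persisted.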

CC judah.schvimer , tess.avitabile


Acceptance criteria:

  • Change the server parameter value to false.
  • Audit what test coverage we have for coordinateCommitReturnImmediatelyAfterPersistingDecision=true to ensure we aren't losing all of our test coverage for the commitTransaction optimization. We generally aim to test the server's default behavior but having one concurrency suite running with the commitTransaction optimization on would be worthwhile.


 Comments   
Comment by Githook User [ 02/Jun/22 ]

Author:

{'name': 'Randolph Tan', 'email': 'randolph@10gen.com', 'username': 'renctan'}

Message: SERVER-63971 Make coordinate commit wait for transaction to actually complete by default

(cherry picked from commit 52defdf7c5793de48a4aea976cc73569e83c2133)
Branch: v5.0
https://github.com/mongodb/mongo/commit/5ab26e8a21763fc61044633228f9df2ce75de133

Comment by Githook User [ 02/Jun/22 ]

Author:

{'name': 'Randolph Tan', 'email': 'randolph@10gen.com', 'username': 'renctan'}

Message: SERVER-63971 Make coordinate commit wait for transaction to actually complete by default

(cherry picked from commit 52defdf7c5793de48a4aea976cc73569e83c2133)
Branch: v6.0
https://github.com/mongodb/mongo/commit/cebec3621db74e513d2b50362a98f373d768589d

Comment by Githook User [ 01/Jun/22 ]

Author:

{'name': 'Randolph Tan', 'email': 'randolph@10gen.com', 'username': 'renctan'}

Message: SERVER-63971 Make coordinate commit wait for transaction to actually complete by default
Branch: master
https://github.com/mongodb/mongo/commit/52defdf7c5793de48a4aea976cc73569e83c2133

Comment by Andy Schwerin [ 10/Mar/22 ]

An alternative to this behavior is that we could change readConcern: "majority" reads to block on prepare conflicts until they resolve, rather than returning the prior value unconditionally. This would improve the read-your-writes behavior after 2PC transactions without forcing all transaction committers to wait. The wait could be transferred to readers and only paid in the event that the read both arrived before the transaction commit fully completed on the shards and actually encountered the conflict.
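
To make the trade-off in this alternative concrete, here is a toy model (not the storage engine's actual behavior or API; the class and method names are hypothetical) of a single document with a prepared write. A reader either returns the prior value unconditionally or blocks until the prepare conflict resolves, so the wait is paid only by a reader that actually hits the conflict.

```python
import threading


class PreparedDocument:
    """Toy model of one document that may have a prepared, uncommitted write."""

    def __init__(self, committed_value):
        self.committed_value = committed_value
        self._pending_value = None
        self._resolved = threading.Event()
        self._resolved.set()  # no prepare conflict initially

    def prepare(self, new_value):
        """Enter the prepared state: readers now face a prepare conflict."""
        self._pending_value = new_value
        self._resolved.clear()

    def commit(self):
        """Apply the prepared write and release any blocked readers."""
        self.committed_value = self._pending_value
        self._pending_value = None
        self._resolved.set()

    def read(self, block_on_prepare_conflict, timeout=None):
        """Return the prior value, or wait for the conflict to resolve first."""
        if block_on_prepare_conflict and not self._resolved.wait(timeout):
            raise TimeoutError("prepare conflict did not resolve in time")
        return self.committed_value


doc = PreparedDocument("old")
doc.prepare("new")
print(doc.read(block_on_prepare_conflict=False))  # prior value, returned at once

# The commit arrives shortly afterwards; a blocking reader waits for it.
threading.Timer(0.05, doc.commit).start()
print(doc.read(block_on_prepare_conflict=True, timeout=2))
```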

Comment by Garaudy Etienne [ 28/Feb/22 ]

This discovery came from an ongoing customer investigation where we couldn't wrap our minds around why they were not able to read their own writes after committing a transaction. The customer now has correct behavior after this change, but they are not happy with the new performance. We will look into speeding up 2PC in general.

Comment by Andy Schwerin [ 28/Feb/22 ]

I am curious what impact this change will have on benchmarks. I think it's true that it's generally preferable to read your writes on primaries, and it's a behavior worth preserving. I'm also curious how often, in practice, a user in real networking conditions would fail to read their writes with coordinateCommitReturnImmediatelyAfterPersistingDecision:true. However, I agree that it will be less surprising in those corner cases to change the default to coordinateCommitReturnImmediatelyAfterPersistingDecision:false. I'm curious if we could utilize synchronized wall clocks in the future to achieve this behavior while still returning earlier to the client. Modern clock synchronization algorithms are quite good.

Comment by Garaudy Etienne [ 25/Feb/22 ]

We decided to revert this change last week and default it back to false. cc max.hirschhorn

Comment by Ratika Gandhi [ 25/Feb/22 ]

Added it to our product sync. cc garaudy.etienne

Comment by Tess Avitabile (Inactive) [ 25/Feb/22 ]

schwerin, I'm curious what your opinion is about this ticket.

Comment by Esha Maharishi (Inactive) [ 24/Feb/22 ]

If we make this change, we should update tests and passthrough suites that run sharded transactions so that we don't lose coverage of coordinateCommitReturnImmediatelyAfterPersistingDecision:true.

Generated at Thu Feb 08 05:59:09 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.