[SERVER-78407] Skip read only shard commit phase for two phase writes Created: 23/Jun/23  Updated: 28/Nov/23

Status: In Code Review
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Jack Mulrow Assignee: Jack Mulrow
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File SERVER-78407-POC.diff    
Issue Links:
Related
related to SERVER-48340 Re-enable single-write-shard transact... Closed
is related to SERVER-79056 Measure performance for updateOne wit... Closed
Assigned Teams:
Sharding NYC
Sprint: Sharding NYC 2023-07-10, Sharding NYC 2023-07-24, Sharding NYC 2023-08-07, Sharding NYC 2023-08-21, Sharding NYC 2023-09-04, Sharding NYC 2023-09-18, Sharding NYC 2023-10-02, Sharding NYC 2023-10-16, Sharding NYC 2023-10-30, Cluster Scalability 2023-11-13, Cluster Scalability 2023-11-27
Participants:

 Description   

The two phase write protocol for single writes without a shard key match does a broadcast read and then writes to a single shard with a matching document. SERVER-48340 restores the single write shard commit optimization, which lets two phase writes skip two phase commit, since they always write to at most one shard.

The single write shard optimization has two phases: a round of commits on the read only shards, then, if all read only commits were successful, a commit sent to the one write shard. Two phase writes can be further optimized to skip the first commit phase, because single writes don't care whether the read only shards that were not chosen to perform the write are able to commit. To satisfy the guarantees of a single write (i.e. only one matching document is updated), it's enough to commit only on the chosen write shard. Thus commit can become a single phase: committing on the write shard and, in parallel, committing (or aborting, since it's cheaper) on the read shards so that they release their resources. The read shards' responses can be ignored.
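As a rough illustration of the difference between the two commit flows described above, here is a minimal Python sketch. It is not the server implementation: the `Shard` class and both function names are hypothetical, the calls run sequentially rather than in parallel, and aborting the write shard when a read only commit fails is an assumption made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """Hypothetical stand-in for a transaction participant."""
    name: str
    commit_ok: bool = True
    log: list = field(default_factory=list)

    def commit(self) -> bool:
        self.log.append("commit")
        return self.commit_ok

    def abort(self) -> None:
        self.log.append("abort")


def single_write_shard_commit(read_shards, write_shard) -> bool:
    """Existing optimization: commit read only shards first, then the write shard."""
    results = [s.commit() for s in read_shards]
    if not all(results):
        # Assumption for this sketch: a failed read only commit aborts the write.
        write_shard.abort()
        return False
    return write_shard.commit()


def two_phase_write_commit(read_shards, write_shard) -> bool:
    """Proposed flow: one phase, only the write shard's outcome matters."""
    for s in read_shards:
        s.abort()  # cheaper than commit; the response is ignored
    return write_shard.commit()
```

With one read only shard that cannot commit, `single_write_shard_commit` fails the whole operation, while `two_phase_write_commit` still succeeds because only the write shard's result is consulted.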



 Comments   
Comment by Jack Mulrow [ 03/Jul/23 ]

I agree, I don't think this is a valid general optimization. The scope of this ticket is just the "two phase write protocol" for update/deleteOne and findAndModify from PM-1632 (excluding findAndModify with sort, since that updates a document matching a collection-global property, so we need to know all reads were valid).

Comment by Max Hirschhorn [ 30/Jun/23 ]

The single write shard optimization has two phases: a round of commits on the read only shards, then if all read only commits were successful, sending commit to the one write shard. Two phase writes can be further optimized to skip the first commit phase because single writes don't care if read only shards not chosen to perform the write are able to commit. To satisfy the guarantees of a single write (ie only one matching document is updated), it's enough to only commit on the chosen write shard.

I'm not sure this will be valid as a general optimization. It may be possible to do something more targeted for updateOne without shard key, because the one update is done in its own batch. For read concern level "majority" transactions, it is necessary to get an ok:1 response to commit from the read only shards. This is because the speculative majority behavior means committing with w:majority is required to affirm the reads were actually of majority-committed data.

Where I see the proposal leading to a consistency anomaly would be something like the following sequence:

  1. Client starts a Transaction T1 with {readConcern: {level: "majority"}, writeConcern: {w: "majority"}}.
  2. Client performs a read in Transaction T1. Router_1 forwards the read to Shard_1.
  3. Client performs a write in Transaction T1. Router_1 forwards the write to Shard_2. Shard_2 as the first writer shard in the transaction is chosen as the recovery shard.
  4. Client sends commitTransaction to Router_1.
  5. (New proposal) Router_1 broadcasts commitTransaction to Shard_1 and Shard_2. Let's say Shard_2 commits successfully and Shard_1 fails with a write concern error.
  6. Client gets disconnected from Router_1 due to an unexpected network error.
  7. Client retries commitTransaction through Router_2, passing along that Shard_2 is the recovery shard.
  8. Router_2 recovers the decision from Shard_2 and learns Shard_2 had committed the transaction.
  9. Client incorrectly believes the entire transaction successfully committed when the transaction is not known to have committed on Shard_1.
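
The sequence above can be replayed with a small Python model. This is only a sketch under simplifying assumptions: the `Shard` class, its `commit_response` field, and `simulate_anomaly` are hypothetical names, and a shard records a decision only when its commit fully succeeds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Shard:
    """Hypothetical participant; commit_response is what it returns to the router."""
    name: str
    commit_response: str  # "ok" or "writeConcernError"
    decision: Optional[str] = None

    def commit_transaction(self) -> str:
        # Simplification: only a fully successful commit is recorded as a decision.
        if self.commit_response == "ok":
            self.decision = "committed"
        return self.commit_response


def simulate_anomaly():
    shard_1 = Shard("Shard_1", "writeConcernError")  # read only participant
    shard_2 = Shard("Shard_2", "ok")                 # write shard and recovery shard

    # Step 5: Router_1 broadcasts commit to both shards in a single phase.
    responses = {s.name: s.commit_transaction() for s in (shard_1, shard_2)}

    # Step 6: the client disconnects and never sees `responses`.
    # Steps 7-8: Router_2 asks only the recovery shard for the decision.
    recovered = shard_2.decision

    client_believes_committed = recovered == "committed"  # step 9
    reads_affirmed = responses["Shard_1"] == "ok"
    return client_believes_committed, reads_affirmed
```

In this model `simulate_anomaly()` returns a pair in which the client believes the transaction committed while the read on Shard_1 was never affirmed, which is the mismatch described in step 9.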
Generated at Thu Feb 08 06:38:16 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.