[SERVER-40201] Retrying commit on original router after a read only commit incorrectly uses two phase commit Created: 18/Mar/19  Updated: 29/Oct/23  Resolved: 20/Mar/19

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: None
Fix Version/s: 4.1.10

Type: Bug Priority: Major - P3
Reporter: Jack Mulrow Assignee: Esha Maharishi (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by SERVER-39991 Add transactions workloads to failove... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Sprint: Sharding 2019-03-25
Participants:

 Description   

To enable read-only commit optimizations, shard responses to transaction commands contain a readOnly boolean field that is true if the shard's local transaction is "in-progress" and has applied no operations that would be written to the oplog. While running a sharded transaction, mongos tracks which shards have ever returned readOnly=false and skips two-phase commit if no participant ever did, sending commitTransaction directly to each shard instead.
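The router-side bookkeeping described above can be sketched as a small model. This is a simplified Python illustration, not the mongos C++ implementation; the class `RouterTxnState`, its methods, and the returned strings are hypothetical names chosen only to show the "ever returned readOnly=false" rule.

```python
class RouterTxnState:
    """Hypothetical model of the router's per-transaction read-only tracking."""

    def __init__(self):
        # shard id -> True while every response from that shard said readOnly=true
        self.read_only = {}

    def process_participant_response(self, shard, response):
        # Once a shard reports readOnly=false, it stays marked as not read-only.
        self.read_only[shard] = self.read_only.get(shard, True) and response["readOnly"]

    def commit_plan(self):
        # Two-phase commit is skipped only if no participant ever reported a write.
        if all(self.read_only.values()):
            return "commitTransaction to each shard"
        return "coordinateCommit to coordinator"


state = RouterTxnState()
state.process_participant_response("shardA", {"readOnly": True})
state.process_participant_response("shardB", {"readOnly": True})
print(state.commit_plan())  # commitTransaction to each shard
state.process_participant_response("shardB", {"readOnly": False})
print(state.commit_plan())  # coordinateCommit to coordinator
```

In this model a single readOnly=false response is sticky, which is exactly why a commit response that reports readOnly=false changes how a later retry is routed.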

After committing, however, a shard no longer satisfies the in-progress condition, so its commit response contains readOnly=false and the router marks that participant as not read-only. If the client then retries commit against the original mongos, the router no longer believes the transaction is read-only, so it sends coordinateCommit to the coordinator, which tries to drive a two-phase commit. This fails during phase one if any participant has already committed the txnId: the coordinator receives TransactionCommitted as that participant's prepare response and keeps resending prepare until it times out.
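The phase-one stall can be illustrated with a simplified loop. Assumptions, all hypothetical: `send_prepare` is a callable returning a response-code string, the timeout is modeled as a fixed number of attempts, and every non-"OK" response is retried (the real coordinator is asynchronous C++ code and handles abort votes, which this model ignores).

```python
def run_phase_one(participants, send_prepare, max_attempts):
    """Hypothetical model: resend prepare to each participant until it votes or we give up."""
    for shard in participants:
        for _ in range(max_attempts):
            code = send_prepare(shard)
            if code == "OK":
                break  # participant voted to commit
            # Simplification: any other code, including TransactionCommitted,
            # is retried, so an already-committed participant never votes.
        else:
            return "timed out waiting for prepare from " + shard
    return "all participants prepared"


def fake_prepare(shard):
    # shardA already committed the read-only transaction; shardB has not.
    return "TransactionCommitted" if shard == "shardA" else "OK"


print(run_phase_one(["shardA", "shardB"], fake_prepare, 3))
```

With shardA already committed, the loop exhausts its attempts instead of making progress, mirroring the stall described above.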

This is especially a problem for failover testing: a stepdown on any shard during a read-only commit leads the shell to retry commitTransaction (because it is treated as a retryable write), and that retry stalls until the coordinator times out if any other shard already committed successfully.



 Comments   
Comment by Githook User [ 20/Mar/19 ]

Author:

Esha Maharishi (EshaMaharishi) <esha.maharishi@mongodb.com>

Message: SERVER-40201 Retrying commit on original router after a read only commit incorrectly uses two phase commit
Branch: master
https://github.com/mongodb/mongo/commit/1d8d992f2fef6db349a11893da4f2bf52c39dc86

Comment by Jack Mulrow [ 19/Mar/19 ]

I agree both of those approaches should work. I originally thought the easiest fix was to allow participants to return readOnly=true even when they are not in the kInProgress state, so that committing or aborting a read-only transaction doesn't make a participant stop being read-only. However, it turns out participants clear their transaction operations on commit, so we'd have to change how participants track whether they're read-only, which is possible but probably overkill.

That said, I'm fine with either option.

Comment by Esha Maharishi (Inactive) [ 18/Mar/19 ]

Hmm. I think we could get around this by either:

1) having the commit of a read-only transaction use the ARS (AsyncRequestsSender) directly rather than the MRS (MultiStatementTransactionRequestsSender); this way, the participants' readOnly fields will not be updated with the response from commit, or

2) making TransactionRouter::processParticipantResponse a no-op once commit has been received. We could do this by changing the TransactionRouter::_initiatedTwoPhaseCommit bool to TransactionRouter::_initiatedCommit and checking the bool in processParticipantResponse.
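Option 2 can be sketched with a simplified router model that sets a flag when commit begins. This is a hypothetical Python illustration: `initiated_commit` mirrors the proposed `_initiatedCommit` bool, but `FixedRouterTxnState` and its methods are illustrative names, not the actual TransactionRouter API.

```python
class FixedRouterTxnState:
    """Hypothetical model of option 2: ignore participant responses once commit starts."""

    def __init__(self):
        self.read_only = {}  # shard id -> whether it has only ever reported readOnly=true
        self.initiated_commit = False  # mirrors the proposed _initiatedCommit flag

    def process_participant_response(self, shard, response):
        if self.initiated_commit:
            return  # no-op: a commit response must not change read-only status
        self.read_only[shard] = self.read_only.get(shard, True) and response["readOnly"]

    def commit(self):
        self.initiated_commit = True
        if all(self.read_only.values()):
            return "commitTransaction to each shard"
        return "coordinateCommit to coordinator"


router = FixedRouterTxnState()
router.process_participant_response("shardA", {"readOnly": True})
first = router.commit()
# After committing, the shard answers readOnly=false ...
router.process_participant_response("shardA", {"readOnly": False})
# ... but the retried commit still takes the read-only path.
retry = router.commit()
```

Because the flag is set before any commit response is processed, a retry on the same router reuses the read-only decision instead of falling back to coordinateCommit.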

Generated at Thu Feb 08 04:54:20 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.