[SERVER-41369] Txns with writes that span multiple PSA shards(disabled majority read concern) - return transient txn error Created: 29/May/19  Updated: 29/Oct/23  Resolved: 23/Jul/19

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 4.2.0-rc5, 4.3.1

Type: Task Priority: Major - P3
Reporter: Kay Kim (Inactive) Assignee: Kaloian Manassiev
Resolution: Fixed Votes: 0
Labels: ShardedTxn:KnownBugs
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File shardB.log    
Issue Links:
Backports
Depends
depends on SERVER-41204 Track the participant that voted abor... Closed
Backwards Compatibility: Fully Compatible
Backport Requested:
v4.2
Sprint: Sharding 2019-06-17, Sharding 2019-07-01, Sharding 2019-07-15, Sharding 2019-07-29
Participants:

 Description   

If read concern majority is disabled, transactions with writes that span multiple shards should error.

But, when I have a txn that spans multiple shards (with rc majority disabled), I get a TransientTransactionError instead of a "'prepareTransaction' is not supported with 'enableMajorityReadConcern=false'".

Steps to reproduce:

  1. Start a sharded cluster with 2 PSA shards (enableMajorityReadConcern=false for both shards)
  2. Connect mongo shell to the mongos and run:

    session = db.getMongo().startSession( { readPreference: { mode: "primary" } } );
    coll1 = session.getDatabase("mydb1").foo;
    coll2 = session.getDatabase("mydb2").bar;
    test = session.getDatabase("test").students2;
     
    // Using readconcern local and write concern majority
    session.startTransaction( { readConcern: { level: "local" }, writeConcern: { w: "majority" } } );
     
    try {
        coll1.insertOne( { abc: 1 } );        // ShardB
        coll2.insertOne( { xyz: 999 } );      // ShardB
        test.insertOne( { x: 1 })             // ShardA
    } catch (error) {
        // Abort transaction on error
        session.abortTransaction();
        throw error;
    }
     
     // Commit the transaction using write concern set at transaction start
    session.commitTransaction();
     
    session.endSession();
    
    

    This returns:

    MongoDB Enterprise mongos> session.commitTransaction();
    2019-05-29T13:24:28.121-0400 E  QUERY    [js] uncaught exception: Error: command failed: {
    	"errorLabels" : [
    		"TransientTransactionError"
    	],
    	"ok" : 0,
    	"errmsg" : "Transaction was aborted",
    	"code" : 251,
    	"codeName" : "NoSuchTransaction",
    	"operationTime" : Timestamp(1559150668, 4),
    	"$clusterTime" : {
    		"clusterTime" : Timestamp(1559150668, 4),
    		"signature" : {
    			"hash" : BinData(0,"jPzNbm4zF7knG/Lw+fOM/+PuNKw="),
    			"keyId" : NumberLong("6696440883090292753")
    		}
    	},
    	"recoveryToken" : {
    		"recoveryShardId" : "shardB"
    	}
    } 
    

    I would have expected this to error once and stop immediately (similar to what the txn does if using rc snapshot with rc majority disabled)

Am attaching the log of shardB primary (had turned the verbosity level to 4)

From the log, the txnCoordinator sends prepare txn to shardB again even after the coordinator received the assertion (which is the error I would expect to see bubble up and stop everything).

2019-05-29T13:50:11.266-0400 D3 TXN      [TransactionCoordinator] Coordinator going to send command { prepareTransaction: 1, lsid: { id: UUID("003f3817-1a36-45c2-a439-d91dd9fffa16"), uid: BinData(0, DE9C6A9004CD518295FECA13EEAA8A134CC2D0115BD29073D69938C577121EC2) }, txnNumber: 0, autocommit: false, writeConcern: { w: "majority" } } to  shard shardA
2019-05-29T13:50:11.268-0400 D3 TXN      [TransactionCoordinator] Coordinator going to send command { prepareTransaction: 1, lsid: { id: UUID("003f3817-1a36-45c2-a439-d91dd9fffa16"), uid: BinData(0, DE9C6A9004CD518295FECA13EEAA8A134CC2D0115BD29073D69938C577121EC2) }, txnNumber: 0, autocommit: false, writeConcern: { w: "majority" } } to  local  shard shardB
2019-05-29T13:50:11.269-0400 D2 COMMAND  [TransactionCoordinator] run command admin.$cmd { prepareTransaction: 1, lsid: { id: UUID("003f3817-1a36-45c2-a439-d91dd9fffa16"), uid: BinData(0, DE9C6A9004CD518295FECA13EEAA8A134CC2D0115BD29073D69938C577121EC2) }, txnNumber: 0, autocommit: false, writeConcern: { w: "majority" }, $db: "admin" }
2019-05-29T13:50:11.271-0400 D2 COMMAND  [TransactionCoordinator] command: prepareTransaction
2019-05-29T13:50:11.272-0400 I  TXN      [TransactionCoordinator] transaction parameters:{ lsid: { id: UUID("003f3817-1a36-45c2-a439-d91dd9fffa16"), uid: BinData(0, DE9C6A9004CD518295FECA13EEAA8A134CC2D0115BD29073D69938C577121EC2) }, txnNumber: 0, autocommit: false, readConcern: { level: "snapshot" } }, readTimestamp:Timestamp(0, 0), ninserted:2 keysInserted:2 terminationCause:aborted timeActiveMicros:162 timeInactiveMicros:2089046 numYields:0 locks:{ ReplicationStateTransition: { acquireCount: { w: 4 } }, Global: { acquireCount: { r: 2, w: 1 } }, Database: { acquireCount: { r: 1, w: 2 } }, Collection: { acquireCount: { r: 1, w: 2 } }, Mutex: { acquireCount: { r: 9 } } } storage:{} wasPrepared:0, 2089ms
2019-05-29T13:50:11.273-0400 D1 COMMAND  [TransactionCoordinator] assertion while executing command 'prepareTransaction' on database 'admin' with arguments '{ prepareTransaction: 1, lsid: { id: UUID("003f3817-1a36-45c2-a439-d91dd9fffa16"), uid: BinData(0, DE9C6A9004CD518295FECA13EEAA8A134CC2D0115BD29073D69938C577121EC2) }, txnNumber: 0, autocommit: false, writeConcern: { w: "majority" }, $db: "admin" }': Location50993: 'prepareTransaction' is not supported with 'enableMajorityReadConcern=false'
2019-05-29T13:50:11.274-0400 I  COMMAND  [TransactionCoordinator] command admin.$cmd command: prepareTransaction { prepareTransaction: 1, lsid: { id: UUID("003f3817-1a36-45c2-a439-d91dd9fffa16"), uid: BinData(0, DE9C6A9004CD518295FECA13EEAA8A134CC2D0115BD29073D69938C577121EC2) }, txnNumber: 0, autocommit: false, writeConcern: { w: "majority" }, $db: "admin" } numYields:0 ok:0 errMsg:"'prepareTransaction' is not supported with 'enableMajorityReadConcern=false'" errName:Location50993 errCode:50993 reslen:460 locks:{ ReplicationStateTransition: { acquireCount: { w: 4 } }, Global: { acquireCount: { r: 2, w: 1 } }, Database: { acquireCount: { r: 1, w: 2 } }, Collection: { acquireCount: { r: 1, w: 2 } }, Mutex: { acquireCount: { r: 9 } } } flowControl:{ acquireCount: 1 } storage:{} protocol:op_msg 1ms
2019-05-29T13:50:11.275-0400 D3 TXN      [TransactionCoordinator] Coordinator shard received Location50993: 'prepareTransaction' is not supported with 'enableMajorityReadConcern=false' from shard shardB for { prepareTransaction: 1, lsid: { id: UUID("003f3817-1a36-45c2-a439-d91dd9fffa16"), uid: BinData(0, DE9C6A9004CD518295FECA13EEAA8A134CC2D0115BD29073D69938C577121EC2) }, txnNumber: 0, autocommit: false, writeConcern: { w: "majority" } }
2019-05-29T13:50:11.276-0400 D3 TXN      [TransactionCoordinator] Coordinator going to send command { prepareTransaction: 1, lsid: { id: UUID("003f3817-1a36-45c2-a439-d91dd9fffa16"), uid: BinData(0, DE9C6A9004CD518295FECA13EEAA8A134CC2D0115BD29073D69938C577121EC2) }, txnNumber: 0, autocommit: false, writeConcern: { w: "majority" } } to  local  shard shardB
2019-05-29T13:50:11.277-0400 D2 COMMAND  [TransactionCoordinator] run command admin.$cmd { prepareTransaction: 1, lsid: { id: UUID("003f3817-1a36-45c2-a439-d91dd9fffa16"), uid: BinData(0, DE9C6A9004CD518295FECA13EEAA8A134CC2D0115BD29073D69938C577121EC2) }, txnNumber: 0, autocommit: false, writeConcern: { w: "majority" }, $db: "admin" }
2019-05-29T13:50:11.277-0400 D2 COMMAND  [TransactionCoordinator] command: prepareTransaction



 Comments   
Comment by Githook User [ 24/Jul/19 ]

Author:

{'name': 'Kaloian Manassiev', 'email': 'kaloian.manassiev@mongodb.com', 'username': 'kaloianm'}

Message: SERVER-41369 Terminate the TransactionCoordinator retry logic if read concern majority is not available

(cherry picked from commit e3c26cf2f4c6ea3d597f42360442c0b6320e84bc)
Branch: v4.2
https://github.com/mongodb/mongo/commit/88be1e9ba104dffbc23aae7a817a944cf6058eff

Comment by Githook User [ 23/Jul/19 ]

Author:

{'name': 'Kaloian Manassiev', 'username': 'kaloianm', 'email': 'kaloian.manassiev@mongodb.com'}

Message: SERVER-41369 Terminate the TransactionCoordinator retry logic if read concern majority is not available
Branch: master
https://github.com/mongodb/mongo/commit/e3c26cf2f4c6ea3d597f42360442c0b6320e84bc

Comment by Esha Maharishi (Inactive) [ 29/May/19 ]

I think what's happening is that the first prepareTransaction to each participant returns the Location50993 error, on which the coordinator retries, because that error isn't a voteAbort error (on which the coordinator would make an abort decision), and also isn't one of TransactionCoordinatorSteppingDown or TransactionCoordinatorReachedAbortDecision.

Then the prepareTransaction retry to each participant returns NoSuchTransaction, since each participant already aborted on throwing the Location50993 error on the first try.

The Location50993 error is not returned to the client because 1) it is not considered a vote to abort, and 2) even if it was, the coordinateCommitTransaction command would convert it to NoSuchTransaction.

To fix this, we could 1) give that error a named code and include it in VoteAbortErrors, and 2) make the coordinateCommitTransaction command bubble up the exact error that caused the coordinator to make an abort decision.

// Coordinator starts collecting votes
2019-05-29T13:11:03.608-0400 D3 TXN      [TransactionCoordinator] Coordinator going to send command { prepareTransaction: 1, lsid: { id: UUID("4ae24618-605d-478a-a2eb-a0573d25adcd"), uid: BinData(0, DE9C6A9004CD518295FECA13EEAA8A134CC2D0115BD29073D69938C577121EC2) }, txnNumber: 0, autocommit: false, writeConcern: { w: "majority" } } to  shard shardA
2019-05-29T13:11:03.614-0400 D3 TXN      [TransactionCoordinator] Coordinator going to send command { prepareTransaction: 1, lsid: { id: UUID("4ae24618-605d-478a-a2eb-a0573d25adcd"), uid: BinData(0, DE9C6A9004CD518295FECA13EEAA8A134CC2D0115BD29073D69938C577121EC2) }, txnNumber: 0, autocommit: false, writeConcern: { w: "majority" } } to  local  shard shardB
 
// The coordinator's local participant aborts (I assume because the prepareTransaction request throws)
2019-05-29T13:11:03.616-0400 I  TXN      [TransactionCoordinator] transaction parameters:{ lsid: { id: UUID("4ae24618-605d-478a-a2eb-a0573d25adcd"), uid: BinData(0, DE9C6A9004CD518295FECA13EEAA8A134CC2D0115BD29073D69938C577121EC2) }, txnNumber: 0, autocommit: false, readConcern: { level: "snapshot" } }, readTimestamp:Timestamp(0, 0), ninserted:2 keysInserted:2 terminationCause:aborted timeActiveMicros:172 timeInactiveMicros:60136 numYields:0 locks:{ ReplicationStateTransition: { acquireCount: { w: 3 } }, Global: { acquireCount: { r: 1, w: 1 } }, Database: { acquireCount: { r: 1, w: 2 } }, Collection: { acquireCount: { r: 1, w: 2 } }, Mutex: { acquireCount: { r: 9 } } } storage:{} wasPrepared:0, 60ms
 
// The coordinator gets the 50993 error from both participants, and so schedules retries
2019-05-29T13:11:03.627-0400 D3 TXN      [TransactionCoordinator] Coordinator shard received Location50993: 'prepareTransaction' is not supported with 'enableMajorityReadConcern=false' from shard shardB for { prepareTransaction: 1, lsid: { id: UUID("4ae24618-605d-478a-a2eb-a0573d25adcd"), uid: BinData(0, DE9C6A9004CD518295FECA13EEAA8A134CC2D0115BD29073D69938C577121EC2) }, txnNumber: 0, autocommit: false, writeConcern: { w: "majority" } }
2019-05-29T13:11:03.639-0400 D3 TXN      [TransactionCoordinator] Coordinator going to send command { prepareTransaction: 1, lsid: { id: UUID("4ae24618-605d-478a-a2eb-a0573d25adcd"), uid: BinData(0, DE9C6A9004CD518295FECA13EEAA8A134CC2D0115BD29073D69938C577121EC2) }, txnNumber: 0, autocommit: false, writeConcern: { w: "majority" } } to  local  shard shardB
2019-05-29T13:11:03.647-0400 D3 TXN      [TransactionCoordinator] Coordinator shard received Location50993: 'prepareTransaction' is not supported with 'enableMajorityReadConcern=false' from shard shardA for { prepareTransaction: 1, lsid: { id: UUID("4ae24618-605d-478a-a2eb-a0573d25adcd"), uid: BinData(0, DE9C6A9004CD518295FECA13EEAA8A134CC2D0115BD29073D69938C577121EC2) }, txnNumber: 0, autocommit: false, writeConcern: { w: "majority" } }
2019-05-29T13:11:03.655-0400 D3 TXN      [TransactionCoordinator] Coordinator going to send command { prepareTransaction: 1, lsid: { id: UUID("4ae24618-605d-478a-a2eb-a0573d25adcd"), uid: BinData(0, DE9C6A9004CD518295FECA13EEAA8A134CC2D0115BD29073D69938C577121EC2) }, txnNumber: 0, autocommit: false, writeConcern: { w: "majority" } } to  shard shardA
 
// The coordinator now gets back NoSuchTransaction from one of the participants, so it makes an abort decision
2019-05-29T13:11:03.654-0400 D3 TXN      [TransactionCoordinator] Coordinator shard received NoSuchTransaction: Transaction 0 has been aborted. from shard shardB for { prepareTransaction: 1, lsid: { id: UUID("4ae24618-605d-478a-a2eb-a0573d25adcd"), uid: BinData(0, DE9C6A9004CD518295FECA13EEAA8A134CC2D0115BD29073D69938C577121EC2) }, txnNumber: 0, autocommit: false, writeConcern: { w: "majority" } }
 
// The abort decision causes the outstanding request to the other participant to be cancelled
2019-05-29T13:11:03.656-0400 D3 TXN      [TransactionCoordinator] Prepare stopped retrying due to retrying being cancelled

Generated at Thu Feb 08 04:57:32 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.