[JAVA-4684] Read operations in transactions are incorrectly retried Created: 22/Jul/22  Updated: 28/Oct/23  Resolved: 25/Jul/22

Status: Closed
Project: Java Driver
Component/s: Retryability, Transaction Management
Affects Version/s: 4.4.0
Fix Version/s: 4.7.1

Type: Bug Priority: Minor - P4
Reporter: Vitalijs Levasins Assignee: Jeffrey Yemin
Resolution: Fixed Votes: 0
Labels: regression
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File image-2022-07-22-12-30-37-436.png    
Issue Links:
Related
is related to JAVA-4708 java.net.SocketException instead of a... Closed
is related to DRIVERS-2395 Add test that reads are not retried i... Closed
Case:

 Description   

Summary

The issue occurs when the first transactional read operation, which should initiate a transaction, fails (for instance, due to reelection of primary node).

Mongo Driver incorrectly retries the operation, which leads to an error 251 "NoSuchTransaction".

The expected result as per the retryable reads specification is that the read operation is not retried when executed in a transaction.

Note that even after the bug is fixed, the outcome will be largely the same: an exception will be thrown with the TransientTransactionError error label. However, it will be a less confusing exception, and there will no longer be a retry which is contrary to specification. If the application uses ClientSession.withTransaction, the whole transaction will be retried upon catching an exception with the TransientTransactionError error label.

Please provide the version of the driver. If applicable, please provide the MongoDB server version and topology (standalone, replica set, or sharded cluster).

org.mongodb:mongodb-driver-core:4.6.1

We are using a replica set, managed by Atlas.

How to Reproduce

Run a find() command inside a transactional context after a reelection of primary node (in fact any network-specific error will do). 

Example:

First attempt:

2022-07-22 10:46:26.664 DEBUG 1432 --- [nio-8080-exec-4] org.mongodb.driver.protocol.command      : 
Sending command '\{"find": "submissions", "filter": {}, "projection": \{"census": false, "benefitPlan": false, "terms": false, "producer": false}, "limit": 20, "$db": "dev-submission-api", "$clusterTime": \{"clusterTime": {"$timestamp": {"t": 1658479556, "i": 1}}, "signature": \{"hash": {"$binary": {"base64": "DjgGnH/anK30OdlpOxrosXKL/U4=", "subType": "00"}}, "keyId": 7102362910864703492}}, "lsid": \{"id": {"$binary": {"base64": "jjddGYaCTNizTnGWaLqj+g==", "subType": "04"}}}, "txnNumber": 2, "startTransaction": true, "autocommit": false}' with request id 105 to database dev-submission-api on connection [connectionId\{localValue:45, serverValue:91713}] to server _________-shard-00-01.ezibq.mongodb.net:27017

Then a connection error followed by a retry (note the missing startTransaction flag):

2022-07-22 10:48:06.315 DEBUG 1432 --- [nio-8080-exec-4] org.mongodb.driver.protocol.command      : 
Sending command '\{"find": "submissions", "filter": {}, "projection": \{"census": false, "benefitPlan": false, "terms": false, "producer": false}, "limit": 20, "$db": "dev-submission-api", "$clusterTime": \{"clusterTime": {"$timestamp": {"t": 1658479556, "i": 1}}, "signature": \{"hash": {"$binary": {"base64": "DjgGnH/anK30OdlpOxrosXKL/U4=", "subType": "00"}}, "keyId": 7102362910864703492}}, "lsid": \{"id": {"$binary": {"base64": "jjddGYaCTNizTnGWaLqj+g==", "subType": "04"}}}, "txnNumber": 2, "autocommit": false}' with request id 118 to database dev-submission-api on connection [connectionId\{localValue:73, serverValue:91747}] to server __________-shard-00-01.ezibq.mongodb.net:27017

Followed by a confusing error message (but note the correct error label):

com.mongodb.MongoCommandException: Command failed with error 251 (NoSuchTransaction): 
'Given transaction number 2 does not match any in-progress transactions. 
The active transaction number is 1' on server ____________-shard-00-01.ezibq.mongodb.net:27017. 
The full response is \{"errorLabels": ["TransientTransactionError"], 
"ok": 0.0, "errmsg": "Given transaction number 2 does not match any in-progress transactions. The active transaction number is 1", 
"code": 251, "codeName": "NoSuchTransaction", 
"$clusterTime": {"clusterTime": {"$timestamp": {"t": 1658479676, "i": 1}}, "signature": \{"hash": {"$binary": {"base64": "WpRsowuO7W+lJR+EGTVgZzL6rqw=", "subType": "00"}}, "keyId": 7102362910864703492}}, "operationTime": \{"$timestamp": {"t": 1658479676, "i": 1}}}



 Comments   
Comment by Githook User [ 25/Jul/22 ]

Author:

{'name': 'Jeff Yemin', 'email': 'jeff.yemin@mongodb.com', 'username': 'jyemin'}

Message: Do not retry a read operation when in a transaction (#982)

Fixes a regression introduced in 4.4.0 in scope of JAVA-4034, and
was undetected due to the lack of a specification test for the
required behavior.

JAVA-4684
Branch: 4.7.x
https://github.com/mongodb/mongo-java-driver/commit/f42ef7d028dc4b5b2bdda3e32bfe1f8991c2c21b

Comment by Githook User [ 25/Jul/22 ]

Author:

{'name': 'Jeff Yemin', 'email': 'jeff.yemin@mongodb.com', 'username': 'jyemin'}

Message: Do not retry a read operation when in a transaction (#982)

Fixes a regression introduced in 4.4.0 in scope of JAVA-4034, and
was undetected due to the lack of a specification test for the
required behavior.

JAVA-4684
Branch: master
https://github.com/mongodb/mongo-java-driver/commit/afb5d0ae6693d1915b7afa3b84d250974fe15998

Comment by Jeffrey Yemin [ 22/Jul/22 ]

Changing to Minor because after fixing the bug the outcome is mostly the same: an exception will be thrown with the TransientTransactionError error label. However, it will be a less confusing exception, and there will no longer be a retry which is contrary to specification.

If the application uses ClientSession.withTransaction, the whole transaction will be retried upon catching an exception with the TransientTransactionError error label.

Fix is in review.

Comment by Jeffrey Yemin [ 22/Jul/22 ]

Hi vitalijs_levasins@epam.com, thanks for reporting this issue.

According to this section of the retryable reads specification, the driver should not be retrying this operation at all. We will investigate to see why it appears to be retrying.

Comment by Vitalijs Levasins [ 22/Jul/22 ]

Retry operation is performed in FindOperation class. See execute() method.

The issue can be easily reproduced if execution is stopped just before running of createReadCommandAndExecute(). Introduce some network issue and continue the execution. After some time, execution will be retried - when you'll get to the breakpoint (second attempt of running createReadCommandAndExecute()), enable the networking. Now mongo driver is able to run the find() command, which will be executed without startTransaction flag.

Generated at Thu Feb 08 09:02:42 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.