[GODRIVER-2468] Don't check Context expiration in WithTransaction Created: 22/Jun/22  Updated: 28/Oct/23  Resolved: 17/Aug/22

Status: Closed
Project: Go Driver
Component/s: None
Affects Version/s: None
Fix Version/s: 1.10.2

Type: Bug Priority: Major - P3
Reporter: Matt Dale Assignee: Benji Rewis (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Related
is related to GODRIVER-1965 Pre-write context expiration should n... Closed
is related to GODRIVER-2001 Session.WithTransaction method endles... Closed
is related to GODRIVER-2467 Return a Context error if WithTransac... Closed
Epic Link: Client Side Operations Timeout
Quarter: FY23Q2, FY23Q3
Documentation Changes: Not Needed

 Description   

Currently it's possible for a WithTransaction call to return an error when the context is canceled or times out that does not indicate that the error is related to the context expiring.

Consider the context "done" check in WithTransaction here. If the context expires while handling an error unrelated to context expiration (e.g. a transient server error), the returned error will not indicate that WithTransaction returned due to the context expiring. That is potentially confusing to users because the error returned from WithTransaction doesn't always specify why WithTransaction stopped retrying.

Update WithTransaction to continue calling the provided anonymous function until it returns a non-retryable error. Expect that the anonymous function handles the context correctly and returns a non-retryable error when the context expires.

This is an alternative to GODRIVER-2467 that can be completed in the v1.x driver.



 Comments   
Comment by Githook User [ 06/Sep/22 ]

Author:

{'name': 'Benjamin Rewis', 'email': 'benji.rewis@mongodb.com', 'username': 'benjirewis'}

Message: Stop treating context errors as network errors where possible. (#1045)

GODRIVER-2468
GODRIVER-1965
Branch: release/1.10
https://github.com/mongodb/mongo-go-driver/commit/c4993a52e199b170c31820e99355ed434a26fdc7

Comment by Benji Rewis (Inactive) [ 17/Aug/22 ]

We've stopped consistently marking context errors as network errors (and adding the TransientTransactionError label), so the context error checks are no longer needed in WithTransaction. If a context error occurs it should now be handed back to the user and not retried. If a context expires or is canceled after the context check in operation.Execute here, the driver may begin the retry process but should return a non-retryable context error before writing or reading from the wire again.

Comment by Githook User [ 17/Aug/22 ]

Author:

{'name': 'Benjamin Rewis', 'email': '32186188+benjirewis@users.noreply.github.com', 'username': 'benjirewis'}

Message: Stop treating context errors as network errors where possible. (#1045)

GODRIVER-2468
GODRIVER-1965
Branch: release/1.10
https://github.com/mongodb/mongo-go-driver/commit/486441860bbf033cbe721707b1506a639e9140e9

Comment by Benji Rewis (Inactive) [ 10/Aug/22 ]

We should be able to remove the context error checks from WithTransaction once we stop marking context expiration with the TransientTransactionError label. https://github.com/mongodb/mongo-go-driver/pull/1045

Comment by Benji Rewis (Inactive) [ 19/Jul/22 ]

It seems that removing the check entirely won't work either. It was added in GODRIVER-2001 to stop the infinite loop that might happen if WithTransaction is run with an expired context. The Go driver seems to mark context expiration errors (context.DeadlineExceeded, context.Canceled etc) with the TransientTransactionError label here if the driver is currently committing a transaction.

I think we should only append that label if the context is not also expired.

Comment by Wenbin Zhu [ 28/Jun/22 ]

benji.rewis@mongodb.com Thanks for looking into this. Today I heard from someone working on Mongosync that they are still hitting this, so it would be nice if we can have a fix in driver, if that is that is small enough and does not impact other driver behaviors.

return a non-retryable error (it's not clear to me how reasonable this is).

I think even with the current code, this can still happen, if the ctx is cancelled right after this check block, in other words, this check only makes this happening less, so my point is removing this check does not change the current semantics of the API.

 

I think the most idiomatic solution would be to return a context error wrapping the last operation error to provide the most info to users

I think there might be a potential problem with this approach: This changes the current behavior because currently if the ctx is cancelled while executing the callback, the result is some kind of execution error with a `context.Cancelled` error wrapped inside, but what you're suggesting is the other way around, by wrapping the execution error inside `context.Cancelled`. This is a behavior change, and could still result in the original issue if users are checking the root cause of the returned error which is a common practice.

Comment by Benji Rewis (Inactive) [ 28/Jun/22 ]

I do think it's odd that we return the last operation error and not the actual context error when we detect an error in this block. It also seems odd to me, however, to remove that check and rely on the provided callback function to respect the context (reasonable enough) and return a non-retryable error (it's not clear to me how reasonable this is).

I think the most idiomatic solution would be to return a context error wrapping the last operation error to provide the most info to users. Since this does not seem super urgent, I would opt to wait to do GODRIVER-2467 as part of 2.0 and close this ticket. What are your thoughts wenbin.zhu@mongodb.com/matt.dale@mongodb.com?

Comment by Wenbin Zhu [ 27/Jun/22 ]

Related conversation: https://mongodb.slack.com/archives/C7WJZNUTA/p1655484492898759

Generated at Thu Feb 08 08:38:41 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.