[GODRIVER-2001] Session.WithTransaction method endless loop Created: 07/May/21  Updated: 28/Oct/23  Resolved: 27/May/21

Status: Closed
Project: Go Driver
Component/s: None
Affects Version/s: None
Fix Version/s: 1.5.3

Type: Bug Priority: Major - P3
Reporter: 峻铭 张 Assignee: Benji Rewis (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to GODRIVER-2468 Don't check Context expiration in Wit... Closed
related to DRIVERS-1753 Allow configurable WithTransaction ti... Closed
Documentation Changes: Not Needed

 Description   

version: 1.5.2

I use a context with a short timeout and find that the WithTransaction method gets stuck in an endless loop.

Here is my code:

client, err := mongo.NewClient()
if err != nil {
   return
}
 
err = client.Connect(context.Background())
if err != nil {
   return
}
s, err := client.StartSession()
if err != nil {
   return
}
_, err = s.WithTransaction(context.Background(), func(ctx mongo.SessionContext) (interface{}, error) {
   c, cancel := context.WithTimeout(ctx, time.Nanosecond)
   defer cancel()
 
   // infinite loop
   _, err = client.Database("xxx").
      Collection("xxx").
      InsertOne(c, nil)
   if err != nil {
      return nil, err
   }
 
   return nil, nil
})
if err != nil {
   return
}

The InsertOne method returns the following error:

connection(xxxx) failed to write: context deadline exceeded

The error has both the TransientTransactionError and NetworkError labels, so the WithTransaction method gets stuck in an infinite loop:

for {
   err = s.StartTransaction(opts...)
   if err != nil {
      return nil, err
   }
 
   res, err := fn(NewSessionContext(ctx, s))
   if err != nil {
      if s.clientSession.TransactionRunning() {
         // Wrap the user-provided Context in a new one that behaves like context.Background() for deadlines and
         // cancellations, but forwards Value requests to the original one.
         _ = s.AbortTransaction(internal.NewBackgroundContext(ctx))
      }
 
      select {
      case <-timeout.C:
         return nil, err
      default:
      }
 
      if errorHasLabel(err, driver.TransientTransactionError) {
         continue
      }
      return res, err
   }
}

 Comments   
Comment by Benji Rewis (Inactive) [ 27/May/21 ]

zjm448172381@gmail.com the fix is now merged and should be available with the next patch of the Go driver.

Comment by Githook User [ 27/May/21 ]

Author: Benjamin Rewis (benjirewis) <32186188+benjirewis@users.noreply.github.com>

Message: GODRIVER-2001 Do not retry transaction after expired or canceled context (#668)
Branch: release/1.5
https://github.com/mongodb/mongo-go-driver/commit/ed1fd57ecc5a17b4c5b34458cafce2b417107a3b

Comment by Githook User [ 27/May/21 ]

Author: Benjamin Rewis (benjirewis) <32186188+benjirewis@users.noreply.github.com>

Message: GODRIVER-2001 Do not retry transaction after expired or canceled context (#668)
Branch: master
https://github.com/mongodb/mongo-go-driver/commit/d25f3402a40e5c18d8ac2ae327d591550ab4eabb

Comment by 峻铭 张 [ 17/May/21 ]

Thank you, @Benji Rewis. There are no other problems at the moment.

Comment by Benji Rewis (Inactive) [ 14/May/21 ]

Apologies for the delay, zjm448172381@gmail.com! After discussing with the rest of the team, we’ve decided that we can check specifically for a context deadline exceeded or context canceled error during the transaction and return early if one is encountered. A fix is in review now: https://github.com/mongodb/mongo-go-driver/pull/668.

Thank you again for your report and all the information you provided.

I’ve also created a drivers ticket (DRIVERS-1753) to ask about making the WithTransaction timeout configurable drivers-wide.

Let me know if you have any other questions or concerns.

Comment by 峻铭 张 [ 13/May/21 ]

Hi @Benji Rewis, I think `withTransactionTimeout` should be configurable. I also think there should be a setting that controls the number of retries.

In my case, I expect the `WithTransaction` method to return an error as soon as any error occurs in the transaction. I don't want to retry any command, so I want to be able to configure the number of retries.

Comment by Benji Rewis (Inactive) [ 12/May/21 ]

Thank you zjm448172381@gmail.com, I can now reproduce the series of errors if I Ping the server before running WithTransaction.

Here’s what I think is happening. Without Ping, you will get a server selection timeout with InsertOne in WithTransaction. This is not a TransientTransactionError, so the transaction will not retry.

With Ping, server selection will occur through Ping and be successful before WithTransaction, so InsertOne will actually try to write to the wire in roundTrip. Because of the short timeout, InsertOne will fail to write to the wire with context deadline exceeded. Per our drivers-wide specifications, we mark this failure to write as a TransientTransactionError. 

When any non-commitTransaction command fails with a network error within a transaction Drivers add the "TransientTransactionError" label because the client doesn’t know if it has modified data in the transaction or not. Therefore it must abort and retry the entire transaction to be certain it has executed each command in the transaction exactly once.

InsertOne will continue to fail in writing to the wire because of the short timeout, and we will continue to retry since we don’t know if the write actually went through. The loop is not endless, and will stop within 2 minutes (as defined by the non-configurable withTransactionTimeout).

We do want to retry writing when encountering network errors during transactions, so this series of errors is probably expected behavior. If you think the 2 minute delay defined by withTransactionTimeout ought to be something that's configurable, I could bring that up drivers-wide!

Comment by 峻铭 张 [ 12/May/21 ]

Hi @Benji Rewis, thanks for your reply. I have ruled out storage and the IP whitelist as causes.

I found that the cause is the `Ping` call made before `StartSession`.

// this call causes the loop
client.Ping(context.Background(), readpref.Primary())
 
s, err := client.StartSession()

If I add the `Ping` call, the transaction ends up in an endless loop. If I remove it, the behavior goes back to normal. The error messages from both tests follow.

// without the Ping call
 
server selection error: context deadline exceeded, current topology: { Type: ReplicaSetNoPrimary, Servers: [{ Addr: xxx:3717, Type: Unknown, Average RTT: 0 }, { Addr: xxx:3717, Type: Unknown, Average RTT: 0 }, ] }

// with the Ping call
 
// 1st loop
connection(xxx:3717[-10]) failed to write: context deadline exceeded
// 2nd loop
connection(xxx:3717[-10]) failed to write: context deadline exceeded
// 3rd loop
connection(xxx:3717[-10]) failed to write: context deadline exceeded
......

I will continue to debug this strange problem.

Comment by Benji Rewis (Inactive) [ 11/May/21 ]

Hello again zjm448172381@gmail.com!

Thanks again for your patience as we try to reproduce this error. Using Go driver version 1.5.2 and Golang version 1.13, I cannot replicate the infinite loop against a MongoDB 4.2 standalone, replica set, or sharded cluster. In my case, the error from InsertOne has a NetworkError label but not a TransientTransactionError label, so the Transaction is not retried.

A normal context deadline exceeded error from an expired context should not have the TransientTransactionError label, so I think there’s something else going on.

I have a couple hypotheses. Are you using Atlas? A MongoNetworkError with the TransientTransactionError label can occur when your IP address is not whitelisted (found under “Network Access/IP Access List” in the Atlas UI).

Have you run out of storage space in your MongoDB instance? This could cause a “failed to write” error that might have the TransientTransactionError label.

In any case, is there any other info included in the error message?

Comment by 峻铭 张 [ 11/May/21 ]

Hello @Benji Rewis , here is my version:

mongo version: 4.2
go version: 1.13

Comment by Benji Rewis (Inactive) [ 10/May/21 ]

Hello zjm448172381@gmail.com! We're still working on reproducing this issue. What MongoDB server version were you running against? And, what version of Go are you using?

Comment by Benji Rewis (Inactive) [ 07/May/21 ]

Hello zjm448172381@gmail.com!

Thank you for your report; we're looking into this issue now.

Generated at Thu Feb 08 08:37:36 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.