Delay AbortTransaction if the callback returns a timeout error in WithTransaction


    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: Transactions
    • Go Drivers

      Context

      WithTransaction always runs abortTransaction if the transaction callback returns an error after running at least one operation. However, abortTransaction can fail if the first operation in the transaction times out shortly after the driver sends it. In that case, abortTransaction can fail with an error like:

      (NoSuchTransaction) Given transaction number 40 does not match any in-progress transactions. The active transaction number is 39
      

      After abortTransaction returns the above error, it's possible that the transaction may actually have been started on the server and will then remain open until it times out on the server. That can lead to write conflicts if another transaction attempts to update the same document(s).

      An example sequence that can cause the above error:

      1. Call WithTransaction with a Context that carries a cancellation signal.
      2. The WithTransaction callback runs UpdateOne in the transaction.
      3. Cancel the Context while the driver is reading the UpdateOne response.
      4. WithTransaction runs abortTransaction in response to the callback returning a "context canceled" error.
      5. abortTransaction returns an error, but the transaction stays open on the server.

      Definition of done

      Add a short sleep in the error block after calling the WithTransaction callback. For example:

      res, err := fn(NewSessionContext(ctx, s))
      if err != nil {
      	if s.clientSession.TransactionRunning() {
      		// The callback failed because of a timeout or cancellation, so
      		// wait briefly before sending abortTransaction.
      		if errors.Is(err, context.DeadlineExceeded) || errors.Is(err, context.Canceled) || ctx.Err() != nil {
      			time.Sleep(25 * time.Millisecond)
      		}
      		// ...
      

      The 25ms sleep is somewhat arbitrary, but it worked in local testing with a script that can reproduce the described abortTransaction failure. It's not clear what state change on the server side we're waiting for, or whether the failure is related to out-of-order TCP packets, so it's not clear if there's a minimum value. The risk of choosing a higher value than necessary seems low because the sleep only applies after a callback timeout or cancellation error.

      Pitfalls

      • It's not clear how long we should sleep to reduce the probability of abortTransaction failing. If we wait too little, the sleep has no effect. If we wait too long, it hurts the usability of WithTransaction. Sleep-based synchronization is generally not robust.

            Assignee:
            Unassigned
            Reporter:
            Matt Dale