[SERVER-34187] FailPoint to hang up on client in abortTransaction/commitTransaction
Created: 29/Mar/18  Updated: 08/Jan/24  Resolved: 19/Apr/18

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: None

Type: New Feature
Priority: Major - P3
Reporter: A. Jesse Jiryu Davis
Assignee: Spencer Brody (Inactive)
Resolution: Duplicate
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-34551 Add failpoint to fail commands with n... Closed
Related
is related to SERVER-34057 Update onPrimaryTransactionalWrite fa... Closed
Sprint: Repl 2018-04-09, Repl 2018-04-23
Participants:

 Description   

Drivers will need a way to test that they retry abortTransaction or commitTransaction once, according to the spec, if we get a network error on the first attempt.

cc shane.harvey and jeff.yemin
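
For illustration, the retry-once behavior that drivers need to test might look roughly like the sketch below. It is not taken from any driver: `NetworkError`, `runWithOneRetry`, and `makeFlakySend` are hypothetical names, and the transport is a fake that fails a fixed number of times, standing in for the server-side hang-up this ticket asks for.

```javascript
// Hypothetical error type standing in for a driver's network error.
class NetworkError extends Error {}

// Run a commit/abort command, retrying exactly once on a network error.
// `send` is a hypothetical function that sends the command and returns the reply.
function runWithOneRetry(send, command) {
  try {
    return send(command);
  } catch (err) {
    if (!(err instanceof NetworkError)) throw err; // only network errors are retried
    return send(command); // second and final attempt; any error propagates
  }
}

// Fake transport that hangs up on the first `failures` calls, then succeeds.
function makeFlakySend(failures) {
  let attempts = 0;
  const send = (command) => {
    attempts += 1;
    if (attempts <= failures) throw new NetworkError("connection closed");
    return { ok: 1 };
  };
  send.attempts = () => attempts; // expose the call count for assertions
  return send;
}

const once = makeFlakySend(1);
console.log(runWithOneRetry(once, { commitTransaction: 1 }), once.attempts());
```

A test built this way can assert both that one failure is hidden from the caller and that a second consecutive failure is surfaced, which is only possible if the failure can be triggered deterministically.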



 Comments   
Comment by Spencer Brody (Inactive) [ 19/Apr/18 ]

We will do SERVER-34551 instead

Comment by Spencer Brody (Inactive) [ 04/Apr/18 ]

Why would you run them concurrently? If the connection you run closeAllConnections on doesn't get closed, then you run closeAllConnections, wait for a response, then on another connection you run commitTransaction, which will now fail with a socket error.

Comment by Shane Harvey [ 04/Apr/18 ]

The problem is that drivers have no way of running two commands at exactly the same time on the server.

Comment by Spencer Brody (Inactive) [ 04/Apr/18 ]

Ah, so you're worried about the case where closeAllConnections starts running, closes the connection that it is running on, the driver gets a network error, runs commitTransaction, and the server receives and runs commitTransaction all before closeAllConnections gets around to closing the connection that commitTransaction is running on? Yeah, that does sound like a race. mira.carey@mongodb.com, do you have a sense for how difficult it would be to have a function that closes all incoming connections to mongod except the connection that the request is currently running on?

Comment by Shane Harvey [ 03/Apr/18 ]

I'm not sure we can rely on what you've described for testing. It sounds like drivers would need to run closeAllConnections and commit concurrently. In that case, any of the following could occur:

  1. closeAllConnections completes before the commit command, commit succeeds.
  2. closeAllConnections completes after the commit command, commit succeeds.
  3. closeAllConnections completes in parallel to the commit command, commit fails with a network error.

I think we need a way to deterministically cause a network error.

Comment by Spencer Brody (Inactive) [ 03/Apr/18 ]

The command would immediately close all network connections coming in to the server, so it's likely (though racy) that you won't get the response from the closeAllConnections command. There's no way for the server to predict what connection is going to have an isMaster come in on it soon, so it would be up to the drivers to deal with SDAM failures.

If the goal is to use this to force a network error on the commitTransaction command, I imagine you would run the closeAllConnections command on a different connection than you run the commitTransaction command, so that if there's any auto-reconnect behavior built in, it won't already have reconnected the connection after the network error received from running closeAllConnections.

Comment by Shane Harvey [ 03/Apr/18 ]

So the proposed command would cause the next command to hang up all connections?

Something like:

> db.adminCommand("hangupAllConnections")
{ "ok" : 1 }
> db.adminCommand({"commitTransaction":1, ...})
<NetworkError>

I think that would work. We might want a way to provide a black/white list of commands though. We would not want isMaster issued from SDAM to cause a network failure in between the hangupAllConnections and commit commands.
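
To make the command-list idea concrete, here is a minimal simulation of a one-shot hang-up that fires only for named commands. Everything in it is hypothetical (`FakeServer`, `armHangup`, `HangupError`); it models the proposed behavior in plain JavaScript and is not a real server command or fail point.

```javascript
// Hypothetical error type standing in for a dropped connection.
class HangupError extends Error {}

// Simulated server with a one-shot hang-up restricted to a list of command names.
// None of these names are real server APIs.
class FakeServer {
  constructor() {
    this.armedFor = null; // Set of command names, or null when disarmed
  }

  // Arm the hang-up so it fires on the next command whose name is listed.
  armHangup(commandNames) {
    this.armedFor = new Set(commandNames);
  }

  run(command) {
    const name = Object.keys(command)[0]; // the first key is the command name
    if (this.armedFor !== null && this.armedFor.has(name)) {
      this.armedFor = null; // fire once, then disarm
      throw new HangupError("hung up on " + name);
    }
    return { ok: 1 };
  }
}

const server = new FakeServer();
server.armHangup(["commitTransaction", "abortTransaction"]);
console.log(server.run({ isMaster: 1 })); // monitoring traffic passes through
try {
  server.run({ commitTransaction: 1 });
} catch (err) {
  console.log(err.message); // the listed command is hung up on
}
console.log(server.run({ commitTransaction: 1 })); // the retry succeeds
```

Because the hang-up is keyed on the command name rather than on "the next command", an isMaster sent by SDAM between arming and commit does not consume it.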

Comment by A. Jesse Jiryu Davis [ 02/Apr/18 ]

I think that sounds great. shane.harvey what do you think?

Comment by Spencer Brody (Inactive) [ 02/Apr/18 ]

What about a test command to hang up all connections? Then you could just run that command any time you want the next command to fail with a network error? That might be more future-proof.

jesse
