[SERVER-39346] Extend failCommand to return an error after executing the command Created: 01/Feb/19  Updated: 23/Mar/23  Resolved: 24/Feb/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: New Feature Priority: Major - P3
Reporter: Shane Harvey Assignee: Backlog - Service Architecture
Resolution: Won't Do Votes: 0
Labels: neweng, re-triaged-ticket
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Service Arch
Participants:

 Description   

The failCommand fail point should have an option to return the configured error only after executing the command normally (similar to how the failBeforeCommitExceptionCode option works for the onPrimaryTransactionalWrite fail point). Right now, the error is returned before executing the command. This is important for testing sharded transactions in drivers because the recoveryToken only allows us to recover the state of the transaction.

I did not see the need for this feature earlier because I did not connect the dots with how the recoveryToken works and how the fail point works. Specifically, a commit on a new mongos cannot initiate the 2PC procedure when the first commit is never initiated on the original mongos, instead it simply waits for the transaction to timeout.

So right now, drivers can only test the following scenarios:

  • commit succeeds on mongos A with no retry.
  • commit fails on mongos A with no retry (because of a non-retryable error).
  • commit fails on mongos A, retry succeeds on mongos A.
  • commit fails on mongos A, retry fails on mongos A.
  • commit fails on mongos A, retry fails on mongos B.

With this feature we can start testing the following:

  • commit fails on mongos A (but it actually succeeds on the cluster), retry succeeds on mongos B because the initial commit succeeded.


 Comments   
Comment by Shane Harvey [ 28/Nov/22 ]

Sounds good to me. I believe we're fine with relying on the writeConcernError failCommand behavior for now.

Comment by Lauren Lewis (Inactive) [ 24/Feb/22 ]

We haven’t heard back from you for at least one calendar year, so this issue is being closed. If this is still an issue for you, please provide additional information and we will reopen the ticket.

Comment by Gregory McKeon (Inactive) [ 12/Feb/19 ]

shane.harvey per SERVER-35083, this is intentional and you can rely on this behavior.

Comment by Shane Harvey [ 12/Feb/19 ]

I've confirmed that the current "writeConcernError" failCommand behavior is sufficient to test 4.2 sharded transactions so I think we can de-prioritize this ticket. Can someone on the server confirm that this behavior is intentional and not subject to change?

I still think it would still be nice to have this feature because a "failAfterRunningCommand: true" flag would be much more clear and slightly less verbose than using "writeConcernError". Also I suspect drivers might want this feature to test non-write concern errors in the future.

Comment by Shane Harvey [ 01/Feb/19 ]

It looks like when failCommand is configured to return a writeConcernError, the write is actually applied and then the writeConcernError is added to the reply. Since certain writeConcernErrors cause driver to automatically retry the commit, I think this behavior might be good enough to test all the scenarios we want.

Here's an example of the failCommand/writeConcernError behavior:

mongos> db.test.drop();
true
mongos> db.adminCommand({
...     "configureFailPoint" : "failCommand",
...     "mode" : { "times" : 1 },
...     "data" : {
...         "failCommands" : ["insert"],
...         "writeConcernError" : {
...             "code" : 91,
...             "errmsg" : "Replication is being shut down"
...         }
...     }
... });
{ "ok" : 1 }
mongos> db.runCommand({insert: "test", documents: [{_id:1}]});
{
	"n" : 1,
	"writeConcernError" : {
		"code" : 91,
		"errmsg" : "Replication is being shut down"
	},
	"ok" : 1
}
mongos> db.test.find();
{ "_id" : 1 }

Generated at Thu Feb 08 04:51:45 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.