[SERVER-34943] failCommand failpoint should ignore commands from replica set members Created: 10/May/18  Updated: 08/Jan/24  Resolved: 08/Jan/19

Status: Closed
Project: Core Server
Component/s: Replication
Affects Version/s: None
Fix Version/s: 4.0.6, 4.1.7

Type: Improvement Priority: Major - P3
Reporter: Shane Harvey Assignee: A. Jesse Jiryu Davis
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
is depended on by DRIVERS-405 Retryable Reads Closed
Related
related to SERVER-34551 Add failpoint to fail commands with n... Closed
is related to SERVER-38188 Create split brain passthroughs Backlog
is related to SERVER-34960 Add a way to enter a failpoint withou... Closed
is related to SERVER-35004 Add functionality to only fail specif... Closed
Backwards Compatibility: Minor Change
Backport Requested:
v4.0
Sprint: Repl 2018-06-04, Repl 2018-12-17
Participants:

 Description   

When a secondary runs a command on the primary, it will trigger the failCommand failpoint. This is unexpected for drivers because we would like to use failCommand to test retryable writes and transactions on replica sets.
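For concreteness, a minimal shell sketch of the kind of driver test in question (a sketch only; the failCommands/closeConnection data options come from SERVER-34551):

// Arm failCommand to fail the next single "find" with a network error.
// A secondary replicating from this node also issues find/getMore
// against it, so the secondary can consume this one-shot charge before
// the driver's own find runs.
db.adminCommand({
    configureFailPoint: "failCommand",
    mode: {times: 1},
    data: {failCommands: ["find"], closeConnection: true}
});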

Some options I see:

  1. change failCommand to ignore commands from replica set members
  2. change failCommand to ignore all commands except those in the sessionCheckoutWhitelist?
  3. drivers only use failCommand against single node replica sets or standalone servers
  4. drivers never use the "skip" and "times" options and instead configure the failpoint as "alwaysOn" or "off"

jesse, spencer



 Comments   
Comment by Githook User [ 01/Feb/19 ]

Author:

{'name': 'vincentkam', 'email': 'vincent.kam@10gen.com', 'username': 'vincentkam'}

Message: Update retryable-reads tests to use "operations"

Comment by Githook User [ 18/Jan/19 ]

Author:

{'username': 'ajdavis', 'email': 'jesse@mongodb.com', 'name': 'A. Jesse Jiryu Davis'}

Message: SERVER-34943 Ignore internal commands with "failCommand"
Branch: v4.0
https://github.com/mongodb/mongo/commit/fd490e4b0e9776dcc79ff21ae17e3c078d0cbc8f

Comment by Shane Harvey [ 16/Jan/19 ]

Yes, we would like to use the failCommand fail point to test the behavior of find/getMore within a transaction without clashing with find/getMores from secondaries. For example, the "add transient label to connection errors" test can be extended to include getMore as well.
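A sketch of that extension, assuming the same failCommand options as in the description (the test flow in the comments here is illustrative, not the actual spec test):

// Fail the first "getMore" with a connection close. A driver test would
// run find() inside a transaction with a small batch size and iterate
// past the first batch: the in-transaction getMore hits this failpoint,
// and the driver is expected to attach the TransientTransactionError
// label to the resulting error.
db.adminCommand({
    configureFailPoint: "failCommand",
    mode: {times: 1},
    data: {failCommands: ["getMore"], closeConnection: true}
});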

Comment by A. Jesse Jiryu Davis [ 15/Jan/19 ]

Another question about the backport: since we're not testing sharded transactions or retryable reads in 4.0.x, and the transactions tests we want for 4.0 replica sets only test writes, is it actually necessary to backport this ticket to 4.0? This ticket prevents a find or getMore from a secondary from spuriously triggering a failpoint, but the transactions tests mostly do writes. Remind me: the backport's needed because some of the transactions tests do involve find and getMore?

Thanks.

Comment by A. Jesse Jiryu Davis [ 15/Jan/19 ]

It's SERVER-35518.

Comment by Jeremy Mikola [ 15/Jan/19 ]

I think the primary use case for failCommand is related to transactions, so I think we can do without backporting the fail point support to mongos 4.0.x. IIRC, the only other place drivers use this fail point is in retryable writes' Network Error Tests – we don't run those tests on mongos due to the missing fail point today, so I think it's fine if we defer doing so until 4.2+.

Can you point me to the server ticket where failCommand support was added to mongos in 4.2? I'd like to create a spec ticket to instruct drivers to start running those retryable writes tests on mongos 4.2+.

Comment by A. Jesse Jiryu Davis [ 15/Jan/19 ]

jmikola, it was pointed out to me that mongos doesn't support the failCommand fail point at all in 4.0.x. Do you think it's worth backporting the whole failCommand fail point to mongos for 4.0.x, or is it sufficient to backport the failCommand fix for replica sets? I've completed the replica set fix in a code review; the mongos backport would be much more effort.

Comment by A. Jesse Jiryu Davis [ 08/Jan/19 ]

Requesting backport to 4.0.

Comment by Jeremy Mikola [ 08/Jan/19 ]

jesse: Is there another ticket for the requested backport to 4.0.x? If not, should this issue have a label indicating the backport request? Just want to make sure it wasn't overlooked.

Comment by Githook User [ 12/Dec/18 ]

Author:

{'name': 'A. Jesse Jiryu Davis', 'email': 'jesse@mongodb.com', 'username': 'ajdavis'}

Message: SERVER-34943 Ignore internal commands with "failCommand"
Branch: master
https://github.com/mongodb/mongo/commit/597b4748fc36210c61cf4d6c086d364013df740a

Comment by A. Jesse Jiryu Davis [ 10/Dec/18 ]

Drivers team (shane.harvey and vincent.kam) request a backport to 4.0.x for more consistent testing.

Comment by A. Jesse Jiryu Davis [ 10/Dec/18 ]

Oh good point, I was stuck in mistaken thinking. Yes, calling "configureFailPoint" on the mongos will work for testing Retryable Reads with a sharded cluster.

Comment by Andy Schwerin [ 10/Dec/18 ]

Why not fail find commands in the router for testing sharded clusters? Is it important that they fail at the shard?

Comment by A. Jesse Jiryu Davis [ 10/Dec/18 ]

Actually, I think we need to reconsider.

The goal is to test Retryable Reads with the failCommand failpoint, and ignore replication finds and getMores. Ignoring "internalClient" works great for that in a multi-node replica set, but it doesn't work in a sharded cluster of multi-node replica sets. The problem is that mongos->mongod connections are also marked as "internal". So if we want failCommand to fail the driver's "find" command in a sharded cluster, we have only uncomfortable options:

  • Add an override to failCommand so it does fail internal commands, and then we must only test with sharded clusters of single-node replica sets, as well as unsharded multi-node sets
  • Abandon this ticket and only test Retryable Reads with single-node replica sets and sharded clusters of them
  • Don't test retryable reads with sharded clusters at all

I'd like to propose a new "replication: true" parameter again. If failCommand ignores replication connections, we can fulfill the goal of testing Retryable Reads, and we can test it the same way with all topology types! This is much better.

It's true that Andy's $out scenario will also appear to be a "replication connection", but we don't plan to test that $out scenario with the failCommand failpoint, whereas we do plan to test "find" and "getMore" with failCommand. I think that "replication: true" solves the problem we want to solve.
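As shipped, the master commit noted above takes roughly the first route: failCommand excuses internal clients by default. A sketch of opting back in, assuming a failInternalCommands data option (an assumption here, not confirmed by this ticket's text):

// Assumed opt-in: with failInternalCommands: true, the failpoint would
// also fire for commands from internal clients (mongos, replica set
// members), restoring the pre-fix behavior when a test wants it.
db.adminCommand({
    configureFailPoint: "failCommand",
    mode: "alwaysOn",
    data: {failCommands: ["find"], failInternalCommands: true, closeConnection: true}
});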

Comment by A. Jesse Jiryu Davis [ 03/Dec/18 ]

That sounds simple: "failCommand ignores all internalClient connections". I'll proceed.

Comment by Tess Avitabile (Inactive) [ 03/Dec/18 ]

That's a good point. We have the internalClient field in isMaster, which is used to tag transport sessions with kInternalClient. We could use that here.
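For reference, internal connections identify themselves in their initial handshake roughly like this (a sketch; the wire versions shown are placeholders):

// Initial isMaster sent on a mongod-to-mongod or mongos-to-mongod
// connection. The presence of "internalClient" is what lets the server
// tag the transport session as kInternalClient.
db.adminCommand({
    isMaster: 1,
    internalClient: {minWireVersion: NumberInt(0), maxWireVersion: NumberInt(7)}
});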

Comment by Andy Schwerin [ 03/Dec/18 ]

I'm not a fan of it. For one thing, we can't actually distinguish "replication" vs other internal network connections, because they share a pool on the "client node". For example, if you're doing aggregation with $out, a secondary might be sending $out writes to the primary of its own replica set as part of executing the user aggregation. That's not a "replication" connection, is it? What about when it's sending those writes to the primary of a different shard replica set? I think it will be difficult to manage.

There may already be another signal that the connection is from an "internal" operator, though. Would that be enough? I think there's something, because FCV uses it to control the reported wire protocol version. I'll find you IRL to chat, and we can sum up here after.

Comment by A. Jesse Jiryu Davis [ 03/Dec/18 ]

I don't think we should go with the solution where failCommand ignores commands from the __system user: that would require auth for testing (a new requirement), and if you forget to enable auth, some tests will fail some of the time for a very subtle reason. We'll have to remember that detail forever.

I propose adding an isMaster parameter, "replication: true", that's passed on the replication, initial-sync, and heartbeat connections. This allows replication connections to be tagged, and in shouldActivateFailCommandFailPoint we'll excuse replication connections from failCommand.

schwerin, we've discussed extensions to isMaster in the past; do you have any objections to passing "replication: true" with the initial isMaster on replication connections?
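The proposed handshake would look something like this (hypothetical; this parameter was never added, and the thread ultimately settled on the existing internalClient tag instead):

// Hypothetical "replication: true" flag on the initial isMaster of
// replication, initial-sync, and heartbeat connections;
// shouldActivateFailCommandFailPoint would then excuse tagged sessions.
db.adminCommand({isMaster: 1, replication: true});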

 

Comment by Vincent Kam (Inactive) [ 20/Nov/18 ]

Thanks tess.avitabile!

Comment by Tess Avitabile (Inactive) [ 19/Nov/18 ]

vincent.kam, thanks for the information. We will do our best to get this complete before driver testing starts.

Comment by Vincent Kam (Inactive) [ 19/Nov/18 ]

Thank you Tess!
>> 1. Can you please help us determine the priority/timing of this work? When would the drivers plan to start retryable reads testing? 

The spec proof of concept is currently due 2019-1-15, so ideally it'd be nice to have this ticket completed ~two weeks before then:  my apologies for the tight timing!

>> 2. Is it essential to test against multinode replica sets[?]

Theoretically, drivers may be able to limit testing to mongoses and standalones, but it would be highly desirable to test against multinode replica sets. 

>> 3. [I]s this the only way to test against multinode replica sets?

Without the requested enhancements to failpoints, we currently do not see any way for drivers to test against multinode replica sets. 

[edit] Dates.

cc: behackett, scott.lhommedieu

Comment by Tess Avitabile (Inactive) [ 16/Nov/18 ]

vincent.kam, yes, that's something we could consider. I will reopen this ticket and put it in Needs Scheduling for us to triage. Can you please help us determine the priority/timing of this work? When would the drivers plan to start retryable reads testing? Is it essential to test against multinode replica sets, and is this the only way to test against multinode replica sets?

Comment by Vincent Kam (Inactive) [ 16/Nov/18 ]

I believe this issue will affect retryable reads testing. For example, my current understanding is that when trying to test a retryable db.coll.find(...) on a replica set, we would create a failpoint like so:

{
	"configureFailPoint" : "failCommand",
	"mode" : { "times" : 1 },
	"data" : {
		"failCommands" : [ "find" ],
		"closeConnection" : true
	}
}

However, the secondaries tailing the oplog could trigger this failpoint, creating a race condition while testing. Would it be possible to implement the solution proposed by Andy and Spencer in the comments above?

This issue isn't a blocker, as drivers could theoretically test against only standalones, single-node replica sets, and mongoses, but it would definitely be nice to test against multinode replica sets, as that's one of the primary use cases for retryable reads.

Comment by Tess Avitabile (Inactive) [ 22/May/18 ]

Instead we will implement the more general solution in SERVER-35004.

Comment by Shane Harvey [ 15/May/18 ]

I believe that drivers can work with either of those solutions. We are generally only interested in using failCommand to test retryable write commands both inside and outside transactions (including abortTransaction and commitTransaction).

Comment by Spencer Brody (Inactive) [ 15/May/18 ]

Another alternative, which would be made possible if SERVER-34960 gets completed, is that instead of a whitelist of which commands are allowed, we provide a way to configure the failCommand failpoint with a blacklist of which specific commands it should fail. Would that meet your needs shane.harvey? If we can get SERVER-34960 in for 4.0, then I think that's probably the approach I'd lean towards. If we can't get SERVER-34960 for 4.0, then we'll probably go with the access-control-based solution.

CC mira.carey@mongodb.com

Comment by Shane Harvey [ 14/May/18 ]

Sounds good to me. I think we can easily limit these driver tests to only run against replica sets with access control enabled.

Comment by Spencer Brody (Inactive) [ 14/May/18 ]

The simplest way to do this is probably, as andy suggests, to use the access control system: we could skip activating the failpoint on connections authenticated as the __system user. The downside to this approach is that it wouldn't work when access control is disabled. Does that work for you shane.harvey?

Comment by Spencer Brody (Inactive) [ 14/May/18 ]

Oh wait, you might also have problems with the 'find' and 'getmore' commands used for replicating the oplog. Then we probably do indeed also need a way to configure the failpoint to only activate for commands not sent by cluster members.

Comment by Spencer Brody (Inactive) [ 14/May/18 ]

I think we should probably just add the 'replSetHeartbeat', 'replSetUpdatePosition', and 'replSetRequestVotes' commands to the command whitelist for now. Once SERVER-34960 is in, we should consider bringing back the functionality to specify which commands should be ignored, and the list specified should override the default whitelist. But I think the goal should be that the default whitelist is sufficient for most (all?) drivers testing, and we just use the ability to control which commands fail for testing server behavior.

Comment by Andy Schwerin [ 12/May/18 ]

Another option is to adjust the failpoint to only count for specific users.
