[SERVER-23790] in ShardRegistry's runCommand, make commands that fail with a non-retryable error return immediately Created: 18/Apr/16  Updated: 08/Jun/17  Resolved: 27/Apr/16

Status: Closed
Project: Core Server
Component/s: Sharding
Affects Version/s: 3.3.4
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Esha Maharishi (Inactive) Assignee: Spencer Brody (Inactive)
Resolution: Won't Fix Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Sprint: Sharding 13 (04/22/16), Sharding 14 (05/13/16)
Participants:

 Description   

Currently, if a command fails with a non-retriable error, we check for write concern errors and potentially return a write concern error status that overrides the command error status:

https://github.com/mongodb/mongo/blob/bdc06761206ac398af04f0a2eb482c4dca49bad8/src/mongo/s/client/shard_registry.cpp#L779-L786

It is non-trivial to untangle the way we handle the different types of errors, which can be due to request failure, command failure, or write concern failure, for a few reasons:

----- Issue 1 -----

If a write concern error is present in the command response, it is converted to a WriteConcernFailed error, regardless of what kind of write concern error it was:

https://github.com/mongodb/mongo/blob/bdc06761206ac398af04f0a2eb482c4dca49bad8/src/mongo/s/client/shard_registry.cpp#L135-L141

WriteConcernFailed is part of kAllRetriableErrors, but other types of write concern errors are not:

https://github.com/mongodb/mongo/blob/bdc06761206ac398af04f0a2eb482c4dca49bad8/src/mongo/s/client/shard_registry.cpp#L153-L170

---- Issue 2 ----

The jstests that check for write concern error behavior, such as

commands_that_write_accept_wc_configRS.js

expect the client to see a command error (ok: 0) on even non-WriteConcernFailed write concern errors, which currently works because:

1) the non-WriteConcernFailed write concern error is returned and converted into WriteConcernFailed
2) since WriteConcernFailed is retriable, the command is retried
3) the command now fails because it previously succeeded (for example, dropCollection fails with NamespaceNotFound), but with a non-retriable error
4) since the error is non-retriable, we check the write concern detail
5) the non-WriteConcernFailed write concern error is returned as an error status
6) the test succeeds since it sees a command error

It would be better for non-WriteConcernFailed write concern errors to be returned as part of the write concern detail, and for the test to check for them there instead of checking if the command failed.

---- Issue 3 ----

Higher up error handling code expects non-retriable errors to be returned within the command response (they call getStatusFromCommandResult() themselves), but retriable errors to be returned as an error status.

Either the higher up error handling code should be unified to handle retriable and non-retriable errors the same way, or the code in ShardRegistry's runCommand should return the command response on non-retriable errors but an error status on retriable errors (when you're out of retries).



 Comments   
Comment by Spencer Brody (Inactive) [ 27/Apr/16 ]

Rather than fixing ShardRegistry::runCommand's behavior, just going to remove all callers of it and switch to calling Shard::runCommand instead, which has the right behavior.

Generated at Thu Feb 08 04:04:29 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.