[SERVER-50459] Include "source" field in error responses from mongos Created: 21/Aug/20  Updated: 08/Jan/24

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Task Priority: Major - P3
Reporter: Lamont Nelson Assignee: Backlog - Service Architecture
Resolution: Unresolved Votes: 3
Labels: sa-remove-fv-backlog-22
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
depends on SERVER-50549 Transform connection-related error co... Closed
Related
is related to DRIVERS-1571 Direct read/write retries to another ... Implementing
Assigned Teams:
Service Arch
Participants:
Case:

 Description   

When a mongos returns an error from a remote node, there is no indication of where the error originated. Drivers have requested this information so that they can determine whether the mongos itself is malfunctioning or a mongod instance behind it is. It would also help with debugging and administration. Right now, when an error such as InterruptedAtShutdown is sent from a mongod node through mongos, drivers follow the SDAM spec and disconnect from the mongos even though this isn't necessary.

This ticket suggests creating a "source" field (or similarly named) for error responses that encodes this information.
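
As a rough illustration of how a driver might consume such a field, here is a minimal Python sketch. Everything about the field's shape is an assumption: the ticket does not define it, so the `source` sub-document, its `role` key, and the helper name `should_reset_mongos_connection` are hypothetical. The code set is an illustrative subset of the state-change error codes from the SDAM spec.

```python
# Illustrative subset of error codes that the SDAM spec treats as
# state-change errors (e.g. 11600 is InterruptedAtShutdown).
SDAM_STATE_CHANGE_CODES = {11600, 11602, 10107, 13435, 13436, 189, 91}


def should_reset_mongos_connection(error_doc):
    """Decide whether an error received from a mongos should affect the
    driver's view of that mongos.

    HYPOTHETICAL: assumes error_doc may carry a "source" sub-document with
    a "role" key ("mongos" or "mongod"). No such field exists today.
    """
    if error_doc.get("code") not in SDAM_STATE_CHANGE_CODES:
        return False
    source = error_doc.get("source")
    if source is None:
        # No origin information: fall back to treating the error as the
        # mongos's own, which is the behavior the ticket describes.
        return True
    # Only react when the mongos itself produced the error.
    return source.get("role") == "mongos"
```

With a field like this, an InterruptedAtShutdown error tagged as coming from a mongod would no longer cause the driver to disconnect from a healthy mongos.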



 Comments   
Comment by Mira Carey [ 27/Aug/20 ]

Bringing in another conversation I'd had with divjot.arora about a possible option 3, which may be more palatable for backport:

If:

  • You see an error headed to a client that would cause an SDAM change or connection pool effect
    • Beyond the top level, we should also look for write errors and write concern errors
  • You are not shutting down

Then:

  • If that error is retryable, rewrite it to be one of those listed in the Other Transient Errors section of the SDAM spec. That will cause the underlying read or write to be retried, but without SDAM or connection pool implications.
  • If that error is not retryable, rewrite it to be something not mentioned in any of the SDAM or retryability specs

And specifically for rewriting retryable errors, I'd go out of my way to only encode extra information in the message string. The aim there is to make sure that a driver has no reason to regard the error as anomalous and to ensure that we get normal retry behavior. We could be more flexible with non-retryable errors, as we don't need the drivers to do anything beyond logging/reporting them as usual.

The guess is that if you ever see an error that would cause an SDAM or connection pool change and you are not shutting down, you must have received that error remotely. At least that holds on a mongos (which doesn't have replication state changes).
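
The rule described in this comment could be sketched roughly as follows, in Python for brevity rather than the server's C++. This is not the implementation from SERVER-50549. The function names, the `retryable` flag, and the two replacement codes (HostUnreachable for the retryable case, InternalError as a stand-in for "something not mentioned in the specs") are assumptions chosen for illustration, and the code table is an illustrative subset of the SDAM state-change errors.

```python
# Illustrative subset of codes that would trigger SDAM or connection pool
# effects in a driver.
SDAM_STATE_CHANGE_CODES = {
    11600: "InterruptedAtShutdown",
    11602: "InterruptedDueToReplStateChange",
    10107: "NotWritablePrimary",
    13435: "NotPrimaryNoSecondaryOk",
    13436: "NotPrimaryOrSecondary",
    189: "PrimarySteppedDown",
    91: "ShutdownInProgress",
}

# ASSUMPTION: the comment does not name the replacement codes.
TRANSIENT_REPLACEMENT = (6, "HostUnreachable")  # retried, no SDAM effect
OPAQUE_REPLACEMENT = (1, "InternalError")       # placeholder, not retried


def rewrite_one(err, retryable):
    """Rewrite a single error document if its code would affect SDAM."""
    code = err.get("code")
    if code not in SDAM_STATE_CHANGE_CODES:
        return err
    new_code, new_name = TRANSIENT_REPLACEMENT if retryable else OPAQUE_REPLACEMENT
    out = dict(err)
    out["code"] = new_code
    if "codeName" in out:
        out["codeName"] = new_name
    # Extra information goes only into the message string.
    out["errmsg"] = (
        f"{err.get('errmsg', '')} "
        f"(rewritten from {SDAM_STATE_CHANGE_CODES[code]} [{code}] "
        f"received from a remote node)"
    )
    return out


def rewrite_response(resp, shutting_down, retryable):
    """Apply the rule to the top level, write errors, and write concern error."""
    if shutting_down:
        # A mongos that is shutting down reports its own errors unchanged.
        return resp
    out = dict(rewrite_one(resp, retryable))
    if "writeErrors" in out:
        out["writeErrors"] = [rewrite_one(e, retryable) for e in out["writeErrors"]]
    if "writeConcernError" in out:
        out["writeConcernError"] = rewrite_one(out["writeConcernError"], retryable)
    return out
```

For example, calling `rewrite_response` on a response with code 11600 while not shutting down and with `retryable=True` yields code 6, with the original code name preserved only in `errmsg`.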

Comment by Billy Donahue [ 27/Aug/20 ]

Lamont, that's kind of TBD depending on how we do with SERVER-50550, which proposes adding metadata to whatever basic response comes out of SERVER-50549.

Comment by Lamont Nelson [ 27/Aug/20 ]

I'm glad to see this was worked out. I was originally mentioning the hostname for the use case of debugging and thinking of 'status' as a document containing multiple fields, but I'm still curious about the trust model that we assume. For #2 above, will the original error information still be available as a sub-document or will this information only be encoded in the message string?

Comment by Billy Donahue [ 26/Aug/20 ]

The implementation of proposal (2) "Overwrite proxied errors" is SERVER-50549, which I'll be starting work on.

Generated at Thu Feb 08 05:22:43 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.