[DRIVERS-2775] Investigate optionally supporting parsing errInfo for MultipleErrorsOccurred Created: 14/Nov/23  Updated: 22/Nov/23

Status: Needs Triage
Project: Drivers
Component/s: CRUD
Fix Version/s: None

Type: Improvement Priority: Unknown
Reporter: Preston Vasquez Assignee: Unassigned
Resolution: Unresolved Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
is depended on by GODRIVER-3048 HasErrorCode() should parse MultipleE... Blocked
Driver Changes: Not Needed

 Description   

Summary

Remove the requirement that drivers must not parse inside errInfo in the CRUD specifications, allowing drivers the option to gracefully handle the MultipleErrorsOccured error code.

Motivation

The CRUD specifications for writeError state that 

Drivers MUST NOT parse inside errInfo.

The rationale for that can be found in this comment:

I think it would be even riskier to attempt to parse errInfo and construct a helpful error message – both due to implementation complexity and needing to rely on a stable response format. Presumably there was a reason the server team did not choose to do so for SERVER-20547; however, the scope/design documents linked from DRIVERS-820 do not explain why. There is some mention of truncating errInfo to a generic string ("could not provide detailed error information as the error grew too large") if the detailed document would exceed some size limit.

However, there is a use case where the server could return a MultipleErrorsOccured error code: 65 or 40671. See here for the rationale on the latter. In these cases to get more information you can inspect errInfo to get the complete details of the underlying errors. For example,

{
  index: 0,
  code: 65,
  errInfo: {
    causedBy: [
      {
        index: 0,
        code: 1,
        errmsg: "Failing command via 'failCommand' failpoint"
      },
      {
        index: 0,
        code: 2,
        errmsg: "Failing command via 'failCommand' failpoint"
      }
    ]
  },
  [Symbol(errorLabels)]: Set(0) {}
}

This is likely safe to do since the errInfo in this case is typed by the MultipleErrorsOccuredInfo class, here: https://github.com/mongodb/mongo/blob/e1fc84129a3516fd2caf0ab262c17713cd3e592d/src/mongo/s/write_ops/batched_command_response.h#L203

Who is the affected end user?

This feature would benefit the TAR team from having to parse through the raw server response.

How does this affect the end user?

This improvement would allow the Go driver to add convenience to functions that check if a server error contains a multiplicity of error codes if the underlying code is 65 or 40671:

println(err.(mongo.CommandError).HasErrorCode(65)) // true
 
// Assuming the case in the motivation, 1 and 2 are also true
println(err.(mongo.CommandError).HasErrorCode(1)) // true
println(err.(mongo.CommandError).HasErrorCode(2)) // true

How likely is it that this problem or use case will occur?

Edge case

If the problem does occur, what are the consequences and how severe are they?

There are no consequences, it simply defers the work of inspecting errInfo the client.

Is this issue urgent?

No

Is this ticket required by a downstream team?

No

Is this ticket only for tests?

No

Acceptance Criteria

Change the line "Drivers MUST NOT parse inside errInfo." in the CRUD specifications to
"Drivers MAY parse inside errInfo to validate errors associated with the MultipleErrorsOccured error code".



 Comments   
Comment by Kaloian Manassiev [ 22/Nov/23 ]

Sorry for the delayed reply kaitlin.mahar@mongodb.com - I hadn't seen the quote.

I didn't have any particularly insightful context about the MultipleErrorsOccurred error when I did that change. My primary objective was to make the write commands all use IDL instead of having their own parsing (there were 2-3 parsing/serialisation implementations at the time).

My opinion is that MultipleErrorOccurred must not be something we expose to end users, in the spirit of invisible sharding and we should get rid of it externally.

Clients shouldn't know how many shards we targeted because there is not that much they can do. The individual errors don't matter so much to them either - all that the client/driver cares about is one or more of:

  • Did my write succeed fully
  • If it didn't succeed fully, some text explanation of why not so that a human can read it
  • If it didn't succeed fully, is it safe for me to just retry it or do I need to take some recovery action

The first two we get from a single error message and ok:false and the code and the last one is from a combination of the code and the error labels.

Generated at Thu Feb 08 08:26:23 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.