DRIVERS-2008 (project: Drivers)

Default to lossy/replacement behavior when decoding UTF-8 in writeErrors

    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Unknown
    • Component/s: BSON, Networking, Wire Protocol

      Summary

      Drivers should introduce workarounds for a longstanding server bug where invalid UTF-8 can be returned in the responses to write commands.

      Motivation

      There is a longstanding issue in the server where error messages can be truncated in the middle of a UTF-8 code point, resulting in the driver receiving invalid UTF-8 data (SERVER-24007). Users of a number of drivers have encountered this issue (see RUST-648, RUBY-2560, NODE-3627, CDRIVER-2453).
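      To illustrate how byte-level truncation produces invalid UTF-8, here is a minimal Python sketch. The helper `truncate_bytes` and the sample message are hypothetical stand-ins for the server-side behavior described in SERVER-24007, not actual server code. A strict UTF-8 decoder rejects the result.

```python
def truncate_bytes(message: str, max_bytes: int) -> bytes:
    """Hypothetical helper: naively cut the UTF-8 encoding of a message at a
    byte limit, ignoring character boundaries (mimics the server bug)."""
    return message.encode("utf-8")[:max_bytes]


msg = "duplicate key: café"
encoded = msg.encode("utf-8")

# "é" is encoded as two bytes (0xC3 0xA9); dropping the last byte leaves
# a lone 0xC3 lead byte, i.e. an incomplete multi-byte sequence.
raw = truncate_bytes(msg, len(encoded) - 1)

try:
    raw.decode("utf-8")  # strict decoding, the default in most drivers
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

print(strict_ok)  # False: the truncated bytes are not valid UTF-8
```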

      Some drivers have implemented workarounds so that they do not error in these scenarios. For example, PYTHON-1090 and NODE-3670 change the Python and Node drivers to replace invalid UTF-8 sequences in write command responses rather than erroring when they encounter them. Other drivers may already handle this situation gracefully.

      While this is a server bug, driver users have been encountering it for a while and will continue to do so on older server versions even once it is fixed, so we should consider taking a similar approach to what Python and Node have done in all drivers.

      To be specific, when decoding writeErrors in a server response, drivers should not error if invalid UTF-8 is encountered and should use lossy/replacement behavior instead.

      Note that a couple of related DRIVERS tickets exist which cover slightly different subjects/cases where invalid UTF-8 can be encountered:

      • DRIVERS-1634 proposes that drivers uniformly handle user-provided data containing invalid UTF-8
      • DRIVERS-1936 proposes drivers should have an option to disable UTF-8 validation

      Who is the affected end user?

      Users of any driver that does not have a workaround for this issue in place.

      How does this affect the end user?

      They get confusing errors about invalid UTF-8 rather than a more helpful error message from the server.

      How likely is it that this problem or use case will occur?

      Fairly likely. A number of different ways to encounter it are documented in related tickets.

      If the problem does occur, what are the consequences and how severe are they?

      Users get cryptic error messages they are unable to debug.

      Is this issue urgent?

      Nothing is on fire, but we should consider addressing it sooner rather than later.

      Is this ticket required by a downstream team?

      At this time, no. This could come up in Compass and mongosh, but the Node team has already released their workaround.

      Is this ticket only for tests?

      No, there is a functional change proposed as well.

            Assignee: Unassigned
            Reporter: Kaitlin Mahar (kaitlin.mahar@mongodb.com)
            Votes: 0
            Watchers: 4
