  1. Compass
  2. COMPASS-4944

Fail gracefully instead of "Invalid UTF-8 string in BSON document"

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major - P3
    • 1.32.6
    • Affects Version/s: 1.27.1, 1.34.2
    • None
    • Environment:
      Windows for sure, but I think all.
    • 2
    • Iteration Fish, Iteration Grouper
    • Needed

      Not sure where this would go, but this is an infrequent issue that comes up. Sometimes, somehow (almost certainly due to a driver bug, or maybe someone using the wire protocol directly?), some bad UTF-8 can make it into a document. This is possible because the server doesn't validate UTF-8 and relies on the drivers to do that.

      As explained in the comments on this ticket, the Node driver supports a connection URL flag for disabling UTF-8 validation. The flag is useful not just for working around bad data in the database, but also because UTF-8 validation carries a small performance penalty.

      So you can just stick that param in a connection string, and mongosh/Compass (or whatever uses a driver that supports it) should disable UTF-8 validation. That gives you a slight performance increase, and it means you can then see the broken documents in their broken state.

      With this PR, Compass now also exposes this option in the Advanced Connection Options, Advanced tab. The URI Options' "Select key" dropdown now has a new option under Miscellaneous Configuration for "enableUtf8Validation". To use it, select it and set the value to false (it defaults to true).
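
      For the connection string route, here is a minimal sketch of adding the flag to an existing URI. The helper name withUtf8ValidationDisabled and the example URIs are hypothetical; only the option name enableUtf8Validation comes from this ticket. It is plain string handling and does not call any driver API, so whether the flag takes effect still depends on the driver or tool that receives the string.

```typescript
// Hypothetical helper: returns a copy of a MongoDB connection string with
// enableUtf8Validation=false set. Plain string handling, no driver calls.
function withUtf8ValidationDisabled(uri: string): string {
  const option = "enableUtf8Validation=false";

  // If the option is already present, overwrite its value in place.
  const existing = /([?&])enableUtf8Validation=[^&]*/i;
  if (existing.test(uri)) {
    return uri.replace(existing, `$1${option}`);
  }

  // Other options are already present: append with "&".
  if (uri.includes("?")) {
    return uri + "&" + option;
  }

  // No options yet: make sure there is a "/" after the host list before "?".
  const schemeEnd = uri.indexOf("://");
  const rest = schemeEnd >= 0 ? uri.slice(schemeEnd + 3) : uri;
  const withSlash = rest.includes("/") ? uri : uri + "/";
  return withSlash + "?" + option;
}

// Example with a made-up local URI.
console.log(withUtf8ValidationDisabled("mongodb://localhost:27017/test"));
```

      The resulting string can be pasted into mongosh or into the Compass connection form in place of the original URI.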

      This kind of situation does happen from time to time and this workaround should probably be documented somewhere.

      Problem Statement/Rationale

      Compass will not display any result set that includes a document containing an invalid UTF-8 string; in place of the results it displays the error "Invalid UTF-8 string in BSON document". The same is true of exporting: Compass will not export a set of documents if one of them contains an invalid UTF-8 string, and fails with the same error.

      This is in contrast to Compass v1.26.1 which did display/export these documents, but substituted the replacement character � for any invalid bytes.
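
      The two behaviors described above (substitute � versus fail with an error) can be illustrated with the standard TextDecoder API. This is only an analogy under the assumption that the decoding step is where the difference arises; it is not Compass's or the driver's actual code, and the function names are made up for the example.

```typescript
// Lenient decoding: invalid byte sequences become U+FFFD (the � character).
function decodeLenient(bytes: Uint8Array): string {
  return new TextDecoder("utf-8").decode(bytes);
}

// Strict decoding: any invalid byte sequence makes decode() throw.
function decodeStrict(bytes: Uint8Array): string {
  return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
}

// "a", then 0xFF (never valid in UTF-8), then "b".
const sampleBytes = new Uint8Array([0x61, 0xff, 0x62]);

console.log(decodeLenient(sampleBytes)); // prints "a�b"

try {
  decodeStrict(sampleBytes);
} catch (err) {
  console.log("strict decoding failed:", (err as Error).name);
}
```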

      Steps to Reproduce

      1. Create a document in MongoDB that contains a string field with invalid UTF-8 bytes. (I do not know how to actually perform this step but it seems to be possible).

      2. View the document in Compass, and also attempt to export the collection that contains this document.
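
      To make step 1 more concrete, the sketch below hand-assembles the raw BSON bytes of a one-field document whose string value contains an invalid UTF-8 byte. It follows the published BSON layout (int32 document length, element type 0x02 for strings, NUL-terminated field name, int32 string length including the trailing NUL). The function name and the field name "s" are made up, and the sketch does not show how to get these bytes into a server; a normal driver would not produce them from a regular string, which matches the guess above that a driver bug or direct wire-protocol use is involved.

```typescript
// Hand-assemble the BSON bytes for a document like { s: <"a", 0xFF, "b"> }.
function buildBsonWithInvalidUtf8(): Uint8Array {
  const fieldName = [0x73, 0x00];               // "s" as a NUL-terminated cstring
  const stringBytes = [0x61, 0xff, 0x62];       // "a", invalid byte 0xFF, "b"
  const stringLength = stringBytes.length + 1;  // BSON string length counts the trailing NUL

  // int32 total + type byte + field name + int32 string length + string + document terminator
  const totalLength = 4 + 1 + fieldName.length + 4 + stringLength + 1;

  const doc = new Uint8Array(totalLength);
  const view = new DataView(doc.buffer);
  let offset = 0;

  view.setInt32(offset, totalLength, true);     // little-endian document length
  offset += 4;
  doc[offset++] = 0x02;                         // element type: UTF-8 string
  doc.set(fieldName, offset);
  offset += fieldName.length;
  view.setInt32(offset, stringLength, true);    // little-endian string length
  offset += 4;
  doc.set(stringBytes, offset);
  offset += stringBytes.length;
  doc[offset++] = 0x00;                         // string terminator
  doc[offset++] = 0x00;                         // document terminator

  return doc;
}
```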

      Expected Results

      I expect the behavior to be the same as it is in v1.26.1. The document can be viewed in Compass, with invalid chars replaced by �. The document can be exported using the "Export Collection" tool.

      Actual Results

      The document is not viewable in Compass and displays the error "Invalid UTF-8 string in BSON document". Clicking "Export Collection" (even if not viewing the document at the time, but exporting the full collection) and saving to a file gives the same error: "Invalid UTF-8 string in BSON document".

      Additional Notes

      The errors occurred on both v1.27.1 and v1.28.1.

            Assignee:
            Unassigned
            Reporter:
            jake@convictional.com Jake Strang
            Votes:
            2
            Watchers:
            9

              Created:
              Updated: