  1. Core Server
  2. SERVER-65926

Display string truncation not respecting UTF-8 character boundaries

Details

    • Type: Bug
    • Resolution: Gone away
    • Priority: Major - P3
    • None
    • 5.3.1
    • None
    • None

    Description

      In our logs, long strings are truncated to 150 bytes regardless of UTF-8 character boundaries. This is harmless for ASCII text, but any multi-byte character that straddles the 150-byte limit is split in the middle, which produces an invalid UTF-8 byte sequence and contaminates our log file with improperly encoded data.

      For example, 'あ' is "\xE3\x81\x82" in UTF-8 bytes; if the last byte is cut off, what remains is the invalid UTF-8 byte sequence "\xE3\x81".
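
A minimal reproduction of the byte-level effect, sketched in Python for illustration only (this is not the server's code; the 150-byte limit is taken from the description above, and `is_valid_utf8` is a hypothetical helper):

```python
LIMIT = 150  # truncation limit in bytes, as reported in this ticket

# 148 ASCII bytes followed by 'あ' (3 bytes) gives 151 bytes in total,
# so a cut at 150 bytes lands inside the multi-byte character.
text = "a" * 148 + "あ"
encoded = text.encode("utf-8")
truncated = encoded[:LIMIT]  # naive byte-wise truncation


def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(truncated[-2:])            # the dangling first two bytes of 'あ'
print(is_valid_utf8(truncated))  # validity of the truncated log string
```

The truncated buffer ends in the first two bytes of 'あ' with no third byte, which is what a strict UTF-8 reader of the log file would reject.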

      The code causing this issue can be found here. It should be updated to be aware of character boundaries in its trimming logic.
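
One possible shape for boundary-aware trimming, as a hedged Python sketch rather than the server's actual implementation. The function name `truncate_utf8` is hypothetical, and the sketch assumes the input is already valid UTF-8. It relies on the fact that UTF-8 continuation bytes always have the bit pattern 10xxxxxx, so the cut can be moved back until the first dropped byte is not a continuation byte:

```python
def truncate_utf8(data: bytes, limit: int) -> bytes:
    """Return at most `limit` bytes of `data`, cut on a character boundary.

    Assumes `data` is valid UTF-8 (an assumption of this sketch).
    """
    if limit <= 0:
        return b""
    if len(data) <= limit:
        return data
    end = limit
    # data[end] is the first byte that would be dropped. If it is a
    # continuation byte (10xxxxxx), the cut falls inside a character,
    # so back up until the cut sits just before a lead byte.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]


sample = ("a" * 148 + "あ").encode("utf-8")
trimmed = truncate_utf8(sample, 150)
print(len(trimmed))              # may be shorter than the limit
print(trimmed.decode("utf-8"))   # decodes without error
```

The trade-off is that the result can be up to three bytes shorter than the limit, since the partially included character is dropped whole instead of being split.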

          People

            chris.kelly@mongodb.com Chris Kelly
            jcasali@atlassian.com Justin Casali
            Votes: 0
            Watchers: 5
