Display string truncation not respecting UTF-8 character boundaries

    • Type: Bug
    • Resolution: Gone away
    • Priority: Major - P3
    • Affects Version/s: 5.3.1
    • Component/s: None
    • ALL
    • See SERVER-11873

      In our logs, long strings are truncated at 150 bytes without regard to UTF-8 character boundaries. This is harmless for ASCII text, but any multi-byte character that straddles the 150-byte boundary is split, producing an invalid UTF-8 byte sequence that corrupts the encoding of our log file.

      For example, 'あ' is "\xE3\x81\x82" in UTF-8, but if the last byte is cut off, the result is the invalid UTF-8 byte sequence "\xE3\x81".
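The failure can be reproduced outside the server with a few lines of Python. This is only an illustrative sketch: `naive_truncate` and `is_valid_utf8` are hypothetical helper names, not the server's code, and the 150-byte limit is taken from the description above.

```python
MAX_BYTES = 150  # truncation limit described in this report


def naive_truncate(text: str, limit: int = MAX_BYTES) -> bytes:
    """Cut the UTF-8 encoding at a fixed byte count, ignoring character boundaries."""
    return text.encode("utf-8")[:limit]


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 148 ASCII bytes followed by the 3-byte character 'あ' (151 bytes in total),
# so a cut at 150 bytes drops only the final byte of 'あ'.
sample = "a" * 148 + "あ"
cut = naive_truncate(sample)

print(len(cut))            # length of the truncated byte string
print(cut[-2:])            # trailing bytes left over from the split character
print(is_valid_utf8(cut))  # whether the result is still valid UTF-8
```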

      The code causing this issue can be found here. It should be updated to be aware of character boundaries in its trimming logic.
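One possible shape for boundary-aware trimming is sketched below in Python. It is not the server's implementation; `truncate_utf8` is a hypothetical name, and the sketch assumes the input is already valid UTF-8. The idea is that UTF-8 continuation bytes always have the bit pattern 10xxxxxx, so if the first dropped byte is a continuation byte, the cut point is moved back to the start of that character.

```python
def truncate_utf8(data: bytes, limit: int = 150) -> bytes:
    """Truncate valid UTF-8 to at most `limit` bytes without splitting a character."""
    if len(data) <= limit:
        return data
    end = limit
    # data[end] is the first byte that would be dropped. If it is a
    # continuation byte (0b10xxxxxx), the cut falls inside a character,
    # so back up until `end` points at that character's lead byte.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]


# 148 ASCII bytes plus the 3-byte character 'あ': a plain 150-byte cut would
# split 'あ', so the whole character is dropped instead.
sample = ("a" * 148 + "あ").encode("utf-8")
trimmed = truncate_utf8(sample)
print(len(trimmed))
print(trimmed.decode("utf-8") == "a" * 148)
```

The result may be up to three bytes shorter than the limit, which seems an acceptable trade for output that always decodes cleanly.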

            Assignee:
            Chris Kelly
            Reporter:
            Justin Casali
            Votes:
            0
            Watchers:
            5

              Created:
              Updated:
              Resolved: