Details
-
Bug
-
Resolution: Gone away
-
Major - P3
-
None
-
5.3.1
-
None
-
None
-
ALL
-
Description
In our logs, long strings are truncated to 150 bytes without regard to UTF-8 character boundaries. This is harmless for ASCII characters, but any multi-byte character that straddles the truncation boundary is cut at exactly 150 bytes, which produces an invalid UTF-8 byte sequence and contaminates the log file with improperly encoded data.
For example, 'あ' is "\xE3\x81\x82" in UTF-8, but if the last byte is cut off, the result is the invalid UTF-8 byte sequence "\xE3\x81".
The code causing this issue can be found here. It should be updated to be aware of character boundaries in its trimming logic.
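To illustrate what boundary-aware trimming could look like, here is a minimal sketch in Python. It is not the actual logging code: the function name `truncate_utf8` and its `limit` parameter are hypothetical, and the sketch assumes the input is already valid UTF-8. The idea is to back up from the cut point while the first dropped byte is a continuation byte (of the form 0b10xxxxxx), so a partial character is dropped entirely rather than split.

```python
def truncate_utf8(data: bytes, limit: int = 150) -> bytes:
    """Trim data to at most `limit` bytes without splitting a UTF-8 character.

    Hypothetical helper for illustration; assumes `data` is valid UTF-8.
    """
    if len(data) <= limit:
        return data
    end = limit
    # data[end] is the first byte that would be dropped. If it is a
    # continuation byte (0b10xxxxxx), the cut falls inside a character,
    # so move the cut back until it sits on that character's lead byte.
    while end > 0 and (data[end] & 0xC0) == 0x80:
        end -= 1
    return data[:end]


# 149 ASCII bytes followed by 'あ' (3 bytes): the 150-byte cut lands
# one byte into the multi-byte character.
sample = ("a" * 149 + "あ").encode("utf-8")
naive = sample[:150]            # ends with the stray lead byte 0xE3
safe = truncate_utf8(sample)    # drops the partial character entirely
```

With this approach the output may be a few bytes shorter than the limit (at most 3 for valid UTF-8), but it always decodes cleanly. Whether a shorter result is acceptable, or whether a truncation marker should be appended, is a design choice left to the actual fix.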