[SERVER-65926] Display string truncation not respecting UTF-8 character boundaries Created: 25/Apr/22 Updated: 27/Oct/23 Resolved: 18/May/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | 5.3.1 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Justin Casali | Assignee: | Chris Kelly |
| Resolution: | Gone away | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Operating System: | ALL |
| Steps To Reproduce: | See SERVER-11873 |
| Participants: |
| Description |
|
In our logs long strings are being truncated to 150 bytes irrespective of UTF-8 character boundaries. While this is not an issue for ASCII characters any multi-byte character that is on the truncation boundary gets cut at exactly 150 bytes and produces an invalid UTF-8 byte sequence contaminating our log file with an improper encoding. For example 'あ' is "\xE3\x81\x82" in UTF-8 bytes but if the last byte is cut it produces the invalid UTF-8 byte sequence of "\xE3\x81". The code causing this issue can be found here. It should be updated to be aware of character boundaries in its trimming logic. |
| Comments |
| Comment by Chris Kelly [ 18/May/22 ] | |
|
Hi Justin, Thanks for the clarification! This will help for future reference. Regards, | |
| Comment by Justin Casali [ 17/May/22 ] | |
|
Hi Christopher, Apologies for the late response, was out all last week. Looking back the truncation errors we were having are in version 4.2 which is before the new json structured logging was added in 4.4. The underlying code that was causing the issue is still present in the latest version although it looks like from your tests the truncation logic has switched to not displaying strings at all instead of partial chopped ones. Given that this issue does not appear in newer versions and we managed to implement a fix on our end in 4.2 I think we can close this bug report. Appreciate the help, | |
| Comment by Chris Kelly [ 17/May/22 ] | |
|
Hi Justin, We still need additional information to diagnose the problem. If this is still an issue for you, would you please send your mongod logs to the link provided above? Regards, | |
| Comment by Chris Kelly [ 06/May/22 ] | |
|
Hi Justin, I went ahead and tried reproducing your problem on a single mongod running 5.3.1 and did not observe this behavior with the reproduction you refer to in
The issue does, however, show itself on version 2.4.8 which that ticket refers to. I also tried decreasing the maxLogSizeKB to 1 and was able to keep adding characters to the string with no truncation observed, up until that limit where it would no longer display the string whatsoever. In order to look into this further, it'd be helpful to see your mongod log and the $dbpath/diagnostic.data directory (the contents are described here). I've created a secure upload portal for you to upload an example covering the time you see this. Files uploaded to this portal are hosted on Box, are visible only to MongoDB employees, and are routinely deleted after some time. Additionally, if there is any other specific information you can provide on reproducing your specific problem in the logs, or something I may have omitted, that would be helpful before we pass it on. Regards, |