Type: Bug
Resolution: Won't Fix
Priority: Minor - P4
Fix Version/s: None
Affects Version/s: None
Component/s: bsondump
Labels: None
Environment: All

When bsondump displays UTF-8 strings, it escapes certain characters (quotes, backslashes, tabs, and some others) but otherwise treats the UTF-8 as ready for display on the screen. If the UTF-8 is invalid, it is up to the display driver, terminal program, or some other downstream component to decide how to show the bad data. As a result, running bsondump on the same data on two different systems can produce different output. It would be better if bsondump "fixed up" the bad UTF-8 by replacing invalid sequences with the Unicode replacement character (U+FFFD), so that the output is similar no matter what the display driver does with bad UTF-8.
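As a rough illustration of the proposed "fix up" behavior, the sketch below decodes raw bytes and substitutes U+FFFD for anything that is not valid UTF-8. It is written in Python for brevity and is not bsondump code; the function name sanitize_utf8 is hypothetical.

```python
def sanitize_utf8(raw: bytes) -> str:
    """Decode raw bytes as UTF-8, replacing invalid sequences with U+FFFD.

    Hypothetical helper for illustration only; bsondump has no such function.
    """
    # errors="replace" makes the decoder emit U+FFFD instead of raising.
    return raw.decode("utf-8", errors="replace")


# A valid string passes through unchanged.
valid = "café".encode("utf-8")
print(repr(sanitize_utf8(valid)))

# 0xE9 is "é" in Latin-1 but is not a complete UTF-8 sequence on its own.
invalid = b"caf\xe9 ok"
print(repr(sanitize_utf8(invalid)))
```

With this approach the output no longer depends on how a given terminal renders malformed bytes, because every invalid sequence has already been replaced before display.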

bsondump can't really flag the bad data in the output stream without possibly corrupting the JSON output, which might be intended for some post-processing filter. If passing bad UTF-8 to stdout as-is is useful for some purposes, maybe we should add a --raw command-line option to enable that.

Somewhat related: it would be nice if the "--type debug" option displayed something more informative than "bad utf8 String!". The debug mode of bsondump displays only structure and sizes, apart from this UTF-8 check. Since it is already checking and reporting an error, it could show the offending bytes in hex along with their location in the string. This might make bsondump more useful as a debugging aid.
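A minimal sketch of the kind of diagnostic suggested above: it reports each invalid byte run as a byte offset plus hex dump. This is Python for illustration, not bsondump's implementation; find_bad_utf8 is a hypothetical name, and the exact grouping of bytes into "runs" follows Python's decoder, which another implementation might split differently.

```python
def find_bad_utf8(raw: bytes):
    """Return a list of (byte_offset, hex_string) for each invalid run.

    Hypothetical helper for illustration only.
    """
    problems = []
    pos = 0
    while pos < len(raw):
        try:
            # Strict decoding raises at the first invalid sequence.
            raw[pos:].decode("utf-8")
            break  # the remainder is valid
        except UnicodeDecodeError as err:
            # err.start / err.end are relative to the slice we decoded.
            start = pos + err.start
            end = pos + err.end
            problems.append((start, raw[start:end].hex()))
            pos = end  # resume scanning after the bad bytes
    return problems


sample = b"ab\xffcd\xc3"
for offset, hex_bytes in find_bad_utf8(sample):
    print(f"bad UTF-8 at byte offset {offset}: {hex_bytes}")
```

Re-decoding the remaining slice after each error keeps the sketch short at the cost of efficiency; a real tool would more likely scan the buffer once.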

Assignee: Gabriel Russell (Inactive)
Reporter: Tad Marshall
Votes: 0
Watchers: 0

Created: Mar 19 2012 04:23:25 AM UTC
Updated: Nov 14 2017 03:38:46 AM UTC
Resolved: Mar 03 2017 08:45:52 PM UTC
