[CDRIVER-652] Number formatting and whitespace in bson_as_json Created: 14/May/15 Updated: 30/Sep/19 Resolved: 29/May/15 |
|
| Status: | Closed |
| Project: | C Driver |
| Component/s: | json, libbson |
| Affects Version/s: | None |
| Fix Version/s: | 1.2-beta0 |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Jeroen Ooms [X] | Assignee: | A. Jesse Jiryu Davis |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||||||
| Description |
|
I am implementing a client version of `mongoexport` in the R driver, which uses bson_as_json to convert BSON records to JSON lines that are then streamed to a file or connection. It works, but the BSON-to-JSON conversion is suboptimal.
There are at least two issues. First, there is unnecessary whitespace, which is undesirable. A bigger issue is the number formatting. It seems that libbson prints doubles with a fixed number of digits, which results in trailing zeros or loss of precision for small numbers. By comparison, the real `mongoexport` utility outputs this for the same data:
Ideally output from bson_as_json would be identical to mongoexport, but I understand yajl might have its limitations. |
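The difference between fixed-digit and flexible formatting described above can be sketched with plain C `printf` conversions. This is a minimal illustration, not libbson's code: the helper names `format_fixed` and `format_flexible` are hypothetical, and the choice of `%f` and `%.15g` is an assumption made for the example.

```c
#include <stdio.h>

/* Hypothetical helper: fixed formatting, always six digits after the point.
 * 0.5 gains trailing zeros and 1e-10 collapses to zero. */
static int format_fixed(double value, char *buf, size_t len)
{
    return snprintf(buf, len, "%f", value);
}

/* Hypothetical helper: up to 15 significant digits, no trailing zeros,
 * switching to exponent notation for very small or very large values. */
static int format_flexible(double value, char *buf, size_t len)
{
    return snprintf(buf, len, "%.15g", value);
}
```

With these helpers, 0.5 formats as `0.500000` (fixed) versus `0.5` (flexible), and 1e-10 formats as `0.000000` versus `1e-10`.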
| Comments |
| Comment by A. Jesse Jiryu Davis [ 11/Jul/16 ] | ||||||||||||
|
Thanks for the info! Right now, none of the drivers distinguish ints from floats when they export JSON. We may change our minds in the future. | ||||||||||||
| Comment by Jeroen Ooms [X] [ 11/Jul/16 ] | ||||||||||||
|
This is an old issue, but I found that yajl also exports doubles with at least one decimal to distinguish them from integers. It does so simply by first printing the number and then appending `.0` if the output consists only of the characters `-0123456789`: This seems more reliable than fmod and also prevents the problems above when doubles get printed in scientific notation. | ||||||||||||
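The approach described in this comment can be sketched roughly as follows. This is an illustration only, not yajl's or libbson's actual source: the function name `format_double_json` is hypothetical, and the `%.15g` precision is an assumption chosen for the example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: print a double, then append ".0" when the output
 * looks like an integer (only digits and an optional minus sign).
 * Returns the string length, or -1 if the buffer is too small. */
static int format_double_json(double value, char *buf, size_t len)
{
    int n = snprintf(buf, len, "%.15g", value);
    if (n < 0 || (size_t) n >= len) {
        return -1;
    }
    /* strspn counts the leading run of characters from the given set;
     * if that run covers the whole string, there is no '.', 'e' or '+'. */
    if (strspn(buf, "-0123456789") == (size_t) n) {
        if ((size_t) n + 3 > len) {
            return -1; /* no room for ".0" plus the terminator */
        }
        memcpy(buf + n, ".0", 3);
        n += 2;
    }
    return n;
}
```

Because the check runs on the printed text, a value that comes out in exponent form such as `1e+25` is left untouched, while 1.0 becomes `1.0`. Non-finite values (NaN, infinity) are not handled here and would need separate treatment, since they have no JSON representation.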
| Comment by Githook User [ 11/Jan/16 ] | ||||||||||||
|
Author: A. Jesse Jiryu Davis (ajdavis, jesse@mongodb.com) Message: | ||||||||||||
| Comment by Githook User [ 11/Jan/16 ] | ||||||||||||
|
Author: A. Jesse Jiryu Davis (ajdavis, jesse@mongodb.com) Message: Format with flexible precision after the decimal point, and for convenience | ||||||||||||
| Comment by Githook User [ 20/Oct/15 ] | ||||||||||||
|
Author: A. Jesse Jiryu Davis (ajdavis, jesse@mongodb.com) Message: | ||||||||||||
| Comment by Githook User [ 20/Oct/15 ] | ||||||||||||
|
Author: A. Jesse Jiryu Davis (ajdavis, jesse@mongodb.com) Message: Format with flexible precision after the decimal point, and for convenience | ||||||||||||
| Comment by Githook User [ 07/Oct/15 ] | ||||||||||||
|
Author: A. Jesse Jiryu Davis (ajdavis, jesse@mongodb.com) Message: | ||||||||||||
| Comment by Githook User [ 01/Oct/15 ] | ||||||||||||
|
Author: A. Jesse Jiryu Davis (ajdavis, jesse@mongodb.com) Message: | ||||||||||||
| Comment by Githook User [ 01/Oct/15 ] | ||||||||||||
|
Author: A. Jesse Jiryu Davis (ajdavis, jesse@mongodb.com) Message: Format with flexible precision after the decimal point, and for convenience | ||||||||||||
| Comment by A. Jesse Jiryu Davis [ 29/May/15 ] | ||||||||||||
|
1.2.0: https://github.com/mongodb/libbson/commit/9eb14d12c8a3495e0c99f2fb6238167f050df559 1.1.7: https://github.com/mongodb/libbson/commit/2fd5fb266f6280855e80eea527e9b87007e98f1f | ||||||||||||
| Comment by Jeroen Ooms [X] [ 29/May/15 ] | ||||||||||||
|
I hope that means
| ||||||||||||
| Comment by A. Jesse Jiryu Davis [ 29/May/15 ] | ||||||||||||
|
I now agree with you, and additionally think it should not be a goal to distinguish ints and doubles. I'll do the simplest thing possible that conforms with the JSON spec. | ||||||||||||
| Comment by Jeroen Ooms [X] [ 28/May/15 ] | ||||||||||||
|
Actually, I take that back: I don't think forcing decimal notation for doubles is a good idea. It leads to really poor formatting for large numbers. For example, with the current implementation, the numbers 10^25 through 10^30 are printed with a lot of non-significant noise:
With scientific notation you see only the actual significant digits, which is a more accurate representation of the value:
I think it's better to stick with scientific notation for all numbers, which works for both very large and very small numbers and prints only the actual signal from the number. | ||||||||||||
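The contrast between forced decimal notation and scientific notation for large doubles can be reproduced with standard `printf` conversions. This is a minimal sketch under stated assumptions: the helper names are hypothetical, and `%.1f` and `%.15g` are stand-ins chosen for illustration rather than the formats libbson used at the time.

```c
#include <stdio.h>

/* Hypothetical helper: force decimal notation with one fractional digit. */
static int format_decimal(double value, char *buf, size_t len)
{
    return snprintf(buf, len, "%.1f", value);
}

/* Hypothetical helper: at most 15 significant digits; printf switches to
 * exponent notation on its own when the exponent is large. */
static int format_scientific(double value, char *buf, size_t len)
{
    return snprintf(buf, len, "%.15g", value);
}

/* Print both forms side by side for 10^25 through 10^30. */
static void print_comparison(FILE *out)
{
    const double values[] = {1e25, 1e26, 1e27, 1e28, 1e29, 1e30};
    char dec[64], sci[64];
    size_t i;

    for (i = 0; i < sizeof values / sizeof values[0]; i++) {
        format_decimal(values[i], dec, sizeof dec);
        format_scientific(values[i], sci, sizeof sci);
        fprintf(out, "%s  vs  %s\n", dec, sci);
    }
}
```

Calling `print_comparison(stdout)` shows the effect: a double carries only about 15 to 17 significant decimal digits, so the decimal form of 10^25 has to pad the remaining positions with digits that come from the binary approximation rather than from the intended value, while the `%.15g` form is simply `1e+25`.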
| Comment by Jeroen Ooms [X] [ 28/May/15 ] | ||||||||||||
|
Looks good. I haven't tested this yet, but if you make yajl parse whole numbers into integers then you can probably roundtrip numbers without loss of type, which would be very nice. | ||||||||||||
| Comment by A. Jesse Jiryu Davis [ 28/May/15 ] | ||||||||||||
|
Thanks for the fix. Additionally I added, for convenience, something mongoexport doesn't do: libbson's bson_as_json formats BSON doubles like "1.0" and integers like "1". JSON doesn't distinguish between them (all numbers are floats in JSON) and mongoexport doesn't, either. Thoughts? | ||||||||||||
| Comment by Jeroen Ooms [X] [ 28/May/15 ] | ||||||||||||
|
FYI the related issue with integer formatting has been resolved in mongoexport: https://jira.mongodb.org/browse/TOOLS-741 | ||||||||||||
| Comment by Jeroen Ooms [X] [ 16/May/15 ] | ||||||||||||
|
Yes, that makes sense. I don't mind too much about whitespace; it's just a bit of overhead and not as big a problem as the number formatting. | ||||||||||||
| Comment by A. Jesse Jiryu Davis [ 16/May/15 ] | ||||||||||||
|
Seems wise. Rather than changing how whitespace is displayed – someone else's code might rely on the way it's displayed now – I'd prefer to fix the bug and stop there. In the future I may add APIs to override options on the JSON formatter, not just whitespace but also indentation. | ||||||||||||
| Comment by Jeroen Ooms [X] [ 15/May/15 ] | ||||||||||||
|
I was able to fix the number formatting problem: https://github.com/mongodb/libbson/pull/127. It is a simple fix that changes the number formatting for real numbers to a sensible default. Hope you can find a minute to review it. Taking out the whitespace seems quite easy as well, but it requires a lot of small changes and is probably a bit more controversial, so I'll leave that alone for now. | ||||||||||||
| Comment by A. Jesse Jiryu Davis [ 14/May/15 ] | ||||||||||||
|
Thanks Jeroen, I'd like to add features to the JSON generator as well, but it's low-priority in the scheme of things. The integer formatting is certainly a problem that must be fixed eventually, however. | ||||||||||||
| Comment by Jeroen Ooms [X] [ 14/May/15 ] | ||||||||||||
|
There is actually a bug in mongoexport for integers, which is obviously not the desired behavior: https://jira.mongodb.org/browse/TOOLS-741 |