[DOCS-447] Representation of the second example on bsonspec.org is incorrect Created: 26/Aug/12  Updated: 02/Nov/16  Resolved: 12/Nov/12

Status: Closed
Project: Documentation
Component/s: drivers
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Andrew Morrow (Inactive) Assignee: Sam Kleinman (Inactive)
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Participants:
Days since reply: 11 years, 14 weeks, 2 days ago

 Description   

The second example (see the bottom of the page) on http://bsonspec.org/#/specification shows an incorrect representation for the provided document (the one that starts with {"BSON": ... }). Among the problems:

  • The terminating NULL for the cstring 'BSON' is shown as \x00&
  • Another byte is printed as \x020 a bit later on
  • Yet another byte is printed as \x011
  • Yet another byte is printed as \x14@
  • Yet another byte is printed as \x102


 Comments   
Comment by Sam Kleinman (Inactive) [ 12/Nov/12 ]

BSONspec.org needs a lot of work, still, but I've corrected this error, and we can continue to iterate on this.

Comment by Andrew Morrow (Inactive) [ 08/Nov/12 ]

I think all that really needs to happen is to replace the ASCII'd characters with the hex values for the bytes that do not originate in a string context.

Comment by Sam Kleinman (Inactive) [ 07/Nov/12 ]

What are the next steps on this? scotthernandez, acm, epc, and gjmurakami

Comment by Scott Hernandez (Inactive) [ 30/Aug/12 ]

Maybe we should color code for string/hex/literal values in the
representation, or we could provide a hex only representation in addition
to the mixed one for clarity.

Comment by Andrew Morrow (Inactive) [ 30/Aug/12 ]

Hi Gary -

Given your final representation, I'm guessing you figured out what some of my reply would have been to your earlier comments: e.g. that the bytes that encode @ and & as characters should not be presented as such because the context in which they appear is not that of a string. Your last example also shows the correct number of individual float bytes, which is good, and I think it is more readable.

However, I think that maybe the best way is just to avoid doing any interpretation of the bytes at all. So the example would read:

\x31\x00\x00\x00\x04\x42\x53\x4f\x4e\x00\x26\x00\x00\x00\x02\x30\x00\x08\x00\x00\x00\x61\x77\x65\x73\x6f\x6d\x65\x00\x01\x31\x00\x33\x33\x33\x33\x33\x33\x14\x40\x10\x32\x00\xc2\x07\x00\x00\x00\x00

As long as you keep the page behavior where mousing over different pieces of the JSON highlights the equivalent portion of the BSON data and vice versa, it should be clear that the byte sequence \x42\x53\x4f\x4e is 'BSON' because it will be highlighted on the page. Anyone who is interested in the binary encoding will understand this.
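
As a rough illustration of the hex-only form proposed above, the following Ruby sketch prints every byte as a \xNN escape with no ASCII interpretation. The constant EXAMPLE_BSON and the helper hex_only are names made up for this sketch, and the bytes are copied from the irb output quoted elsewhere in this ticket rather than generated by a driver.

```ruby
# Bytes of the example document, copied from the irb output in this ticket.
# .b marks the string as binary so that \xC2 is treated as a plain byte.
EXAMPLE_BSON = ("1\x00\x00\x00\x04BSON\x00&\x00\x00\x00" \
                "\x020\x00\b\x00\x00\x00awesome\x00" \
                "\x011\x00333333\x14@" \
                "\x102\x00\xC2\a\x00\x00\x00\x00").b

# Hypothetical helper: render each byte as a literal \xNN escape,
# regardless of whether the byte happens to be printable ASCII.
def hex_only(bytes)
  bytes.each_byte.map { |b| format('\x%02x', b) }.join
end

puts hex_only(EXAMPLE_BSON)
```

Because nothing is interpreted, bytes such as 0x26 and 0x40 can no longer be mistaken for the characters & and @.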

Comment by Gary Murakami [ 30/Aug/12 ]

Certainly if Andrew or anyone has a better idea, we welcome any and all suggestions for improvement. Would white space help? Jira will probably throw away my indentation in the following attempt at a clearer example amplification.

\x31 \x00 \x00 \x00 (int32)
\x04 (type 4 Array)
BSON \x00 (e_name)
\x26 \x00 \x00 \x00 (int32)
\x02 (type 2 UTF-8 string)
0 \x00 (e_name)
\x08 \x00 \x00 \x00 (int32)
awesome \x00 (string)
\x01 (type 1 Floating point)
1 \x00 (e_name)
\x33 \x33 \x33 \x33 \x33 \x33 \x14 \x40 (double)
\x10 (type 16 int32)
2 \x00 (e_name)
\xc2 \x07 \x00 \x00 (int32)
\x00 (end Array)
\x00 (end Document)
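
The breakdown above can also be produced mechanically. The sketch below walks the example bytes using only Ruby's String#unpack1 (Ruby 2.4 or later). The names EXAMPLE and walk are hypothetical, and only the four element types that occur in this document (double, string, array, int32) are handled, so treat it as an illustration rather than a BSON parser.

```ruby
# Example bytes, copied from the irb output in this ticket (49 bytes).
EXAMPLE = ("1\x00\x00\x00\x04BSON\x00&\x00\x00\x00" \
           "\x020\x00\b\x00\x00\x00awesome\x00" \
           "\x011\x00333333\x14@" \
           "\x102\x00\xC2\a\x00\x00\x00\x00").b

# Hypothetical helper: appends one annotation per field to `lines` and
# returns the offset just past the document that starts at `pos`.
def walk(bytes, lines, pos = 0, depth = 0)
  pad = '  ' * depth
  size = bytes[pos, 4].unpack1('l<')        # int32, little-endian
  lines << "#{pad}int32 size #{size}"
  last = pos + size - 1                     # offset of the closing \x00
  pos += 4
  while pos < last
    type = bytes.getbyte(pos)
    nul = bytes.index("\x00", pos + 1)      # e_name is a NUL-terminated cstring
    name = bytes[pos + 1...nul]
    pos = nul + 1
    case type
    when 0x01                               # 8-byte little-endian double
      lines << "#{pad}  double #{name} = #{bytes[pos, 8].unpack1('E')}"
      pos += 8
    when 0x02                               # int32 length, text, trailing \x00
      len = bytes[pos, 4].unpack1('l<')
      text = bytes[pos + 4, len - 1].force_encoding('UTF-8')
      lines << "#{pad}  string #{name} = #{text} (int32 length #{len})"
      pos += 4 + len
    when 0x04                               # array: an embedded document
      lines << "#{pad}  array #{name}"
      pos = walk(bytes, lines, pos, depth + 2)
    when 0x10                               # int32 value
      lines << "#{pad}  int32 #{name} = #{bytes[pos, 4].unpack1('l<')}"
      pos += 4
    else
      raise ArgumentError, format('unsupported element type 0x%02x', type)
    end
  end
  lines << "#{pad}\\x00 end of document"
  last + 1
end

lines = []
walk(EXAMPLE, lines)
puts lines
```

Note that in a string element the e_name ("0" followed by \x00) comes before the int32 length \x08 \x00 \x00 \x00, which is why the walker reads the name first for every type.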

Comment by Gary Murakami [ 30/Aug/12 ]

It's hard to make these examples more understandable. I've seen other representations that are visually clearer but less standard, and then you have to explain the non-standard representations that aren't even generated by code.

Here's an alternative that's "standard."
> require 'uri'
=> true
1.9.3-p194 :003 > URI.encode(BSON.serialize({"BSON"=>["awesome", 5.05, 1986]}).to_s)
=> "1%00%00%00%04BSON%00&%00%00%00%020%00%08%00%00%00awesome%00%011%00333333%14@%102%00%C2%07%00%00%00%00"

I think that most programmers are more facile at reading the \x00 hex escapes.

Comment by Gary Murakami [ 30/Aug/12 ]

Andrew and Sam, did you try it in a driver?

Here's the log for Ruby.
$ irb
1.9.3-p194 :001 > require 'bson'
=> true
1.9.3-p194 :002 > BSON.serialize({"BSON"=>["awesome", 5.05, 1986]}).to_s
=> "1\x00\x00\x00\x04BSON\x00&\x00\x00\x00\x020\x00\b\x00\x00\x00awesome\x00\x011\x00333333\x14@\x102\x00\xC2\a\x00\x00\x00\x00"
Here's the string from the web page.
"\x31\x00\x00\x00\x04BSON\x00&\x00\x00\x00\x020\x00\x08\x00\x00\x00awesome\x00\x011\x00333333\x14@\x102\x00\xc2\x07\x00\x00\x00\x00"

You can see that it matches up exactly given the following understanding.
ascii "1" == hex 31
escape "\b" == hex 08
hex C2 == hex c2
escape "\a" == hex 07
ascii "&" is literal ascii even if it looks odd to you here
ascii "@" is literal ascii even if it looks odd to you here
ascii "BSON" "awesome" "333333" are literal

So the representation is correct, and this ticket is invalid.
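
The equivalences listed above can be checked directly in Ruby. The sketch below relies on the fact that a \x escape in a double-quoted Ruby string consumes at most two hex digits, so a run such as "\x020" is the byte 0x02 followed by the ASCII character "0". The helper name hex_bytes is made up for this illustration.

```ruby
# Hypothetical helper: list the bytes of a string as two-digit hex values.
def hex_bytes(str)
  str.bytes.map { |b| format('%02x', b) }
end

# Left: the text as it appears in the mixed representation.
# Right: the same text as a Ruby double-quoted literal.
samples = {
  '\x00&' => "\x00&",
  '\x020' => "\x020",
  '\x011' => "\x011",
  '\x14@' => "\x14@",
  '\x102' => "\x102",
  '\b'    => "\b",
  '\a'    => "\a"
}

samples.each do |shown, str|
  puts "#{shown.ljust(6)} -> #{hex_bytes(str).join(' ')}"
end
```

Each of the first five runs is two bytes, which is the source of the confusion reported in the description: the trailing &, 0, 1, @, and 2 are separate bytes shown as ASCII, not extra hex digits.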

Comment by Andrew Morrow (Inactive) [ 26/Aug/12 ]

Also, unless I'm reading it wrong, you don't have enough bytes for the floating point value. There should be six \x33 bytes, not three, before the \x14\x40.
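
The byte count for the floating point value can be confirmed without a driver. This minimal sketch assumes only Ruby's Array#pack with the 'E' directive (double precision, little-endian), which is the layout used for the double in this example.

```ruby
# Encode 5.05 as an 8-byte little-endian IEEE 754 double.
double_bytes = [5.05].pack('E').bytes

# Print every byte as a \xNN escape, with no ASCII interpretation.
puts double_bytes.map { |b| format('\x%02x', b) }.join

# Count how many of the eight bytes are 0x33 (ASCII "3").
puts double_bytes.count(0x33)
```

The result is six 0x33 bytes followed by 0x14 and 0x40, so a rendering that prints printable bytes as ASCII shows them as 333333 followed by \x14@.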

Comment by Andrew Morrow (Inactive) [ 26/Aug/12 ]

So, the '@' and '&' are clearly a bit strange. The 'triples' are because of the stringy array indices. I think you should either remove the interpretation of alnum bytes (and just print the hex for every byte), or print 'BSON', 'awesome', '1', '2', and '3' in a different font. Otherwise it is hard to see what is going on.

Generated at Thu Feb 08 07:38:41 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.