[SERVER-5462] Shell doesn't handle embedded nulls correctly
Created: 30/Mar/12  Updated: 29/Aug/13  Resolved: 08/Mar/13

Status: Closed
Project: Core Server
Component/s: JavaScript
Affects Version/s: 2.0.3
Fix Version/s: None

Type: Bug
Priority: Minor - P4
Reporter: Glenn Maynard
Assignee: Tad Marshall
Resolution: Duplicate
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-6646 Strings with NUL bytes don't round-tr... Closed
Related
Operating System: ALL
Participants:

 Description   

> db.test.insert({'x': 'y\u0000a'})
> db.test.find()
{ "_id" : ObjectId("4f75db6d5b92cfb811975759"), "x" : "y" }
> db.test.find()[0].x.length
1

The shell sends the string to the backend correctly, and the backend stores the data correctly (it's retrieved properly in Python), but the data received is truncated. x.length is 1, so this isn't a cosmetic problem in the shell; the data is actually incorrect.
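One way to confirm that the truncation is real rather than a display artifact is to look at the string's UTF-16 code units instead of printing it. The helper below is only a sketch: the name `codeUnits` is hypothetical and not part of the shell, and it relies on nothing beyond the standard `String.prototype.charCodeAt`.

```javascript
// Hypothetical helper: list the UTF-16 code units of a string so that
// non-printing characters such as NUL (code unit 0) become visible.
function codeUnits(s) {
    var units = [];
    for (var i = 0; i < s.length; i++) {
        units.push(s.charCodeAt(i));
    }
    return units;
}

var original = 'y\u0000a';
var expectedUnits = codeUnits(original); // [121, 0, 97]
// Given the length of 1 reported above, the value read back by the
// shell would presumably produce [121] instead.
```

In the shell, `codeUnits(db.test.find()[0].x)` could then be compared against `codeUnits('y\u0000a')`.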



 Comments   
Comment by Glenn Maynard [ 04/Apr/12 ]

In general, Mongo should try to minimize the number of places where valid JSON/BSON can't be stored directly. Currently, all of the intentional restrictions (that I'm aware of) are key restrictions: object (document) keys can't contain "." or "\0", or begin with "$". Each of these restrictions gives me extra work: I need to ensure that my higher-level API doesn't require these characters, and either document that they're not allowed wherever that leaks through to the API (which doesn't expose MongoDB per se) or escape them; then add tests for these exceptional cases, and so on. Having additional limitations like "document values can't contain NUL" gives me more work to do and complicates my API (because now it has more special cases, too).

Also, as far as I'm aware, all of the intentional restrictions are limitations on keys, not values, which helps narrow it a bit. You only need to worry about special limitations in object keys; everything else is simply any valid JSON (BSON) object.
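The key restrictions described above can be sketched as a pre-insert check in a higher-level API. This is an illustration under assumptions: `findInvalidKeys` is a hypothetical name, the rules are only the three mentioned in this comment (no ".", no NUL, no leading "$"), and values are deliberately not inspected, so a string value containing NUL passes.

```javascript
// Hypothetical check for the key restrictions listed in this comment.
// Returns a list of offending keys; nested keys are reported as
// "parent -> child". Values are never rejected, only keys.
function findInvalidKeys(doc, path) {
    var bad = [];
    path = path || '';
    for (var key in doc) {
        if (!Object.prototype.hasOwnProperty.call(doc, key)) continue;
        var full = path ? path + ' -> ' + key : key;
        if (key.indexOf('.') !== -1 ||
            key.indexOf('\u0000') !== -1 ||
            key.charAt(0) === '$') {
            bad.push(full);
        }
        var value = doc[key];
        if (value !== null && typeof value === 'object') {
            // Recurse into sub-documents and arrays.
            bad = bad.concat(findInvalidKeys(value, full));
        }
    }
    return bad;
}

var problems = findInvalidKeys({ a: 1, 'b.c': 2, '$d': { e: 'y\u0000a' } });
// problems is ['b.c', '$d']; the NUL inside the value of "e" is not flagged.
```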

The closest to a use case I have is simply the desire for (valid) Unicode strings coming into my API to always round-trip back out again, even if they contain rare control characters like NUL.

Comment by Tad Marshall [ 04/Apr/12 ]

Hi Glenn,

Thanks for the report. I haven't even tried to reproduce it yet, but your steps are clean and simple (and appreciated!).

We have a bit of confusion in our code over whether (and when) a NUL terminator actually terminates a string. NUL is a perfectly valid ASCII and Unicode character but it is also (as you know) widely used as a terminator for a string of characters and there is a lot of code that expects and demands this.
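The two interpretations can be illustrated with a simplified model of how BSON stores a string value: a 4-byte little-endian length (which counts the trailing NUL), the UTF-8 bytes, then a NUL terminator. The sketch below is not the shell's actual code, and this ticket does not establish where the truncation happens; the function names are hypothetical and only ASCII input is handled to keep it short.

```javascript
// Simplified model of a BSON string value (ASCII-only for brevity):
// int32 length including the trailing NUL, the bytes, then 0x00.
function encodeBsonString(s) {
    var n = s.length + 1;
    var bytes = [n & 0xff, (n >> 8) & 0xff, (n >> 16) & 0xff, (n >> 24) & 0xff];
    for (var i = 0; i < s.length; i++) {
        bytes.push(s.charCodeAt(i));
    }
    bytes.push(0);
    return bytes;
}

// Decoder that trusts the length prefix: embedded NULs survive.
function decodeByLength(bytes) {
    var n = bytes[0] | (bytes[1] << 8) | (bytes[2] << 16) | (bytes[3] << 24);
    var s = '';
    for (var i = 4; i < 4 + n - 1; i++) {
        s += String.fromCharCode(bytes[i]);
    }
    return s;
}

// Decoder that stops at the first NUL, C-string style: the value is cut short.
function decodeUntilNul(bytes) {
    var s = '';
    for (var i = 4; i < bytes.length && bytes[i] !== 0; i++) {
        s += String.fromCharCode(bytes[i]);
    }
    return s;
}

var encoded = encodeBsonString('y\u0000a');
var byLength = decodeByLength(encoded);  // 'y\u0000a', length 3
var untilNul = decodeUntilNul(encoded);  // 'y', length 1
```

Any layer that treats the stored bytes as a NUL-terminated C string behaves like `decodeUntilNul`, which would match the reported length of 1.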

Can you give a use case for storing and retrieving embedded NULs in UTF-8 strings in MongoDB and/or a set of rules that you would like us to follow here? I'm not trying to be difficult, and if Python sees a different string length than JavaScript sees then there is clearly something that needs fixing, but I would love to see your thoughts on what "correct" behavior would be here.

Thanks for your help!

Tad
