[SERVER-5462] Shell doesn't handle embedded nulls correctly Created: 30/Mar/12 Updated: 29/Aug/13 Resolved: 08/Mar/13 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | JavaScript |
| Affects Version/s: | 2.0.3 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor - P4 |
| Reporter: | Glenn Maynard | Assignee: | Tad Marshall |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: |
| Operating System: | ALL |
| Participants: |
| Description |
|
> db.test.insert( {'x': 'y\u0000a'})
> db.test.find()[0].x.length

The shell sends the string to the backend correctly, and the backend stores the data correctly (it's retrieved properly in Python), but the data the shell receives back is truncated. x.length is 1 rather than 3, so this isn't a cosmetic problem in the shell; the data is actually incorrect. |
| Comments |
| Comment by Glenn Maynard [ 04/Apr/12 ] |
|
In general, Mongo should try to minimize the number of places where valid JSON/BSON can't be stored directly. Currently, all of the intentional restrictions (that I'm aware of) are key restrictions: object (document) keys can't contain ".", "\0", or begin with "$".

Each of these restrictions gives me extra work: I need to ensure that my higher-level API doesn't require these keys; document that they're not allowed wherever that leaks through to the API (which doesn't expose MongoDB per se), or else escape them; add tests for these exceptional cases; and so on. Having additional limitations like "document values can't contain NUL" gives me more work to do and complicates my API (because now it has more special cases, too).

Also, as far as I'm aware, all of the intentional restrictions are limitations on keys, not values, which helps narrow it a bit: you only need to worry about special limitations in object keys; everything else is simply any valid JSON (BSON) object.

The closest to a use case I have is simply the desire for (valid) Unicode strings coming into my API to always round-trip back out again, even if they contain rare control characters like NUL. |
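As an illustration of the extra checking described in the comment above, here is a minimal sketch of a key validator. It assumes only the three key restrictions listed in that comment (no ".", no NUL, no leading "$"); the function name `findKeyProblems` is hypothetical and not part of any MongoDB API, and the rules actually enforced may differ between server versions. String values are deliberately not checked, since the comment argues they should be unrestricted.

```javascript
// Hypothetical helper (not a MongoDB API): walk a document and report keys
// that break the restrictions listed in the comment above.
function findKeyProblems(doc, path = '') {
  const problems = [];
  for (const [key, value] of Object.entries(doc)) {
    const where = path ? `${path} -> ${JSON.stringify(key)}` : JSON.stringify(key);
    if (key.includes('.')) problems.push(`${where}: key contains "."`);
    if (key.includes('\u0000')) problems.push(`${where}: key contains NUL`);
    if (key.startsWith('$')) problems.push(`${where}: key begins with "$"`);
    // Recurse into nested documents and arrays; values themselves are not restricted.
    if (value !== null && typeof value === 'object') {
      problems.push(...findKeyProblems(value, where));
    }
  }
  return problems;
}

// A value containing NUL is accepted; only the keys are inspected.
console.log(findKeyProblems({ x: 'y\u0000a' }).length); // 0
console.log(findKeyProblems({ a: { 'b.c': 1, '$d': 2 } }).length); // 2
```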
| Comment by Tad Marshall [ 04/Apr/12 ] |
|
Hi Glenn,

Thanks for the report. I haven't even tried to reproduce it yet but your steps are clean and simple (and appreciated!).

We have a bit of confusion in our code over whether (and when) a NUL terminator actually terminates a string. NUL is a perfectly valid ASCII and Unicode character but it is also (as you know) widely used as a terminator for a string of characters and there is a lot of code that expects and demands this.

Can you give a use case for storing and retrieving embedded NULs in UTF-8 strings in MongoDB and/or a set of rules that you would like us to follow here? I'm not trying to be difficult, and if Python sees a different string length than JavaScript sees then there is clearly something that needs fixing, but I would love to see your thoughts on what "correct" behavior would be here.

Thanks for your help!

Tad |
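The ambiguity described in the comment above can be sketched in a few lines. A BSON string value is stored as a little-endian int32 byte count (which includes the trailing NUL), the UTF-8 bytes, and then a terminating NUL, so a reader can either trust the length prefix or scan for the first NUL. The sketch below does not claim to reproduce the shell's actual code path; the function names are hypothetical, and it only shows how the second reading would yield the length of 1 reported in this ticket.

```javascript
// Encode a string the way BSON lays out a string value:
// int32 LE byte count (including the trailing NUL), UTF-8 bytes, then 0x00.
function encodeBsonString(s) {
  const utf8 = new TextEncoder().encode(s);
  const out = new Uint8Array(4 + utf8.length + 1); // last byte stays 0
  new DataView(out.buffer).setInt32(0, utf8.length + 1, true);
  out.set(utf8, 4);
  return out;
}

// Reading 1: trust the length prefix, so embedded NULs survive.
function decodeByLength(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const len = view.getInt32(0, true);
  return new TextDecoder().decode(bytes.subarray(4, 4 + len - 1));
}

// Reading 2: treat the payload as a C string and stop at the first NUL.
function decodeAsCString(bytes) {
  let end = 4;
  while (end < bytes.length && bytes[end] !== 0) end++;
  return new TextDecoder().decode(bytes.subarray(4, end));
}

const encoded = encodeBsonString('y\u0000a');
const byLength = decodeByLength(encoded);
const asCString = decodeAsCString(encoded);
console.log(byLength.length, asCString.length); // 3 1
```

Under the first reading the value round-trips unchanged; under the second, everything after the embedded NUL is silently dropped.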