[SERVER-15969] Wired tiger aborts with unicode collection names Created: 05/Nov/14 Updated: 11/Jul/16 Resolved: 10/Nov/14 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Storage |
| Affects Version/s: | 2.8.0-rc0 |
| Fix Version/s: | 2.8.0-rc0 |
| Type: | Bug | Priority: | Blocker - P1 |
| Reporter: | Bernie Hackett | Assignee: | Mathias Stearn |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Operating System: | ALL |
| Participants: |
| Description |
|
Found this while running PyMongo's test suite. The test creates a collection like so:
mongod log:
|
| Comments |
| Comment by Bernie Hackett [ 10/Nov/14 ] |
|
Appears to be fixed. Built against git hash 0549652e913b4c39dc00ec10bd1895c085b27bf3.
| Comment by Eliot Horowitz (Inactive) [ 10/Nov/14 ] |
|
Bernie, can you try with a build from after 7am Monday?
| Comment by Michael Cahill [ 10/Nov/14 ] |
|
I've fixed a WiredTiger bug in UTF-8 handling that was probably causing the original error. The fix is in WiredTiger's develop branch, here: https://github.com/wiredtiger/wiredtiger/commit/c4f14ea06104009afe55e0e126e453e86b825475. Please let me know if there are any more problems once that change makes its way downstream.
| Comment by Andy Schwerin [ 07/Nov/14 ] |
We officially require that field names and collection names consist of UTF-8-encoded, non-control Unicode code points, but all we currently enforce is that they contain no embedded NUL bytes.
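The gap between what is required and what is enforced can be made concrete. The following is a minimal Python sketch, assuming that "control" means Unicode general category `Cc`; the function names `is_enforced_valid` and `is_policy_valid` are hypothetical and are not taken from the server code.

```python
import unicodedata


def is_enforced_valid(name_bytes: bytes) -> bool:
    """Hypothetical: the only check described as currently enforced,
    namely that the name contains no embedded NUL byte."""
    return b"\x00" not in name_bytes


def is_policy_valid(name_bytes: bytes) -> bool:
    """Hypothetical: the stated requirement, i.e. UTF-8-encoded,
    non-control code points. "Control" is assumed to mean category Cc."""
    try:
        text = name_bytes.decode("utf-8")  # strict mode rejects malformed UTF-8
    except UnicodeDecodeError:
        return False
    # U+0000 is itself in category Cc, so embedded NULs are rejected here too.
    return all(unicodedata.category(ch) != "Cc" for ch in text)


if __name__ == "__main__":
    samples = {
        "ascii": b"test.foo",
        "unicode": "test.\u00e9t\u00e9".encode("utf-8"),
        "invalid utf-8": b"test.\xff\xfe",
        "control char": b"test.\x07bell",
    }
    for label, raw in samples.items():
        # Columns: label, passes current enforcement, passes stated policy
        print(label, is_enforced_valid(raw), is_policy_valid(raw))
```

Both malformed samples pass the NUL-only check but fail the stricter one, which is the class of input that could reach the storage engine unvalidated.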
| Comment by Keith Bostic [ 07/Nov/14 ] |
|
> I assume this is something MongoDB should be handling?
Yes, it would be terrific if MongoDB could guarantee UTF-8 at the WiredTiger API, but we can certainly talk it over, depending on the effort involved.
| Comment by Bernie Hackett [ 07/Nov/14 ] |
Well, PyMongo will always send strings encoded as UTF-8, since that's what the BSON specification requires. That being said, not all drivers validate strings, and not all drivers are maintained by MongoDB.
I assume this is something MongoDB should be handling?
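For reference, BSON encodes text in two ways: element names (and legacy wire-protocol namespaces) are `cstring`s, which are UTF-8 bytes followed by a NUL terminator and so cannot contain an embedded NUL, while string values carry an explicit little-endian `int32` length. A minimal sketch of both encodings follows; `encode_cstring` and `encode_string` are hypothetical helpers, not PyMongo's actual encoder.

```python
import struct


def encode_cstring(value: str) -> bytes:
    """Hypothetical helper: BSON cstring = UTF-8 bytes + b"\\x00".
    The body must not contain NUL, since NUL marks the end."""
    data = value.encode("utf-8")  # raises UnicodeEncodeError on lone surrogates
    if b"\x00" in data:
        raise ValueError("cstring may not contain an embedded NUL byte")
    return data + b"\x00"


def encode_string(value: str) -> bytes:
    """Hypothetical helper: BSON string = little-endian int32 byte count
    (including the trailing NUL), then the UTF-8 bytes, then b"\\x00"."""
    data = value.encode("utf-8")
    return struct.pack("<i", len(data) + 1) + data + b"\x00"


if __name__ == "__main__":
    print(encode_cstring("test.foo"))
    print(encode_string("\u00e9"))  # 2 UTF-8 bytes, so the length prefix is 3
```

Because Python's strict UTF-8 encoder refuses lone surrogates, a driver built this way cannot emit malformed UTF-8; a driver that copies raw bytes without this step could.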
| Comment by Keith Bostic [ 07/Nov/14 ] |
|
Update: this is documented in WiredTiger's discussion of configuration strings (http://source.wiredtiger.com/2.4.1/config_strings.html): quoted strings are interpreted as UTF-8 values.
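If quoted configuration values are read as UTF-8, a collection name that is not well-formed UTF-8 would presumably be rejected or mishandled there. As a rough illustration, the sketch below lists classes of byte sequences that a strict UTF-8 decoder rejects under RFC 3629, using Python's built-in codec as a stand-in. WiredTiger's own parser is not shown and may differ in detail, and `is_well_formed_utf8` is a hypothetical name.

```python
# Byte sequences a strict UTF-8 decoder must reject (RFC 3629).
# Python's "utf-8" codec is used as the reference decoder; this is an
# assumption for illustration, not WiredTiger's config-string parser.
MALFORMED = {
    "lone continuation byte": b"\x80",
    "overlong encoding of '/'": b"\xc0\xaf",
    "UTF-16 surrogate U+D800": b"\xed\xa0\x80",
    "code point above U+10FFFF": b"\xf4\x90\x80\x80",
    "truncated 3-byte sequence": b"\xe2\x82",
}


def is_well_formed_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    # A valid multi-byte name: the euro sign is 3 bytes in UTF-8.
    print(is_well_formed_utf8("coll_\u20ac".encode("utf-8")))
    for label, raw in MALFORMED.items():
        print(f"{label}: {is_well_formed_utf8(raw)}")
```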
| Comment by Keith Bostic [ 07/Nov/14 ] |
|
I just opened WiredTiger issue #1353 to track this one (https://github.com/wiredtiger/wiredtiger/issues/1353); we'll get back to you on it. One question: what encodings do you support? That is, can we rely on seeing UTF-8 in the API, or are there other issues?
| Comment by Bernie Hackett [ 07/Nov/14 ] |
|
It no longer aborts; it just returns an obscure error to the client:
python:
server log:
server info:
| Comment by Andy Schwerin [ 07/Nov/14 ] |
|
redbeard0531 believes that this issue is resolved for both collection names and index key field names. benety.goh or behackett, can you recheck on master?