[JAVA-1642] "Mongo Connector does not extract valid control characters from 0x00 to 0x1F." issue against T2Mongo Connector Created: 06/Feb/15 Updated: 11/Sep/19 Resolved: 02/Oct/16 |
|
| Status: | Closed |
| Project: | Java Driver |
| Component/s: | JSON |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Task | Priority: | Major - P3 |
| Reporter: | Muthu Chinnasamy (Inactive) | Assignee: | Unassigned |
| Resolution: | Duplicate | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
||||||||
| Description |
|
Teradata and MongoDB are in the process of releasing a T2Mongo connector. Teradata ran into the issue below and needs help with diagnosis and a possible solution or workaround.

Description: We are trying to import some control characters. The result is correct when it is returned as a BasicDBObject. However, we need to convert it to a Java String, and for that we use the BasicDBObject.toString method provided by the Mongo Java driver. This method calls com.mongodb.util.JSON.serialize() to serialize the object. I looked at the source code and found that the following method is called to serialize the string type:
From the line:
it skips characters 0-31 and does not emit control characters in Unicode escape format, so \u0000 to \u001F are dropped and we cannot extract those characters.

I also realized that if I implement a simple JSONSerializer (cast the BasicDBObject to a Map and construct the JSON document from its key-value pairs), the data type information is lost. That is, when the data is exported back to the Mongo side, Mongo can only recognize it as String type.

Regarding this issue, could you help me understand whether this is a bug in the Mongo Java driver or intended behavior, and whether there is any workaround that keeps those Unicode characters when serializing a BasicDBObject to a String?

After I implemented a simple JSONSerializer, I noticed that the DBS side is unable to extract a JSON document column if the column contains Unicode characters in the format '\uXXXX', such as \u0001. For example:

Data in the utf8 collection:
{ "_id" : ObjectId("54d49a22c6ee70b789a21d55"), "utf8string" : "\u0001\u0001\u0001" }

It is OK to run:
select * from Foreign Table(@BEGIN_PASS_THRU test.utf8.find()@END_PASS_THRU)@Mongo as T;
{"_id":"54d49a3dc6ee70b789a21d56","utf8string":" "}

But if I run it, it errors out with:
|
| Comments |
| Comment by Muthu Chinnasamy (Inactive) [ 07/Feb/15 ] |
|
Java driver version 2.12.4; 2.13.0 has also been reported with the same problem. |
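
The escaping that the description asks for can be sketched as follows. This is a minimal illustration, not the driver's code and not a confirmed fix: the class name `ControlCharEscaper` and the method `quote` are hypothetical, and the sketch only covers string values. It does not address the loss of type information mentioned in the description. Instead of skipping characters below 0x20, it writes each one as a six-character Unicode escape, which is what the JSON grammar requires for control characters.

```java
// Hypothetical helper, not part of the Mongo Java driver.
final class ControlCharEscaper {
    private ControlCharEscaper() {}

    /** Returns s as a double-quoted JSON string literal, escaping control characters. */
    static String quote(String s) {
        StringBuilder out = new StringBuilder(s.length() + 2);
        out.append('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                // Characters with a short JSON escape form.
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\b': out.append("\\b");  break;
                case '\f': out.append("\\f");  break;
                case '\n': out.append("\\n");  break;
                case '\r': out.append("\\r");  break;
                case '\t': out.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters (0x00 to 0x1F) are kept
                        // as a four-hex-digit escape rather than being dropped.
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        out.append('"');
        return out.toString();
    }
}
```

With this sketch, passing the three-character value from the utf8 collection example to `ControlCharEscaper.quote` yields the text `"\u0001\u0001\u0001"` with the escapes written out literally, so the control characters survive the round trip through a String.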