Mongo Connector does not extract valid control characters from 0x00 to 0x1F (issue against T2Mongo Connector)


    • Type: Task
    • Resolution: Duplicate
    • Priority: Major - P3
    • Affects Version/s: None
    • Component/s: JSON

      Teradata and MongoDB are in the process of releasing a T2Mongo connector. Teradata ran into the issue below and needs help with diagnosis and a possible solution/workaround.

      Description:

      We are trying to import some control characters. The result is correct when it is returned as a BasicDBObject. However, we need to cast it to a Java String, in which case we use the BasicDBObject.toString method provided by the Mongo Java driver. This method calls com.mongodb.util.JSON.serialize() to serialize the object. I've looked at the source code and found that the following method is called to serialize string values:

         static void string( StringBuilder a , String s ){
             a.append("\"");
             for(int i = 0; i < s.length(); ++i){
                 char c = s.charAt(i);
                 if (c == '\\')
                     a.append("\\\\");
                 else if(c == '"')
                     a.append("\\\"");
                 else if(c == '\n')
                     a.append("\\n");
                 else if(c == '\r')
                     a.append("\\r");
                 else if(c == '\t')
                     a.append("\\t");
                 else if(c == '\b')
                     a.append("\\b");
                 else if ( c < 32 )
                     continue;
                 else
                     a.append(c);
             }
             a.append("\"");
         }
      

      From the line:

                  else if ( c < 32 )
                      continue;
      

      it skips characters 0-31 and does not escape them in the \uXXXX Unicode format, so control characters in the range \u0000-\u001F (other than \b, \t, \n, and \r) are silently dropped and we cannot extract them. I also realize that if I implement a simple JSONSerializer (casting the BasicDBObject to a Map and constructing the JSON document from its key/value pairs), the data type information is lost; that is, when that data is exported back to the Mongo side, Mongo can only recognize it as String type. Regarding this issue, could you help me determine whether this is a bug in the Mongo Java driver or intended behavior, and whether there is a workaround to keep those Unicode characters when serializing a BasicDBObject to a String?
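
      For illustration, here is a minimal sketch of a workaround serializer that writes control characters as \u00XX escapes (which is valid JSON) instead of skipping them. This is our own sketch, not the driver's code: the class name JsonStringEscaper, the main() harness, and the commented outputs are assumptions based on the behavior described above, and it assumes the legacy 2.x Java driver is on the classpath.

         import com.mongodb.BasicDBObject;

         public final class JsonStringEscaper {

             // Same structure as the driver's string(), except that the
             // "c < 32" branch emits a \u00XX escape instead of dropping
             // the character.
             static void string( StringBuilder a , String s ){
                 a.append("\"");
                 for(int i = 0; i < s.length(); ++i){
                     char c = s.charAt(i);
                     if (c == '\\')
                         a.append("\\\\");
                     else if(c == '"')
                         a.append("\\\"");
                     else if(c == '\n')
                         a.append("\\n");
                     else if(c == '\r')
                         a.append("\\r");
                     else if(c == '\t')
                         a.append("\\t");
                     else if(c == '\b')
                         a.append("\\b");
                     else if ( c < 32 )
                         a.append(String.format("\\u%04x", (int) c));
                     else
                         a.append(c);
                 }
                 a.append("\"");
             }

             public static void main(String[] args) {
                 BasicDBObject doc = new BasicDBObject("utf8string", "\u0001\u0001\u0001");
                 // Driver serialization: the control characters vanish.
                 System.out.println(doc);
                 // Workaround serialization: the characters are kept as escapes.
                 StringBuilder sb = new StringBuilder();
                 string(sb, "\u0001\u0001\u0001");
                 System.out.println(sb);   // "\u0001\u0001\u0001"
             }
         }

      Note that this only fixes the string escaping; as mentioned above, round-tripping non-string BSON types through a hand-rolled JSON serializer still loses the type information.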

      After I implemented a simple JSONSerializer, I noticed that the DBS side is unable to extract a JSON document column if the column contains Unicode characters in the '\uXXXX' format, such as \u0001. For example:

      Data in utf8 collection :

      { "_id" : ObjectId("54d49a22c6ee70b789a21d55"), "utf8string" : "\u0001\u0001\u0001" }

      It is ok to do:

      select * from Foreign Table(@BEGIN_PASS_THRU test.utf8.find()@END_PASS_THRU)@Mongo as T;

      {"_id":"54d49a3dc6ee70b789a21d56","utf8string":" "}

      But if I run
      select MongoData from Foreign Table(@BEGIN_PASS_THRU test.utf8.find()@END_PASS_THRU)@Mongo as T;
      or
      select MongoData.utf8string from Foreign Table(@BEGIN_PASS_THRU test.utf8.find()@END_PASS_THRU)@Mongo as T;

      it errors out with:

          • Failure 7548 Invalid JSON data: Expected something like whitespace or '{' or '}' or '[' or ']' or ':' or ',' or '"' or '\' between '"' and '\0001' at character position 48. Make sure data was not truncated.
            Statement# 1, Info =0

    • Assignee: Unassigned
    • Reporter: Muthu Chinnasamy (Inactive)
    • Votes: 0
    • Watchers: 1
