handle control character in Unicode format


    • Type: Bug
    • Resolution: Won't Fix
    • Priority: Minor - P4
    • Affects Version/s: None
    • Component/s: JSON

      Hi,

      I have data like this:

      { "_id" : ObjectId("54874f34062dfda18bcb47f5"), "a" : 2, "b" : 1, "c" : 1, "d" : "\u0001\u0001\u0001\u0001\u0001\u0001\u0001\u0001\u0001\u0001", "e" :2}
      

      From the mongo shell:

      mongos> db.ascii.insert({ "_id" : ObjectId("54874f34062dfda18bcb47f5"), "a" : 2, "b" : 1, "c" : 1, "d" : "\u0001\u0001\u0001\u0001\u0001\u0001\u0001\u0001\u0001\u0001", "e" :2})
      WriteResult({ "nInserted" : 1 })
      mongos>
      mongos> db.ascii.find()
      { "_id" : ObjectId("54874f34062dfda18bcb47f5"), "a" : 2, "b" : 1, "c" : 1, "d" : "\u0001\u0001\u0001\u0001\u0001\u0001\u0001\u0001\u0001\u0001", "e" : 2 }
      

      When I read the same collection with the Java driver, the output is wrong:

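      // table is the DBCollection for the "ascii" collection
      DBCursor cursor = table.find();
      while (cursor.hasNext()) {
          DBObject tmp = cursor.next();
          // println calls tmp.toString(), which serializes the document to JSON
          System.out.println(tmp);
      }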
      

      The output is:

      { "_id" : { "$oid" : "54874f34062dfda18bcb47f5"} , "a" : 2.0 , "b" : 1.0 , "c" : 1.0 , "d" : "" , "e" : 2.0}
      

      As you can see, the value of "d" is printed as an empty string.
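      The stored value should not actually be lost: the shell shows it, and the decoded Java document should still hold the ten U+0001 characters; only the printed form is empty. A minimal sketch for confirming this is to read the field directly (with the driver that would be tmp.get("d")) and print its code points instead of the whole object. Here a LinkedHashMap stands in for the DBObject so the example runs without the driver, and InspectField and codePoints are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class InspectField {

    // Hypothetical helper: renders each char of s as "U+XXXX", separated by spaces,
    // so invisible control characters become visible.
    public static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Rebuild the "d" value from the report: ten U+0001 characters.
        StringBuilder value = new StringBuilder();
        for (int i = 0; i < 10; i++) {
            value.append((char) 1);
        }

        // Stand-in for the DBObject returned by cursor.next() (assumption: with the
        // real driver you would call tmp.get("d") on that object instead).
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("d", value.toString());

        String d = (String) doc.get("d");
        System.out.println(d.length());    // prints 10
        System.out.println(codePoints(d)); // U+0001 repeated ten times
    }
}
```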

      I found that:

      BasicDBObject.toString() calls com.mongodb.util.JSON.serialize() to serialize the object to a JSON
      string. To serialize a String value, the Mongo Java driver uses the following method,
      com.mongodb.util.JSON.string:

         static void string( StringBuilder a , String s ){
              a.append("\"");
              for(int i = 0; i < s.length(); ++i){
                  char c = s.charAt(i);
                  if (c == '\\')
                      a.append("\\\\");
                  else if(c == '"')
                      a.append("\\\"");
                  else if(c == '\n')
                      a.append("\\n");
                  else if(c == '\r')
                      a.append("\\r");
                  else if(c == '\t')
                      a.append("\\t");
                  else if(c == '\b')
                      a.append("\\b");
                  else if ( c < 32 )
                      continue;
                  else
                      a.append(c);
              }
              a.append("\"");
          }
      

      Because of these lines:

       else if ( c < 32 )
          continue;
      

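      this method silently skips every character with a code below 32 that an earlier branch has not
      already escaped (only \b, \t, \n and \r are escaped), instead of writing it as a \uXXXX escape.
      As a result, the remaining control characters in the range \u0000-\u001F, such as \u0001, are
      dropped when a BasicDBObject is serialized to a string.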
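      One way to keep these characters is to escape them instead of skipping them: JSON (RFC 8259) requires U+0000 through U+001F inside strings to be written as escapes, so the c < 32 branch can emit a four-hex-digit Unicode escape. The sketch below is a patched copy of the method quoted above, placed in a hypothetical JsonStringEscape class. It is not code from the driver (this issue was closed as Won't Fix); you could call something like it from your own output code instead of relying on toString().

```java
public class JsonStringEscape {

    // Patched sketch of the quoted JSON.string(StringBuilder, String): identical except
    // that characters below 32 are escaped rather than dropped.
    public static void string(StringBuilder a, String s) {
        a.append('"');
        for (int i = 0; i < s.length(); ++i) {
            char c = s.charAt(i);
            if (c == '\\')
                a.append("\\\\");
            else if (c == '"')
                a.append("\\\"");
            else if (c == '\n')
                a.append("\\n");
            else if (c == '\r')
                a.append("\\r");
            else if (c == '\t')
                a.append("\\t");
            else if (c == '\b')
                a.append("\\b");
            else if (c < 32)
                // Remaining control characters (0-31) become a 4-hex-digit Unicode escape
                a.append(String.format("\\u%04x", (int) c));
            else
                a.append(c);
        }
        a.append('"');
    }

    public static void main(String[] args) {
        // The "d" value from the report: ten U+0001 characters.
        StringBuilder d = new StringBuilder();
        for (int i = 0; i < 10; i++) {
            d.append((char) 1);
        }
        StringBuilder out = new StringBuilder();
        string(out, d.toString());
        System.out.println(out);
    }
}
```

      Running main prints the value as ten \u0001 escapes inside quotes, matching what the shell shows. Calling String.format once per control character keeps the sketch short; a small hex-digit lookup table would be faster for large documents.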

      How should this kind of Unicode data be handled?

            Assignee: Unassigned
            Reporter: sandip
            Votes: 0
            Watchers: 2
