Go Driver / GODRIVER-1947

UnmarshalExtJSON doesn't handle escaped surrogate pairs

Details

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 1.5.2
    • Affects Version/s: 1.5.1
    • Component/s: JSON & ExtJSON

    Description

      RFC 8259 section 7 requires special handling of surrogate pairs like "\uD834\uDd1e":

      To escape an extended character that is not in the Basic Multilingual
      Plane, the character is represented as a 12-character sequence,
      encoding the UTF-16 surrogate pair. So, for example, a string
      containing only the G clef character (U+1D11E) may be represented as
      "\uD834\uDD1E".

      `UnmarshalExtJSON` does not decode escaped surrogate pairs into a single code point. Instead, it converts each half of the pair to a Unicode replacement character (U+FFFD).

      Demo program:

      package main

      import (
      	"encoding/hex"
      	"fmt"

      	"go.mongodb.org/mongo-driver/bson"
      )

      func main() {
      	// Extended JSON with U+1D11E written as an escaped surrogate pair.
      	str := `{"a":"\uD834\uDd1e"}`
      	// The same document built directly from the decoded code point.
      	doc := bson.D{{Key: "a", Value: "\U0001D11E"}}

      	var buf bson.Raw
      	err := bson.UnmarshalExtJSON([]byte(str), true, &buf)
      	if err != nil {
      		panic(err)
      	}
      	fmt.Println("Unmarshaled from JSON: " + hex.EncodeToString(buf))

      	doc2, err := bson.Marshal(doc)
      	if err != nil {
      		panic(err)
      	}
      	fmt.Println("Marshaled from bson.D: " + hex.EncodeToString(doc2))
      }
      

      Output:

      Unmarshaled from JSON: 1300000002610007000000efbfbdefbfbd0000
      Marshaled from bson.D: 1100000002610005000000f09d849e0000
      

      Treatment of ill-formed surrogate pairs (e.g. a lone high or low surrogate) is often implementation-defined. You can find cases to consider in this corpus: https://github.com/nst/JSONTestSuite

          People

            matt.dale@mongodb.com Matt Dale
            david.golden@mongodb.com David Golden