Go Driver / GODRIVER-1947

UnmarshalExtJSON doesn't handle escaped surrogate pairs

    • Type: Bug
    • Resolution: Fixed
    • Priority: Major - P3
    • Fix Version/s: 1.5.2
    • Affects Version/s: 1.5.1
    • Component/s: JSON & ExtJSON

      RFC 8259 section 7 requires special handling of surrogate pairs like "\uD834\uDd1e":

      To escape an extended character that is not in the Basic Multilingual
      Plane, the character is represented as a 12-character sequence,
      encoding the UTF-16 surrogate pair. So, for example, a string
      containing only the G clef character (U+1D11E) may be represented as
      "\uD834\uDD1E".

      `UnmarshalExtJSON` does not properly decode escaped surrogate pairs. Instead of combining the two escapes into a single code point, it converts each one to the Unicode replacement character (U+FFFD).
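For reference, the expected combination of the two escapes can be sketched with the standard library alone. This is a minimal illustration, not the driver's parser: `decodeEscapedPair` is a hypothetical helper that takes only the four hex digits of each `\uXXXX` escape.

```go
package main

import (
	"fmt"
	"strconv"
	"unicode/utf16"
)

// decodeEscapedPair is a hypothetical helper (not part of the driver). It
// takes the hex digits of two consecutive \uXXXX escapes and combines them
// into a single code point.
func decodeEscapedPair(hi, lo string) (rune, error) {
	h, err := strconv.ParseUint(hi, 16, 16)
	if err != nil {
		return 0, err
	}
	l, err := strconv.ParseUint(lo, 16, 16)
	if err != nil {
		return 0, err
	}
	// utf16.DecodeRune returns U+FFFD when the two values do not form a
	// valid high/low surrogate pair.
	return utf16.DecodeRune(rune(h), rune(l)), nil
}

func main() {
	r, err := decodeEscapedPair("D834", "Dd1e")
	if err != nil {
		panic(err)
	}
	// Print the code point and its UTF-8 bytes.
	fmt.Printf("U+%X % x\n", r, string(r))
}
```

The resulting code point is U+1D11E, whose UTF-8 encoding is the four bytes f0 9d 84 9e.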

      Demo program:

      package main
      
      import (
      	"encoding/hex"
      	"fmt"
      
      	"go.mongodb.org/mongo-driver/bson"
      )
      
      func main() {
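      	// Extended JSON input with the G clef (U+1D11E) escaped as a surrogate pair.
      	str := `{"a":"\uD834\uDd1e"}`
      	// The same document built directly from the code point, for comparison.
      	doc := bson.D{{"a", "\U0001D11E"}}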
      
      	var buf bson.Raw
      	err := bson.UnmarshalExtJSON([]byte(str), true, &buf)
      	if err != nil {
      		panic(err)
      	}
      	fmt.Println("Unmarshaled from JSON: " + hex.EncodeToString(buf))
      
      	doc2, err := bson.Marshal(doc)
      	if err != nil {
      		panic(err)
      	}
      	fmt.Println("Marshaled from bson.D: " + hex.EncodeToString(doc2))
      }
      

      Output:

      Unmarshaled from JSON: 1300000002610007000000efbfbdefbfbd0000
      Marshaled from bson.D: 1100000002610005000000f09d849e0000
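To make the difference between the two hex dumps easier to see, the string payload can be pulled out of each document by hand. The sketch below uses only the standard library and assumes the exact shape produced by the demo (a single string element with the one-byte key "a"); `stringValue` is a hypothetical helper, not a driver API.

```go
package main

import (
	"encoding/binary"
	"encoding/hex"
	"errors"
	"fmt"
)

// stringValue is a hypothetical helper. It assumes the document layout
//   int32 length | 0x02 | "a\x00" | int32 string length | bytes | 0x00 | 0x00
// and returns the string bytes without the trailing NUL.
func stringValue(hexDoc string) (string, error) {
	raw, err := hex.DecodeString(hexDoc)
	if err != nil {
		return "", err
	}
	const lenOffset = 4 + 1 + 2 // document length, type byte, key "a\x00"
	if len(raw) < lenOffset+4 || raw[4] != 0x02 {
		return "", errors.New("not a single-string document with key \"a\"")
	}
	// BSON integers are little-endian; the string length includes the NUL.
	n := int(binary.LittleEndian.Uint32(raw[lenOffset:]))
	start := lenOffset + 4
	if n < 1 || start+n > len(raw) {
		return "", errors.New("invalid string length")
	}
	return string(raw[start : start+n-1]), nil
}

func main() {
	for _, h := range []string{
		"1300000002610007000000efbfbdefbfbd0000", // unmarshaled from JSON
		"1100000002610005000000f09d849e0000",     // marshaled from bson.D
	} {
		s, err := stringValue(h)
		if err != nil {
			panic(err)
		}
		// %+q escapes non-ASCII runes so the code points are visible.
		fmt.Printf("%+q\n", s)
	}
}
```

The first payload holds two copies of U+FFFD (ef bf bd, twice), while the second holds the single code point U+1D11E (f0 9d 84 9e).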
      

      Treatment of ill-formed surrogate pairs (e.g. a lone surrogate) is often implementation-defined. You can find cases to consider in this corpus: https://github.com/nst/JSONTestSuite
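As one point of comparison, Go's standard `encoding/json` decodes a well-formed pair into a single code point and substitutes U+FFFD for each unpaired surrogate. The sketch below shows that behavior only; it is not a statement of what the driver should do, and `decodeJSONString` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeJSONString is a hypothetical helper that decodes a single JSON
// string literal with the standard library.
func decodeJSONString(lit string) (string, error) {
	var s string
	err := json.Unmarshal([]byte(lit), &s)
	return s, err
}

func main() {
	for _, lit := range []string{
		`"\uD834\uDd1e"`, // well-formed pair
		`"\uD834"`,       // lone high surrogate
		`"\uDd1e\uD834"`, // low surrogate first, then high
	} {
		s, err := decodeJSONString(lit)
		if err != nil {
			panic(err)
		}
		// %+q escapes non-ASCII runes so the code points are visible.
		fmt.Printf("%s -> %+q\n", lit, s)
	}
}
```

Other parsers reject such input outright, which is why the corpus above is useful for deciding on a policy.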

            Assignee:
            matt.dale@mongodb.com Matt Dale
            Reporter:
            david.golden@mongodb.com David Golden
