[SERVER-325] Correctly handle \u escapes using UTF16 surrogate pairs for chars outside of BMP Created: 30/Sep/09 Updated: 06/Dec/22 Resolved: 14/Mar/22 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Tools |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor - P4 |
| Reporter: | Mathias Stearn | Assignee: | Backlog - Storage Execution Team |
| Resolution: | Won't Do | Votes: | 0 |
| Labels: | neweng | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Assigned Teams: | Storage Execution |
| Participants: |
| Description |
|
http://en.wikipedia.org/wiki/UTF-16/UCS-2#Encoding_of_characters_outside_the_BMP From RFC 4627: |
| Comments |
| Comment by Mathias Stearn [ 02/May/16 ] |
|
The repro is in the description: "\uD834\uDD1E". That is (unfortunately) the correct way to encode U+1D11E in JSON. Do we parse that correctly as one 4-byte UTF-8 character, or incorrectly as two 3-byte sequences (one per surrogate)? Judging by this code, we still handle this incorrectly: |
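
For illustration only, here is a minimal Python sketch of the two outcomes the comment contrasts. This is not the server code referred to above, which is not reproduced in this export. The helper name `combine_surrogates` is hypothetical; the arithmetic is the standard UTF-16 surrogate-pair rule.

```python
import json


def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into a single Unicode code point.

    Hypothetical helper for illustration; not taken from the server source.
    """
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError(f"not a high surrogate: {high:#06x}")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError(f"not a low surrogate: {low:#06x}")
    # Each surrogate carries 10 payload bits; the pair is offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# Correct handling: combine the pair first, then encode once as UTF-8.
code_point = combine_surrogates(0xD834, 0xDD1E)
correct = chr(code_point).encode("utf-8")

# Incorrect handling: encode each surrogate separately as its own
# 3-byte sequence. "surrogatepass" is used only to make Python emit
# these otherwise invalid bytes for comparison.
incorrect = b"".join(
    chr(unit).encode("utf-8", "surrogatepass") for unit in (0xD834, 0xDD1E)
)

# Python's own JSON parser combines the pair into one character.
parsed = json.loads(r'"\uD834\uDD1E"')

print(hex(code_point), correct.hex(), incorrect.hex(), len(parsed))
```

A parser that handles the repro correctly should produce the 4-byte form (`correct`); the 6-byte form (`incorrect`) is not valid UTF-8.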