[SERVER-6747] Confusing error message when mapReduce() encounters invalid UTF-16 string data in collection Created: 09/Aug/12  Updated: 02/Mar/17  Resolved: 02/Mar/17

Status: Closed
Project: Core Server
Component/s: MapReduce
Affects Version/s: 2.2.0-rc0
Fix Version/s: None

Type: Bug Priority: Minor - P4
Reporter: William Zola Assignee: Unassigned
Resolution: Done Votes: 2
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File test_utf16_mapreduce.py    
Issue Links:
Depends
Related
Operating System: ALL
Participants:

 Description   

If you insert invalid UTF-16 text into a string and save it in MongoDB, the database will happily store it, but if you try to run mapReduce() on it, you'll get one of the following two errors:

map reduce failed:{
"errmsg" : "exception: map invoke failed: JS Error: InternalError: buffer too small (anon):1",
"code" : 9014,
"ok" : 0
}

map reduce failed:{
"errmsg" : "exception: map invoke failed: JS Error: TypeError: bad surrogate character 0x61 (anon):1",
"code" : 9014,
"ok" : 0
}

Suggested fixes:

  • Change MongoDB to not accept broken UTF-16 surrogate pairs
  • Print a more informative error message when encountering this

Reproducible test case attached.



 Comments   
Comment by Eric Milkie [ 02/Mar/17 ]

Inserting invalid UTF-16 text into a string no longer generates an error.

Comment by Tad Marshall [ 10/Aug/12 ]

The buffer-too-small error happens when the unpaired surrogate is the last character in the source buffer ... to be legal UTF-16, there would need to be a following UTF-16 code unit, and "the buffer is too small" to contain it. A bit of overloading of error codes in the SpiderMonkey code ...
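
To illustrate the two failure modes described above, here is a minimal Python sketch that scans a sequence of UTF-16 code units and reports the first malformed surrogate. It is not the SpiderMonkey code. The function name first_surrogate_error and its message strings are hypothetical, chosen only to mirror the two errors in the description. It assumes that the "bad surrogate character 0x61" case corresponds to a high surrogate followed by an ordinary character such as "a" (0x61).

```python
def first_surrogate_error(units):
    """Return None if `units` (a list of 16-bit ints) is well-formed UTF-16,
    otherwise a short description of the first problem found.

    Hypothetical helper for illustration; not part of MongoDB or SpiderMonkey.
    """
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:
            # High (lead) surrogate: a low surrogate must follow.
            if i + 1 == len(units):
                # Nothing follows: the analogue of "buffer too small".
                return "truncated: high surrogate at end of buffer"
            following = units[i + 1]
            if not (0xDC00 <= following <= 0xDFFF):
                # Analogue of "bad surrogate character 0x61".
                return "bad surrogate character 0x%x" % following
            i += 2  # valid pair, skip both code units
        elif 0xDC00 <= unit <= 0xDFFF:
            # Low (trail) surrogate with no preceding high surrogate.
            return "unpaired low surrogate 0x%x" % unit
        else:
            i += 1  # ordinary BMP code unit
    return None


print(first_surrogate_error([0x61, 0xD800]))  # high surrogate is the last unit
print(first_surrogate_error([0xD800, 0x61]))  # high surrogate followed by "a"
```

A check along these lines could in principle be applied before strings reach the JavaScript engine, which is roughly what the first suggested fix asks for, though where such validation would belong in the server is not something this ticket settles.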
