[SERVER-4913] prevent process segv crash in javascript when a document has some bad string value Created: 08/Feb/12  Updated: 15/Aug/12  Resolved: 11/Aug/12

Status: Closed
Project: Core Server
Component/s: JavaScript
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Antoine Girbal Assignee: Antoine Girbal
Resolution: Duplicate Votes: 10
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Depends
Duplicate
duplicates SERVER-5842 Exceptions thrown in scripting/engine... Closed
Operating System: ALL
Participants:

 Description   

The following strings in documents crashed mongod, as seen from pymongo:
[u'\ufffd\udc34\ufffd\ufffd@xxxxxxxx.co.kr']

In general we've seen crashes whenever a string could not be decoded into UTF-16.



 Comments   
Comment by Tad Marshall [ 21/Feb/12 ]

Note that the other character in your description example (\ufffd) is itself the character usually used to represent a Unicode error. Isolated surrogate halves (U+D800 through U+DBFF without a matching U+DC00 through U+DFFF, or vice versa) should probably be converted to the error character.
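
A minimal sketch of the replacement suggested above, written in Python 3 for illustration only. It is not the server's code: the function name replace_lone_surrogates and the representation of the string as a list of UTF-16 code units are assumptions made for this example.

```python
REPLACEMENT = 0xFFFD  # U+FFFD REPLACEMENT CHARACTER


def is_high(unit):
    """True for a high (leading) surrogate, U+D800..U+DBFF."""
    return 0xD800 <= unit <= 0xDBFF


def is_low(unit):
    """True for a low (trailing) surrogate, U+DC00..U+DFFF."""
    return 0xDC00 <= unit <= 0xDFFF


def replace_lone_surrogates(units):
    """Return a copy of `units` (UTF-16 code units as ints) in which every
    surrogate that is not part of a high+low pair is replaced by U+FFFD."""
    out = []
    i = 0
    while i < len(units):
        unit = units[i]
        if is_high(unit) and i + 1 < len(units) and is_low(units[i + 1]):
            # Well-formed pair: keep both halves untouched.
            out.extend((unit, units[i + 1]))
            i += 2
        elif is_high(unit) or is_low(unit):
            # Isolated half: substitute the error character.
            out.append(REPLACEMENT)
            i += 1
        else:
            out.append(unit)
            i += 1
    return out


# Code units of the start of the string from the description.
sample = [0xFFFD, 0xDC34, 0xFFFD, 0xFFFD, 0x40]
print([hex(u) for u in replace_lone_surrogates(sample)])
```

With this approach the lone \udc34 becomes one more \ufffd, while well-formed surrogate pairs pass through unchanged.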

Comment by Pavel Dmitriev [ 21/Feb/12 ]

Hello,

Could you please give us an ETA for this bug fix? It is very important to us.

Comment by Antoine Girbal [ 09/Feb/12 ]

The character triggering the issue here is \udc34.
According to the Unicode table, it is a low surrogate, which is only meant to be used immediately after a high surrogate.
http://www.utf8-chartable.de/unicode-utf8-table.pl
Here it does not follow a high surrogate, which may be what triggers the crash in SM.
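
To make the pairing rule concrete, here is a small Python 3 sketch. The helper names classify, combine and strict_decode_ok are hypothetical and chosen for this example. It classifies a code unit, shows how a valid high+low pair maps to a single code point, and uses Python's strict utf-16-le codec as a stand-in for a decoder that rejects unpaired surrogates. It says nothing about how SM itself handles the input.

```python
import struct


def classify(unit):
    """Name the UTF-16 role of a single code unit."""
    if 0xD800 <= unit <= 0xDBFF:
        return "high surrogate"
    if 0xDC00 <= unit <= 0xDFFF:
        return "low surrogate"
    return "BMP character"


def combine(high, low):
    """Map a valid high+low surrogate pair to its code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def strict_decode_ok(units):
    """True if the code units form well-formed UTF-16 according to
    Python's strict 'utf-16-le' codec."""
    raw = struct.pack("<%dH" % len(units), *units)
    try:
        raw.decode("utf-16-le")
        return True
    except UnicodeDecodeError:
        return False


print(classify(0xDC34))                          # the unit from this ticket
print(hex(combine(0xD83D, 0xDE00)))              # a valid pair, for contrast
print(strict_decode_ok([0xFFFD, 0xDC34, 0xFFFD]))
print(strict_decode_ok([0xD83D, 0xDE00]))
```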

Comment by Antoine Girbal [ 08/Feb/12 ]

Inserted the document with pymongo.
Got the following error when trying to execute eval against a 2.0.2 SM (SpiderMonkey) build:

>>> c.test.eval("doc = db.foo.findOne(); print('blah'); printjson(doc)")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/pymongo/database.py", line 688, in eval
    result = self.command("$eval", code, args=args)
  File "/Library/Python/2.7/site-packages/pymongo/database.py", line 353, in command
    msg, allowable_errors)
  File "/Library/Python/2.7/site-packages/pymongo/helpers.py", line 127, in _check_command_response
    raise OperationFailure(ex_msg, response.get("assertionCode"))
pymongo.errors.OperationFailure: db assertion failure, assertion: 'Not proper UTF-16: 123,10,9,34,95,105,100,34,32,58,32,79,98,106,101,99,116,73,100,40,34,52,102,51,50,98,100,101,53,56,97,102,52,101,51,53,56,50,98,48,48,48,48,48,49,34,41,44,10,9,34,115,116,114,34,32,58,32,34,65533,56372,65533,65533,64,111,105,108,98,97,110,107,46,99,111,46,107,114,34,10,125', assertionCode: 13498
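
The decimal list in the assertion is a sequence of UTF-16 code units. As a reading aid, the Python 3 sketch below turns such a list back into text. The helper decode_units is hypothetical, and it assumes the message consists only of the prefix followed by the comma-separated numbers. For simplicity it shows every surrogate as a backslash-u escape, whether paired or not.

```python
PREFIX = "Not proper UTF-16: "


def decode_units(message):
    """Render the decimal UTF-16 code units listed in an assertion message
    as text, keeping surrogate code units visible as backslash-u escapes."""
    start = message.index(PREFIX) + len(PREFIX)
    units = [int(token) for token in message[start:].split(",")]
    parts = []
    for unit in units:
        if 0xD800 <= unit <= 0xDFFF:
            # Surrogates cannot be printed on their own; show them escaped.
            parts.append("\\u%04x" % unit)
        else:
            parts.append(chr(unit))
    return "".join(parts)


# A contiguous excerpt of the code units from the assertion above.
excerpt = "Not proper UTF-16: 34,115,116,114,34,32,58,32,34,65533,56372,65533,65533,64"
print(decode_units(excerpt))
```

Decoded this way, the excerpt is the start of the "str" field, and 56372 is the decimal form of 0xDC34.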

With a V8 build there is no such error.
Printed on the server side:
blah
{
"_id" : ObjectId("4f32bde58af4e3582b000001"),
"str" : "�???��@xxxxxxx.co.kr"
}

It is unlikely that SM supports only a "limited" subset of UTF-16.
So most likely these characters are triggering an alignment bug, either within SM or in the SM wrapper.
