[SERVER-1300] use memcmp, not strcmp for comparing BSON strings Created: 25/Jun/10  Updated: 12/Jul/16  Resolved: 01/Jul/11

Status: Closed
Project: Core Server
Component/s: Usability
Affects Version/s: None
Fix Version/s: 1.9.1

Type: Improvement Priority: Major - P3
Reporter: Mathias Stearn Assignee: Dwight Merriman
Resolution: Done Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
is duplicated by SERVER-3538 UTF8 null character \u0000 in the mid... Closed
Related
related to SERVER-3614 strings with embedded nulls cannot be... Closed
is related to SERVER-429 btree change (Sorting and other) mast... Closed
Participants:

 Description   

String comparison should not stop at the first '\0' byte. Note that this will change index order in some cases.
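To illustrate the difference, here is a minimal sketch, not the server's actual code. It assumes the string bytes and their length are available (as they are for BSON strings, which carry a length prefix). The helper names `compareCStr`, `compareBytes`, and `embeddedNulExample` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>
#include <string>

// strcmp-style comparison: stops at the first '\0' in either string.
// Result is normalized to -1, 0, or 1.
int compareCStr(const std::string& a, const std::string& b) {
    const int c = std::strcmp(a.c_str(), b.c_str());
    return (c > 0) - (c < 0);
}

// Length-aware comparison (hypothetical helper): memcmp over the common
// prefix, then the shorter string sorts first. Embedded '\0' bytes are
// compared like any other byte.
int compareBytes(const std::string& a, const std::string& b) {
    const std::size_t common = std::min(a.size(), b.size());
    const int c = std::memcmp(a.data(), b.data(), common);
    if (c != 0) return (c > 0) - (c < 0);
    if (a.size() == b.size()) return 0;
    return a.size() < b.size() ? -1 : 1;
}

// Two 3-byte strings that differ only after an embedded '\0'.
void embeddedNulExample() {
    const std::string x("a\0b", 3);
    const std::string y("a\0c", 3);
    assert(compareCStr(x, y) == 0);    // strcmp sees "a" on both sides
    assert(compareBytes(x, y) == -1);  // memcmp sees 'b' < 'c'
}
```

Under these assumptions, keys that previously compared equal become distinct and ordered, which is one way index order can change.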



 Comments   
Comment by Mario Gliewe [ 15/Aug/11 ]

I'm still using 1.4.0, so this might be obsolete in the meantime.
I think these indexing options were added later on? Will take a look into that...
(It was just a quick hack to make it work.)

Comment by Mathias Stearn [ 15/Aug/11 ]

@Mario, Please don't do that as it will break indexing. See https://jira.mongodb.org/browse/SERVER-90 and https://jira.mongodb.org/browse/SERVER-1920 for the correct solution.

Comment by Mario Gliewe [ 15/Aug/11 ]

I guess this could cause problems later on when going to i18n?
In my local installation I use strcoll() in jsobj.h to enable locale-sensitive sorting, which seems to work well for my purposes.

Comment by Dwight Merriman [ 13/Aug/11 ]

some users store unicode via bson without translation?

i guess nothing bad would happen except regex won't work

Comment by Mathias Stearn [ 13/Aug/11 ]

Since NUL (U+0000) is a valid Unicode code point, it is possible for users to have it in their strings. I think most drivers allow it.
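As a rough illustration of this point, the sketch below encodes code points to UTF-8 by hand. The functions `encodeUtf8` and `cLength` are hypothetical helpers written for this example, and the encoder skips validation of surrogates. It suggests that while continuation bytes are never zero, U+0000 itself encodes as a single 0x00 byte, so a C-string function would see the string end there.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Minimal UTF-8 encoder (hypothetical helper; does not reject surrogates).
std::string encodeUtf8(std::uint32_t cp) {
    assert(cp <= 0x10FFFF);  // outside the Unicode range is not handled
    std::string out;
    if (cp < 0x80) {
        // One byte: 0xxxxxxx. U+0000 lands here and becomes the byte 0x00.
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
    return out;
}

// Length as seen by C-string functions, which stop at the first '\0'.
std::size_t cLength(const std::string& s) {
    return std::strlen(s.c_str());
}
```

For example, `"a" + encodeUtf8(0) + "b"` holds three bytes, but `cLength` reports only one.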

Comment by Chris Westin [ 12/Aug/11 ]

Can you say more about the rationale for this change? According to http://en.wikipedia.org/wiki/UTF-8, because continuation bytes always start with the bit pattern 10, they are never zero, so a null byte should never occur in the middle of a string. So what was the concern that led to the conclusion that we should use memcmp() instead of strcmp()?

Comment by Dwight Merriman [ 03/Jun/11 ]

e104e01f8410cef79e29e766d66133458df85220

Comment by auto [ 09/May/11 ]

Author: dwight (login: dwight, email: dwight@10gen.com)

Message: towards SERVER-1300 allow zeros in utf8 strings
Branch: master
https://github.com/mongodb/mongo/commit/4d72b66db321d11841bc885613d2948a8c56db6b

Comment by Dwight Merriman [ 20/Apr/11 ]

not quite sure how to do this seamlessly with backward compatibility (old indexes still working). i suppose it would behave exactly the same for strings without embedded nulls.
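The supposition that behavior is unchanged for strings without embedded nulls can be checked with a small sketch like the one below. It is an assumption-laden illustration, not the server's code: `lengthAwareCompare` and `ordersAgree` are hypothetical names, and the check relies on both strcmp and memcmp comparing bytes as unsigned char, as the C standard specifies.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Normalize a comparison result to -1, 0, or 1.
int sign(int v) { return (v > 0) - (v < 0); }

// Hypothetical length-aware comparison: memcmp on the common prefix,
// then the shorter string sorts first.
int lengthAwareCompare(const std::string& a, const std::string& b) {
    const std::size_t common = std::min(a.size(), b.size());
    const int c = std::memcmp(a.data(), b.data(), common);
    if (c != 0) return sign(c);
    if (a.size() == b.size()) return 0;
    return a.size() < b.size() ? -1 : 1;
}

// Returns true if strcmp and the length-aware comparison give the same
// ordering for every pair of keys in the list.
bool ordersAgree(const std::vector<std::string>& keys) {
    for (const std::string& a : keys) {
        for (const std::string& b : keys) {
            const int oldOrder = sign(std::strcmp(a.c_str(), b.c_str()));
            if (oldOrder != lengthAwareCompare(a, b)) return false;
        }
    }
    return true;
}

// Keys without embedded nulls order the same way; a key with one does not.
void compatibilityExample() {
    assert(ordersAgree({"", "a", "ab", "b"}));
    assert(!ordersAgree({std::string("a"), std::string("a\0b", 3)}));
}
```

If this holds, only indexes whose keys contain embedded nulls would order differently under the new comparison.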
