[SERVER-7071] The sort() function should support multilanguage. Created: 19/Sep/12  Updated: 19/Sep/12  Resolved: 19/Sep/12

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 2.2.0
Fix Version/s: None

Type: New Feature Priority: Major - P3
Reporter: liugen Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified
Environment:

Windows 7, language zh_CN.UTF-8


Participants:

 Description   

My local language setting is zh_CN.UTF-8.
When I run a query with sort(), the results may come back in the wrong order.
As you know, sort() uses compareElementValues() to compare strings.
In compareElementValues():
...
case String:
    /* todo: a utf sort order version one day... */
    {
        // we use memcmp as we allow zeros in UTF8 strings
        int lsz = l.valuestrsize();
        int rsz = r.valuestrsize();
        int common = std::min(lsz, rsz);
        int res = memcmp(l.valuestr(), r.valuestr(), common);
        if( res ) return res;
        // longer string is the greater one
        return lsz-rsz;
    }

...
This function compares strings with memcmp(), so UTF-8 strings are ordered by raw byte value. That gives a consistent order, but not the collation order expected for zh_CN.UTF-8.
Maybe we could use strcoll() instead of memcmp() here, so that sort() can support multilanguage ordering.
strcoll() compares two strings according to the current locale:

int strcoll ( const char * str1, const char * str2 );
Compare two strings using locale
Compares the C string str1 to the C string str2, both interpreted appropriately according to the LC_COLLATE category of the current locale.
This function starts comparing the first character of each string. If they are equal to each other, it continues with the following pairs until the characters differ or until a null character signaling the end of a string is reached.

For more information, see http://www.cplusplus.com/reference/clibrary/cstring/strcoll/ and http://www.cplusplus.com/reference/clibrary/clocale/setlocale/
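
A minimal sketch of the two comparison styles discussed above, for illustration only. The helper names compareBytes, compareLocale, and useLocale are hypothetical and are not part of the MongoDB source; compareBytes only mirrors the shape of the quoted memcmp() logic on std::string. One caveat worth noting: strcoll() takes NUL-terminated C strings, so unlike memcmp() it stops at the first embedded zero byte, which the quoted code comment says the server allows.

```cpp
#include <algorithm>
#include <clocale>
#include <cstddef>
#include <cstring>
#include <string>

// Byte-wise comparison in the style of the quoted snippet (hypothetical helper).
// Orders strings by raw byte value and handles embedded zero bytes.
int compareBytes(const std::string& l, const std::string& r) {
    const std::size_t common = std::min(l.size(), r.size());
    const int res = std::memcmp(l.data(), r.data(), common);
    if (res != 0) return res;
    // The longer string is the greater one.
    return (l.size() > r.size()) - (l.size() < r.size());
}

// Locale-aware comparison via strcoll() (hypothetical helper).
// Uses the LC_COLLATE category of the current locale.
// Assumption: inputs contain no embedded zero bytes, because strcoll()
// only looks at the text before the first '\0'.
int compareLocale(const std::string& l, const std::string& r) {
    return std::strcoll(l.c_str(), r.c_str());
}

// Selects the collation locale (hypothetical helper).
// Returns false if the named locale is not available on this system,
// in which case the current locale is left unchanged.
bool useLocale(const char* name) {
    return std::setlocale(LC_COLLATE, name) != nullptr;
}
```

For example, after a successful useLocale("zh_CN.UTF-8"), a std::sort comparator built on compareLocale(a, b) < 0 would order strings by that locale's collation rules, provided the locale is installed. Note that setlocale() changes process-wide state, so this is a sketch of the idea rather than a drop-in replacement.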



 Comments   
Comment by liugen [ 19/Sep/12 ]

Thanks.

Comment by Eliot Horowitz (Inactive) [ 19/Sep/12 ]

See SERVER-1920

Generated at Thu Feb 08 03:13:33 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.