[SERVER-80179] collation.locale caused the results to be unexpected Created: 17/Aug/23  Updated: 02/Nov/23  Resolved: 02/Nov/23

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: zhw zhw Assignee: Catalin Sumanaru
Resolution: Works as Designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Query Execution
Operating System: ALL
Sprint: QE 2023-11-13
Participants:

 Description   

version: 6.0.8

dataset:

{ "_id" : ObjectId("64dc9c1ecfae2bd697a5b725"), "_widget_1692174497931" : "视频号❤️", "_widget_1692174497932" : 1 },
{ "_id" : ObjectId("64dc9c2f77859fe282e9491a"), "_widget_1692174497931" : "视频号❤", "_widget_1692174497932" : 2 }

 
case1:
 
db.data.aggregate([
    {
        $match: { "_widget_1692174497931": "视频号❤" }
    }
], {
    collation: { locale: 'zh' }
})
 
result:

{ "_id" : ObjectId("64dc9c1ecfae2bd697a5b725"), "_widget_1692174497931" : "视频号❤️", "_widget_1692174497932" : 1 },
{ "_id" : ObjectId("64dc9c2f77859fe282e9491a"), "_widget_1692174497931" : "视频号❤", "_widget_1692174497932" : 2 }

 
case2:
db.data.aggregate([
    {
        $match: { "_widget_1692174497931": "视频号❤" }
    }
])
 
result:

{ "_id" : ObjectId("64dc9c2f77859fe282e9491a"), "_widget_1692174497931" : "视频号❤", "_widget_1692174497932" : 2 }

 
The only difference between the two cases is collation.locale. I'm guessing it might have something to do with Unicode:
 
❤️  \u2764\ufe0f
❤  \u2764
Looking forward to your response! Thanks.
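
To make the Unicode difference between the two stored values concrete, here is a minimal sketch that lists the code points of each heart string. The helper name `codePoints` is hypothetical and not part of MongoDB; the snippet uses only standard JavaScript, so it should run in Node or mongosh.

```javascript
// Hypothetical helper (not a MongoDB API): list each code point as U+XXXX.
function codePoints(s) {
  return Array.from(s, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

const withSelector = "\u2764\uFE0F"; // heart followed by U+FE0F (variation selector)
const plain = "\u2764";              // heart alone

console.log(codePoints(withSelector).join(" ")); // U+2764 U+FE0F
console.log(codePoints(plain).join(" "));        // U+2764
console.log(withSelector === plain);             // false: the strings differ byte for byte
```

A plain binary comparison treats the two strings as different, which is consistent with case 2 (no collation) returning only one document.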
   
 
 



 Comments   
Comment by Alison Rhea Thorne [ 26/Sep/23 ]

Hello zhanghongweiupup@163.com,

I was able to replicate the behavior you've noted using the zh locale for collations. However, given that this also occurs with other locales, it is likely intended behavior. It happens because the two heart emojis are considered variants of the same base character. Note that, if required, you can use a higher collation comparison strength if you need finer granularity in your aggregation (as noted here: Collations). With the comparison strength set to 5, I was able to have the aggregation return the desired string. For now, we'll be moving this to the appropriate team for confirmation.
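
For reference, a minimal sketch of the strength-5 workaround described above, assuming the `data` collection and field name from the report. The helper `buildExactMatch` is hypothetical and only packages the pipeline and options; `strength` is the collation option mentioned in the comment. The query itself runs only where a `db` object exists (for example in mongosh), so the results are not shown here.

```javascript
// Hypothetical helper: builds a $match pipeline plus collation options.
// strength: 5 requests the most fine-grained comparison level for the collation.
function buildExactMatch(field, value, locale) {
  return {
    pipeline: [{ $match: { [field]: value } }],
    options: { collation: { locale: locale, strength: 5 } },
  };
}

// "\u2764" is the heart without the U+FE0F variation selector.
const q = buildExactMatch("_widget_1692174497931", "视频号\u2764", "zh");

// In mongosh `db` is defined; elsewhere this guard skips the query.
if (typeof db !== "undefined") {
  db.data.aggregate(q.pipeline, q.options).forEach(printjson);
}
```

If collation-aware ordering is not needed for this query, omitting the collation entirely (as in case 2 of the description) also compares the strings exactly.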

Comment by zhw zhw [ 21/Sep/23 ]

 

version: 6.0.8

I have the following data in the data collection:

{ "_id" : ObjectId("64dc9c1ecfae2bd697a5b725"), "_widget_1692174497931" : "视频号❤️", "_widget_1692174497932" : 1 },
{ "_id" : ObjectId("64dc9c2f77859fe282e9491a"), "_widget_1692174497931" : "视频号❤", "_widget_1692174497932" : 2 } 

db.data.aggregate([{
   $match: { "_widget_1692174497931": "视频号❤" }
}],{collation: { locale: 'zh' }})

I want to get:

{ "_id" : ObjectId("64dc9c2f77859fe282e9491a"), "_widget_1692174497931" : "视频号❤", "_widget_1692174497932" : 2 }

but this aggregate returns:

{ "_id" : ObjectId("64dc9c1ecfae2bd697a5b725"), "_widget_1692174497931" : "视频号❤️", "_widget_1692174497932" : 1 },
{ "_id" : ObjectId("64dc9c2f77859fe282e9491a"), "_widget_1692174497931" : "视频号❤", "_widget_1692174497932" : 2 }

I'm guessing it might have something to do with Unicode

❤️  \u2764\ufe0f
❤  \u2764

I don't know if I expressed it clearly

 

 

Comment by Alison Rhea Thorne [ 05/Sep/23 ]

Hello,

Thank you for your report.

Unfortunately, your report does not yet contain enough information for us to act upon. In order to begin diagnosing this problem, we would like to request clarification on the behavior you are observing versus the behavior you expect. Additionally, we'd like to request the steps required to reproduce the behavior you have observed.

Generated at Thu Feb 08 06:42:53 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.