[SERVER-38210] Text search with $diacriticSensitive bug with some characters Created: 20/Nov/18  Updated: 27/Oct/23  Resolved: 26/Nov/18

Status: Closed
Project: Core Server
Component/s: Index Maintenance, Text Search
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Bình Vương Assignee: Danny Hatcher (Inactive)
Resolution: Works as Designed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Operating System: ALL
Participants:

 Description   

Hi, I'm from Vietnam!

I am building a project with MongoDB. I use a version 3 text index and query with $diacriticSensitive, but it does not work for one pair of Vietnamese characters:

đ : d

Sample: bóng đá

 

With "$diacriticSensitive": false,

I query with "bong da" and get no matching result.

 



 Comments   
Comment by Danny Hatcher (Inactive) [ 26/Nov/18 ]

Hello,

As Mark has identified the explanation for why these two characters are treated differently, I will close this as working as designed.

Thank you,

Danny

Comment by Bình Vương [ 20/Nov/18 ]

Thanks Mark Benvenuto!

With the same query and data, MariaDB Full-Text Search and Elasticsearch with asciifolding both return a match.

If this report is not an issue, I will have to add a field with the diacritics removed, to serve queries that do not use diacritics.
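
The workaround mentioned above (an extra field with diacritics removed) could be sketched in Python as follows. This is only an illustration under stated assumptions: the function name fold_vietnamese, the EXTRA_FOLDS table, and the field name title_folded are hypothetical, and the sketch uses only the standard unicodedata module rather than any MongoDB API.

```python
import unicodedata

# Letters that carry no combining mark in Unicode and therefore survive
# mark stripping. Only the pair from this report is listed (assumption:
# extend the table if other letters need folding).
EXTRA_FOLDS = str.maketrans({"\u0111": "d", "\u0110": "D"})  # đ -> d, Đ -> D


def fold_vietnamese(text: str) -> str:
    """Return text with combining marks removed and đ/Đ folded to d/D."""
    # NFD splits precomposed letters into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop every combining mark (combining class != 0).
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Fold the letters that have no decomposition.
    return stripped.translate(EXTRA_FOLDS)


# Hypothetical document shape: keep the original text and store a folded copy.
doc = {"title": "bóng đá"}
doc["title_folded"] = fold_vietnamese(doc["title"])
print(doc["title_folded"])  # bong da
```

Under this approach the text index would be created on the folded field, and the search string would be passed through the same function before querying, so that "bong da" and "bóng đá" both fold to the same terms.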

 

Comment by Mark Benvenuto [ 20/Nov/18 ]

According to Wikipedia, the lower-case characters 'd' and 'đ' (U+0111) are different letters in the Vietnamese alphabet.

The diacritic-insensitive search is designed only to treat characters with and without diacritic marks as the same. See http://www.user.uni-hannover.de/nhtcapri/combining-marks.html for examples of characters with various diacritic marks. This means that the lower-case characters 'd' and 'đ' will always be considered distinct characters, regardless of the diacritic-sensitive search option.
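
The distinction can be checked against the Unicode character data with a short Python sketch. It uses only the standard unicodedata module; the helper name strip_combining_marks is hypothetical, and the code illustrates the Unicode properties involved, not MongoDB's actual text index implementation.

```python
import unicodedata


def strip_combining_marks(text: str) -> str:
    """Decompose text (NFD) and drop combining marks.

    Illustrative helper only; not MongoDB's implementation.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# U+00F3 (o with acute) decomposes into 'o' plus U+0301, a combining mark,
# so removing the mark leaves the base letter.
print(unicodedata.decomposition("\u00f3"))        # 006F 0301

# U+0111 (d with stroke) has no decomposition: the stroke is part of the
# letter itself, so there is no mark to remove.
print(repr(unicodedata.decomposition("\u0111")))  # ''

# Marks on the vowels are removed, but the d with stroke is left unchanged.
print(strip_combining_marks("b\u00f3ng \u0111\u00e1"))  # bong đa
```

The last line is consistent with the behaviour reported in this ticket: a query for "bong đa" matches "bóng đá", while "bong da" does not.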

Comment by Bình Vương [ 20/Nov/18 ]

A query with "bong đa" works and returns a result.

Generated at Thu Feb 08 04:48:16 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.