[SERVER-38210] Text search with $diacriticSensitive: bug with some characters Created: 20/Nov/18 Updated: 27/Oct/23 Resolved: 26/Nov/18 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Index Maintenance, Text Search |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Bình Vương | Assignee: | Danny Hatcher (Inactive) |
| Resolution: | Works as Designed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Operating System: | ALL |
| Participants: |
| Description |
|
Hi, I'm from Vietnam! I am building a project using MongoDB. I use a text index (version 3) with `$diacriticSensitive` queries, but it does not work for two Vietnamese characters: 'đ' and 'd'. Sample text: "bóng đá"
With "$diacriticSensitive": false, I query with "bong da" and no results match.
|
| Comments |
| Comment by Danny Hatcher (Inactive) [ 26/Nov/18 ] |
|
Hello, As Mark has identified the explanation for why these two characters are treated differently, I will close this as working as designed. Thank you, Danny |
| Comment by Bình Vương [ 20/Nov/18 ] |
|
Thanks, Mark Benvenuto! With the same query and data, MariaDB full-text search and the Elasticsearch asciifolding filter both return a match. If this is not considered an issue, I will have to add a field with the diacritics removed and run queries without diacritics against that field.
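A minimal Python sketch of the workaround described above, assuming the application stores an extra ASCII-folded copy of the text (the field name `title_folded` is hypothetical) and builds its text index on that field instead. It is not MongoDB's implementation: it decomposes the text with NFD, drops combining marks, and then maps 'đ'/'Đ' to 'd'/'D' explicitly, because those letters have no Unicode decomposition and would otherwise survive the stripping step.

```python
import unicodedata

# Letters with no Unicode decomposition survive combining-mark stripping,
# so they are folded explicitly (Vietnamese đ/Đ).
_EXTRA_FOLDS = str.maketrans({"đ": "d", "Đ": "D"})

def fold_diacritics(text: str) -> str:
    """Return text with combining marks removed and đ/Đ folded to d/D."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped.translate(_EXTRA_FOLDS))

# Hypothetical document shape: keep the original and store a folded copy.
doc = {"title": "bóng đá"}
doc["title_folded"] = fold_diacritics(doc["title"])
print(doc["title_folded"])  # bong da
```

For this to work, user queries would need to pass through the same `fold_diacritics` function before being used as the `$search` string of a `$text` query, so that both sides of the comparison are folded the same way.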
|
| Comment by Mark Benvenuto [ 20/Nov/18 ] |
|
According to Wikipedia, the lower-case characters 'd' and 'đ' (U+0111) are different letters of the Vietnamese alphabet. Diacritic-insensitive search is designed only to treat characters with and without diacritical marks as the same. See http://www.user.uni-hannover.de/nhtcapri/combining-marks.html for examples of characters with various diacritical marks. This means that the lower-case characters 'd' and 'đ' will always be considered distinct, regardless of the diacritic sensitivity option. |
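The distinction described above can be checked with Python's standard `unicodedata` module. This is a sketch under an assumption: stripping combining marks after NFD decomposition is used here as an approximation of diacritic-insensitive matching. MongoDB's version 3 text index uses its own diacritic handling, and `$text` tokenizes and stems terms rather than comparing whole strings, so this only illustrates the character-level behavior. 'á' decomposes into 'a' plus a combining acute accent, while 'đ' (U+0111) has no decomposition at all, so it is left untouched.

```python
import unicodedata

def strip_marks(text: str) -> str:
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def insensitive_match(query: str, stored: str) -> bool:
    """Approximate a diacritic-insensitive whole-string comparison."""
    return strip_marks(query.lower()) == strip_marks(stored.lower())

# 'á' has a canonical decomposition; 'đ' (U+0111) does not.
print(unicodedata.decomposition("á"))        # 0061 0301
print(repr(unicodedata.decomposition("đ")))  # ''

print(insensitive_match("bong đa", "bóng đá"))  # True
print(insensitive_match("bong da", "bóng đá"))  # False
```

The two comparisons mirror what the reporter observed: once the accents are removed, "bóng đá" becomes "bong đa", which still differs from "bong da" at the 'đ'.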
| Comment by Bình Vương [ 20/Nov/18 ] |
|
Querying with "bong đa" works and returns a result. |