[CSHARP-4842] String field starting with accented character can't be found by concatenated LINQ. Created: 13/Nov/23 Updated: 22/Nov/23 Resolved: 22/Nov/23 |
|
| Status: | Closed |
| Project: | C# Driver |
| Component/s: | Linq |
| Affects Version/s: | 2.22.0 |
| Fix Version/s: | None |
| Type: | Bug | Priority: | Minor - P4 |
| Reporter: | Takács Róbert | Assignee: | Oleksandr Poliakov |
| Resolution: | Works as Designed | Votes: | 0 |
| Labels: | accented, linq, query | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Documentation Changes Summary: | 1. What would you like to communicate to the user about this feature? |
| Description |
| Comments |
| Comment by Oleksandr Poliakov [ 22/Nov/23 ] | ||||||||||||||||||||||||
|
Yes, you are right! MongoDB's $toLower and $toUpper have well-defined behavior only for ASCII characters. A suggestion to support Unicode was created quite a while ago, but it never got many votes: https://jira.mongodb.org/browse/SERVER-32141 So there is no issue with the .NET Driver. I'll close the ticket, but feel free to reopen it if you need any further assistance.
Thanks, Oleksandr. | ||||||||||||||||||||||||
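To illustrate the explanation above, here is a minimal, self-contained sketch of why an ASCII-only lowercase on the server side does not line up with a Unicode-aware lowercase on the client side. It does not call the driver or the server. `ToLowerAsciiOnly` is a hypothetical helper that only imitates the ASCII-only behavior described for $toLower, and the stored value "Ügyfél" is an assumed sample taken from the search words mentioned in this ticket.

```csharp
using System;
using System.Linq;

public static class AsciiCaseDemo
{
    // Hypothetical helper (not part of the driver): lowercases only A-Z and
    // leaves every other character untouched, imitating an ASCII-only mapping.
    public static string ToLowerAsciiOnly(string value) =>
        new string(value.Select(c => c >= 'A' && c <= 'Z' ? (char)(c + 32) : c).ToArray());

    public static void Main()
    {
        const string stored = "Ügyfél";               // assumed sample value in the collection
        string search = "Ügyfél".ToLowerInvariant();  // client side, Unicode-aware: "ügyfél"

        // 'Ü' is outside the ASCII range, so the ASCII-only mapping keeps it: "Ügyfél"
        string serverSide = ToLowerAsciiOnly(stored);

        // The ASCII-only result does not start with the lowercased search word.
        Console.WriteLine(serverSide.StartsWith(search, StringComparison.Ordinal)); // False

        // A Unicode-aware lowercase of the stored value would have matched.
        Console.WriteLine(stored.ToLowerInvariant().StartsWith(search, StringComparison.Ordinal)); // True
    }
}
```

Under these assumptions, the mismatch only shows up when the first differing character is non-ASCII, which is consistent with the reporter's observation that words starting with an accented letter were not found.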
| Comment by Takács Róbert [ 21/Nov/23 ] | ||||||||||||||||||||||||
|
Hi oleksandr.poliakov@mongodb.com !
After further testing, I think I have found the issue. You were right: it really is about case sensitivity. I had removed that part for the sake of simplicity, which in this case hid the bug. Keeping our previous testing environment, I was able to produce the following: putting the ".ToLower()" call back inside the query resulted in incorrect behaviour.
The query results in:
No Model is returned. - Incorrect: this should have lowercased the letters of the stored value on the database side, and since my search word also started with a lowercase letter, the result should have been one Model.
results in:
No Model is returned. - Correct.
Using ".ToLower()" inside the query but not on the incoming words:
results in:
One Model is returned. - Incorrect: this should have lowercased the letters of the stored value on the database side, and since my search word started with an uppercase letter, the result should have been an empty list. Running the incorrect method but with only a part of the search word, keeping the uppercase first letter:
results in:
One Model is returned. - Correct.
results in:
No Model is returned. - Incorrect, it should have found one Model.
Now change "ügyfÉl" to "üGyfél" in the database and run the following:
results in:
One Model is returned. - Correct.
Edit - The same happens when using ".ToUpper()". Edit2 - After reading about this in the documentation, I found out that $toLower is only applied to ASCII characters, so I guess this isn't an issue at all. One last question about this: are you going to support non-ASCII characters for this feature some day?
| ||||||||||||||||||||||||
| Comment by Oleksandr Poliakov [ 17/Nov/23 ] | ||||||||||||||||||||||||
|
Could you please try to run your web app locally so we can validate whether the problem is related to the hosting environment or not? Also, could you please confirm whether your test console application uses the same database/collections as the Kubernetes web app you mentioned? | ||||||||||||||||||||||||
| Comment by Takács Róbert [ 17/Nov/23 ] | ||||||||||||||||||||||||
|
Hi oleksandr.poliakov@mongodb.com ! Thank you for helping me out.
You're right, the queries are case sensitive, and while in my production code I'm using them as case insensitive, I removed those parts for the sake of simplicity. Unfortunately, the problem is not hiding there. I also tried to reproduce it in a new database / collection with new data, as you requested, and there is one clue which I found recently. I totally forgot to mention that the whole system (MongoDB and the services) is running on a Dockerized Kubernetes platform. I'm still running some tests on the issue, and I found that if I connect to this system through a simple C# console app:
The result contains only one document, the correct one. Still, when I call through my app's APIs (either locally or through a web app in Azure, both running in Kubernetes), the problem persists. When I call either app with Postman, the problem also persists. Together with our DevOps team, we'll try to find out whether a character set setting is behind it, and I'll come back with some results. I'll also provide details about our Docker / Kubernetes settings.
Again thank you for your help!
| ||||||||||||||||||||||||
| Comment by Oleksandr Poliakov [ 16/Nov/23 ] | ||||||||||||||||||||||||
|
I cannot reproduce the issue. I've tried to use the provided code with the following model:
And initialized the collection with the following data:
Everything works as expected. The results contain a single object.
However, your example for Expression LINQ generates a case-insensitive regex, which makes me think that the issue might be related to case sensitivity. Could you please try to search by both "Ügyfél" and "ügyfél"? The next step in investigating the problem would be to try to reproduce it on a new database/collection. It would also be helpful if you could provide some test data to reproduce the issue.
Thanks, Oleksandr | ||||||||||||||||||||||||
| Comment by PM Bot [ 13/Nov/23 ] | ||||||||||||||||||||||||
|
Hi 89.t.robert@gmail.com, thank you for reporting this issue! The team will look into it and get back to you soon. |