[SERVER-26050] Unique key violation for index with a non-simple collation has unclear error message
Created: 09/Sep/16 Updated: 30/Jan/24 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | Querying, Usability |
| Affects Version/s: | 3.3.12 |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Daniel Gottlieb (Inactive) | Assignee: | Backlog - Query Execution |
| Resolution: | Unresolved | Votes: | 3 |
| Labels: | collation, query-44-grooming, storch |
| Remaining Estimate: | Not Specified |
| Time Spent: | Not Specified |
| Original Estimate: | Not Specified |
| Issue Links: | |
| Assigned Teams: | Query Execution |
| Sprint: | QE 2023-04-03, QE 2023-04-17 |
| Participants: | |
| Case: | (copied to CRM) |
| Description |
Gives an output of:
Whereas values not transformed by ICU provide a readable key:
|
| Comments |
| Comment by Kyle Suarez [ 18/Apr/23 ] | ||||
|
Sending to the backlog but adding it to the quick wins bucket. david.storch@mongodb.com thinks this might not be that quick. One interesting note is that we do want to eventually upgrade the ICU library, and that will necessitate an index format change (as well as affecting how we approach this issue, as Ivan mentioned earlier). | ||||
| Comment by Ivan Fefer [ 13/Apr/23 ] | ||||
|
After a cosmetic usability fix and some more investigation, I found two ways to address this issue:
1. Reverse-engineer the icu::CollationKeys::writeSortKeyUpToQuaternary function so that we can extract collation elements from a collation key and do a reverse lookup to find the corresponding UTF code points to build a string.
   Pros: Gives the best output.
   Cons: Tied to the exact version of the ICU library (but we are already tied to it, because we store collation keys on disk). There are edge cases. For example, numerical ordering might contain just numbers.
2. Check for a DuplicateKey error status after index write operations in mongo::IndexAccessMethod::insert/update and extend the error message with the conflicting document or a human-readable key.
   Pros: Easier to implement. Less dependency on third-party code.
   Cons: If we put the whole document into the error message (which we do in some cases, for example batch insert), error messages will be less usable, and it is possible that the conflicting key would be hidden due to compaction. If we decide to try to extract a human-readable index key from a document, we will need to implement extra logic, which might be complex, to pass a “human-readable” flag to the underlying key generators and call them again.
| ||||
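The second option can be sketched roughly as follows. This is a toy model in Python, not server code: `ToyUniqueIndex`, `DuplicateKeyError`, the message layout, and the case-folding key function are all hypothetical stand-ins, chosen only to show how a caller that still holds the document could extend an error that mentions nothing but an opaque key.

```python
class DuplicateKeyError(Exception):
    """Hypothetical error type raised by the toy index below."""


class ToyUniqueIndex:
    def __init__(self, key_fn):
        self._key_fn = key_fn   # maps a field value to an opaque byte key
        self._entries = {}      # opaque key -> stored document

    def _raw_insert(self, doc, field):
        key = self._key_fn(doc[field])
        if key in self._entries:
            # The low-level layer only knows the opaque key.
            raise DuplicateKeyError(f"dup key: 0x{key.hex()}")
        self._entries[key] = doc

    def insert(self, doc, field):
        try:
            self._raw_insert(doc, field)
        except DuplicateKeyError as err:
            # Option 2: the caller still has the document, so it can append
            # the human-readable value that produced the conflicting key.
            raise DuplicateKeyError(
                f"{err}; conflicting value: {doc[field]!r}"
            ) from err


# Assumption: a case-insensitive toy key stands in for a real collation key.
index = ToyUniqueIndex(lambda s: s.casefold().encode("utf-8"))
index.insert({"name": "Apple"}, "name")
try:
    index.insert({"name": "apple"}, "name")
    message = None
except DuplicateKeyError as err:
    message = str(err)
```

The sketch appends only the conflicting field value rather than the whole document, which sidesteps the compaction concern listed under the cons, but it also assumes the caller knows which field to report.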
| Comment by Githook User [ 31/Mar/23 ] | ||||
|
Author: {'name': 'Ivan Fefer', 'email': 'ivan.fefer@mongodb.com', 'username': 'Fefer-Ivan'}
Message: SERVER-26050 Add CollationKey(..) to hex-encoded index keys when using collation | ||||
| Comment by Ivan Fefer [ 30/Mar/23 ] | ||||
|
While we think about a more complicated fix, I suggest adding "CollationKey()" around the hex-encoded binary data to make the output a little clearer.
| ||||
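A minimal sketch of that cosmetic change, in Python for illustration only. The function name `format_index_key_value`, the `0x` prefix, and the sample bytes are assumptions; the exact text the server prints may differ.

```python
def format_index_key_value(value) -> str:
    """Render one index key value for a duplicate-key message (hypothetical format)."""
    if isinstance(value, bytes):
        # Collation keys are opaque bytes, so show them hex-encoded and labelled
        # instead of as a bare hex string that looks like user data.
        return f"CollationKey(0x{value.hex()})"
    return repr(value)


plain = format_index_key_value("abc")                        # ordinary string value
wrapped = format_index_key_value(b"\x29\x2b\x2d\x01\x07")    # made-up key bytes
```

The label does not make the key decodable; it only tells the reader that the hex digits are a collation key and not the value they inserted.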
| Comment by Ivan Fefer [ 30/Mar/23 ] | ||||
|
An interesting side effect of this is that indexes with collation can't cover projections. | ||||
| Comment by Ivan Fefer [ 29/Mar/23 ] | ||||
|
Under the hood, indexes with collation use icu::Collator::getCollationKey to get a sequence of bytes that can be compared with a plain byte comparison, which should produce the same result as comparing the original strings with the same collator. We store this sequence in the index instead of the original string. The ICU docs state that the reverse operation is not supported. The other approach would be to catch the error and also print the document or the update that caused the error. Or something similar. | ||||
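To illustrate the idea without depending on ICU, here is a small Python sketch. `toy_collation_key` is a hypothetical stand-in that only case-folds; real ICU sort keys are considerably more involved. It shows the two properties discussed above: comparing key bytes orders strings the same way as the collation does, and the key cannot be mapped back to a single original string.

```python
def toy_collation_key(s: str) -> bytes:
    """Hypothetical stand-in for an ICU sort key: case-insensitive only."""
    return s.casefold().encode("utf-8")


words = ["banana", "Apple", "cherry", "apple"]

# Sorting by key bytes gives the same order as a case-insensitive comparison.
by_key = sorted(words, key=toy_collation_key)
by_collation = sorted(words, key=str.casefold)
assert by_key == by_collation

# Distinct strings can share one key, so the key alone cannot tell which
# original string was indexed. Under a unique index with this collation,
# these two values would also count as duplicates.
collide = toy_collation_key("Apple") == toy_collation_key("apple")
```

Because several inputs collapse onto one key, any attempt to print a readable value from the stored key alone can at best recover a representative string, not the one the user actually wrote.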
| Comment by Ivan Fefer [ 29/Mar/23 ] | ||||
|
The reason for this output is stated in this comment:
https://github.com/mongodb/mongo/blob/master/src/mongo/db/storage/index_entry_comparison.cpp#L212
I am investigating how we can extract some human-readable user strings. | ||||
| Comment by Kasper N/A [ 17/Apr/21 ] | ||||
|
I found this issue and see that the information is now hex-encoded. However, I am struggling to decode it. Any pointers would be great. Here is my StackOverflow question: | ||||
| Comment by David Storch [ 16/Apr/20 ] | ||||
|
Under | ||||
| Comment by Eric Tray [ 30/Aug/19 ] | ||||
|
Since this currently makes the error message unreadable (and unusable), is there a way to manually transform the duplicate key values back into a readable format? | ||||
| Comment by Bruce Lucas (Inactive) [ 18/Jun/18 ] | ||||
|
I adjusted the priority and issue type because this can result in non-UTF-8 characters that can cause decoding of the result in the client to fail. |
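The decoding failure is easy to reproduce in isolation. The Python snippet below uses made-up bytes (not a real ICU key) to show that arbitrary binary data embedded in a message can fail strict UTF-8 decoding, while a hex-encoded form is plain ASCII and always decodes.

```python
raw_key = b"\x29\x2b\xff\x01"   # arbitrary bytes; 0xff never appears in valid UTF-8

try:
    raw_key.decode("utf-8")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

# Hex encoding yields plain ASCII text, which any client can decode,
# and the original bytes can still be recovered from it.
safe_text = raw_key.hex()
round_trip = bytes.fromhex(safe_text)
```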