[SERVER-76007] ArraySet with collation doesn't leverage already computed collation keys Created: 12/Apr/23 Updated: 30/Jan/24 |
|
| Status: | Backlog |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | None |
| Type: | Improvement | Priority: | Major - P3 |
| Reporter: | Ivan Fefer | Assignee: | Backlog - Query Execution |
| Resolution: | Unresolved | Votes: | 0 |
| Labels: | perf-tiger, perf-tiger-handoff, perf-tiger-non-q4 | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Issue Links: |
|
| Assigned Teams: |
Query Execution
|
| Participants: |
| Description |
|
When we add a CollatorInterface* to ArraySet, two things happen: 2. When comparing strings, we compare them according to the given CollatorInterface. Generating a collation key is an expensive operation, and comparing strings with a collation is more expensive than a plain byte-wise comparison. As a result, linear lookup is faster than the hash set for sets of up to 50 elements. We can try to speed this up by using the generated collation keys for byte-wise comparison instead of comparing strings with the collation. |
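A minimal sketch of the proposed idea, not the server's actual implementation: each element stores its collation key next to the original string, so both hashing and equality become byte-wise operations on the cached key. `ToyCollator`, `KeyedString`, `KeyHash`, `KeyEq` and `CollatedStringSet` are hypothetical names introduced for illustration, and the toy collation (ASCII case-insensitive) stands in for a real ICU-backed CollatorInterface.

```cpp
#include <cctype>
#include <cstddef>
#include <functional>
#include <string>
#include <string_view>
#include <unordered_set>
#include <utility>

// Hypothetical stand-in for a collator; NOT MongoDB's CollatorInterface.
// Strings that are equal under the collation map to identical key bytes.
struct ToyCollator {
    std::string comparisonKey(std::string_view s) const {
        std::string key(s);
        for (char& c : key) {
            c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        }
        return key;
    }
};

// Element that carries its collation key, computed once on construction.
struct KeyedString {
    std::string value;  // original string
    std::string key;    // cached collation key

    KeyedString(std::string v, const ToyCollator& collator)
        : value(std::move(v)), key(collator.comparisonKey(value)) {}
};

// Hash the cached key bytes; the collator is not consulted again.
struct KeyHash {
    std::size_t operator()(const KeyedString& e) const noexcept {
        return std::hash<std::string>{}(e.key);
    }
};

// Byte-wise equality on the cached keys instead of a collation-aware compare.
struct KeyEq {
    bool operator()(const KeyedString& a, const KeyedString& b) const noexcept {
        return a.key == b.key;
    }
};

class CollatedStringSet {
public:
    explicit CollatedStringSet(ToyCollator collator) : _collator(collator) {}

    // Returns true if no collation-equal string was present before.
    bool insert(std::string s) {
        return _set.emplace(std::move(s), _collator).second;
    }

    // One key generation per lookup; bucket comparisons are byte-wise.
    bool contains(std::string s) const {
        return _set.count(KeyedString(std::move(s), _collator)) > 0;
    }

    std::size_t size() const { return _set.size(); }

private:
    ToyCollator _collator;
    std::unordered_set<KeyedString, KeyHash, KeyEq> _set;
};
```

With this layout, inserting "Apple" and then "APPLE" leaves a single element, and a lookup for "apple" succeeds after generating exactly one key for the probe. The trade-off is extra memory for the stored keys. Whether it pays off for real collations would have to be confirmed by the benchmarks discussed in the comments.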
| Comments |
| Comment by Irina Yatsenko (Inactive) [ 31/May/23 ] |
|
There are microbenchmarks that show the problem as well (the table below contains a subset of the tests on linux-microbenchmarks-standalone-arm.2023-01 for 1 thread; "value" is ops_per_sec in 7.0 and "value_base" is ops_per_sec in 6.0). Note that with no collation (the "WithSimpleCollation" tests) there is a considerable improvement, but when a collation is used, the improvement is either much smaller or even turns into a regression.
According to the profiles: 6.0 (icu-related symbols account for about 15K samples in the profile)
7.0 (icu-related symbols account for about 55K samples in the profile)
| Comment by Ivan Fefer [ 12/Apr/23 ] |
|
Results of benchmarking on my workstation with Collation {locale: "en_US"}.
|