[SERVER-76007] ArraySet with collation doesn't leverage already computed collation keys Created: 12/Apr/23  Updated: 30/Jan/24

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Ivan Fefer Assignee: Backlog - Query Execution
Resolution: Unresolved Votes: 0
Labels: perf-tiger, perf-tiger-handoff, perf-tiger-non-q4
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Issue split
Related
related to SERVER-84830 Investigate SBE performance for queri... Open
related to SERVER-76171 Add benchmark to measure sbe::value::... Closed
is related to SERVER-75752 Pass collator to ArraySet for $in exp... Closed
Assigned Teams:
Query Execution
Participants:

 Description   

When we add CollatorInterface* to ArraySet, two things happen:
1. To calculate hash of string, we will generate a collation key and hash it instead of a raw string. There is no obvious way around it, as it is the best way to get collation-aware hash of a string.

2. When comparing strings, we will compare them according to the given CollatorInterface.

Generating collation key is expensive operation AND comparing strings with collation is more expensive than just byte-wise compare.

It leads to linear lookup being faster for up to 50 elements in hash set.

We can try to speed it up by using generated collation keys for byte-wise compare instead of doing string compare with collation.



 Comments   
Comment by Irina Yatsenko (Inactive) [ 31/May/23 ]

There are microbenchmarks that show the problem as well (the table below contains subset of tests on linux-microbenchmarks-standalone-arm.2023-01 for 1 thread; "value" is ops_per_sec in 7.0 and "value_base" is ops_per_sec in 6.0). Notice, that with no collation ("WithSimpleCollation" tests) there is a considerable improvement, but when collation is used, the improvement is either much smaller or might even become a regression.

test value value_base percent_diff
Aggregation.StringUnindexedInPredWithNonSimpleCollationBigCollection 87.23 103.33 -20%
Aggregation.StringUnindexedInPredWithSimpleCollationBigCollection 173.44 119.25 33%
Aggregation.StringUnindexedLargeInPredWithNonSimpleCollationBigCollection 69.55 58.26 12%
Aggregation.StringUnindexedLargeInPredWithSimpleCollationBigCollection 135.90 88.49 42%
Queries.StringUnindexedInPredWithNonSimpleCollationBigCollection 88.86 91.62 -22%
Queries.StringUnindexedInPredWithSimpleCollationBigCollection 175.51 133.59 27%
Queries.StringUnindexedLargeInPredWithNonSimpleCollationBigCollection 73.63 58.53 10%
Queries.StringUnindexedLargeInPredWithSimpleCollationBigCollection 149.34 95.44 47%

 

According to the profiles:

6.0 (icu-related symbols account for about 15K samples in the profile)

22.15%--mongo::BSONElement::compareElements
        |          
         --17.25%--mongo::CollatorInterfaceICU::compare
                   |          
                   |--14.53%--icu::RuleBasedCollator::doCompare
                   |          |          
                   |          |--7.13%--icu::CollationFastLatin::compareUTF8
                   |          |          
                   |           --0.96%--icu::CollationData::isUnsafeBackward
                   |          
                    --0.74%--icu::RuleBasedCollator::compareUTF8

7.0 (icu-related symbols account for about 55K samples in the profile)

50.23%--mongo::CollatorInterfaceICU::getComparisonKey
        |          
        |--42.01%--icu::RuleBasedCollator::getCollationKey
        |          |          
        |           --39.93%--icu::RuleBasedCollator::writeSortKey
        |                     |          
        |                     |--30.17%--icu::CollationKeys::writeSortKeyUpToQuaternary
        |                     |          |          
        |                     |           --15.36%--icu::CollationIterator::nextCE
        |                     |                     |          
        |                     |                     |--6.62%--icu::FCDUTF16CollationIterator::handleNextCE32
        |                     |                     |          
        |                     |                      --4.40%--icu::CollationIterator::nextCEFromCE32
        |                     |                                |          
        |                     |                                 --3.15%--icu::CollationIterator::appendCEsFromCE32
        |                     |          
        |                     |--5.39%--icu::RuleBasedCollator::writeIdenticalLevel
        |                     |          |          
        |                     |          |--2.93%--u_writeIdenticalLevelRun
        |                     |          |          
        |                     |           --1.05%--icu::Normalizer2Impl::decompose
        |                     |          
        |                     |--0.62%--icu::CollationIterator::nextCE
        |                     |          
        |                      --0.57%--icu::CollationKeys::LevelCallback::needToWrite
        |          
        |--4.67%--icu::UnicodeString::fromUTF8
        |          |          
        |           --4.44%--icu::UnicodeString::setToUTF8
        |                     |          
        |                      --2.63%--u_strFromUTF8WithSub
        |          
         --0.93%--operator new[]

Comment by Ivan Fefer [ 12/Apr/23 ]

Results of benchmarking on my workstation with Collation {locale: "en_US"}.
Without any collation average latency on any test is ~50 ns.

 

String size Number of strings ArraySet Latency Linear Latency
5 5 222 ns 124 ns
5 10 214 ns 224 ns
       
10 10 302 ns 224 ns
10 50 300 ns 1025 ns
       
50 10 872 ns 228 ns
50 50 880 ns 1091 ns
       
100 10 1623 ns 232 ns
100 50 1655 ns 1091 ns
100 100 1703 ns 2055 ns

 

 

Generated at Thu Feb 08 06:31:36 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.