-
Type: Improvement
-
Resolution: Unresolved
-
Priority: Major - P3
-
None
-
Affects Version/s: None
-
Component/s: Querying
-
Labels:None
-
Query Execution
The CollatorInterface provides two mechanisms for comparing strings in a collation-aware fashion: compare() and getComparisonKey(). The former takes two strings and returns the result of the comparison. The latter returns an array of bytes such that memcmp against another comparison key yields the same results as compare().
The ICU documentation (see Sortkeys vs Comparison here) notes that generating an ICU comparison key is many times more expensive than doing a direct comparison. Profiles captured from mongod's integration with ICU confirms this to be the case. However, memcmp is also cheaper than compare(). This means that comparison keys should be used when you expect to compare a string repeatedly (say, hundreds of times), whereas direct comparison should be used in other cases. For example, we generate and store comparison keys in indexes, since we want to be able to repeatedly make cheap comparisons against these keys.
When performing an in-memory SORT, we currently generate comparison keys for all of the strings to be sorted, and then sort them with memcmp. My experiments show that, especially for small in-memory sorts, it is faster to sort via direct comparison. It would probably take a very large in-memory sort to cross the threshold, such that the repeated calls to CollatorInterface::compare() for the average element exceed the cost of generating the comparison key for that element.
- related to
-
SERVER-26129 Investigate perf overhead of collation
- Closed