Details
Description
We need to tread with care to ensure that this is correct in all cases, but it appears that we can skip some planning work while building bounds for an $in.
If an $in has no regexes, then it consists of a list of sorted, deduped equalities which take into account the collation. The purpose of calling IndexBoundsBuilder::unionize() is to ensure that the IndexBounds have properly ordered interval lists, and that any overlapping intervals are merged. However, if we process the $in in sorted order, then the intervals should already be sorted. And if the $in has already be deduped, then there cannot be any overlapping intervals.
IndexBoundsBuilder::unionize() has been shown to be slow for queries which have a large list of $in values (say, several hundred thousand distinct values). Local testing shows that skipping this work could speed such queries up possibly by as much as 40%.
Note that this ticket is distinct from SERVER-34012, which identifies quadratic time complexity specifically for $or queries. In contrast, the $in code path for bounds building is not quadratic.
Attachments
Issue Links
- is related to
-
SERVER-35693 Parsing of $in takes quadratic time due to O(n^2) boost::flat_set constructor
-
- Closed
-
-
SERVER-30189 Reduce calls to allocator for large $in expressions
-
- Closed
-
-
SERVER-34012 Planner's logic for taking union of index bounds intervals is slow for large $or queries
-
- Closed
-