Loading...

XML

Word

Printable

JSON

Type: Improvement
Resolution: Fixed
Priority: Major - P3
Fix Version/s: 5.0.0-rc0
Affects Version/s: None
Component/s: Querying
Labels:
- asya

Backwards Compatibility:
Fully Compatible
Sprint:
Query Optimization 2021-03-22, Query Optimization 2021-04-05, Query Optimization 2021-04-19, Query Optimization 2021-05-03, Query Optimization 2021-05-17
Case:
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

The following script generates increasingly large $or queries, where each clause can use the _id index:

(function() {
    "use strict";

    let n = 0;
    let step = 1000;
    let iters = 20;

    function setup() {
        db.c.drop();
        db.c.insert({_id: 0})
    }

    function remove() {
        let terms = [];
        for (var i = 0; i < n; i++) {
            terms.push({_id: i})
        }
        db.c.find({$or: terms}).itcount();
    }

    for (let i = 0; i < iters; i++) {
        n += step;
        setup();
        let startTime = new Date();
        remove();
        let endTime = new Date();
        let elapsedTime = endTime - startTime;
        print(n + "\t" + elapsedTime);
    }
}());

As observed in ~~SERVER-33527~~, the run time of this query grows super-linearly with the number of $or clauses. Profiling indicates that much of the time is being spent in the planner computing the union of two sets of index bounds. This is implemented in IndexBoundsBuilder::unionize():

https://github.com/mongodb/mongo/blob/76cef6675d80ab57b98af11e6e47868a60e11cbb/src/mongo/db/query/index_bounds_builder.cpp#L727

This function is O(n log n) in the number of intervals being unionized. Since we must maintain the intervals in indexed order, we std::sort() the intervals prior to scanning the list for intervals that can be merged.

The bigger problem, however, is that we process one clause of the $or at a time, unionizing each into the bounds one-by-one. Therefore, we observe that the runtime is quadratic in n. Since n is related to the size of the query, not the size of the data, n is often quite small. However, when an $or query results in many non-overlapping intervals over the same index, IndexBoundsBuilder::unionize() is slow.

I verified that this is indeed the problem by applying the following patch. (Disclaimer: This patch is not a fix. It comments out logic necessary for other queries in order to prove where the time is going.) After applying the patch, the query runtimes are significantly reduced. The 20,0000 clause query, for example, goes from about 40 seconds to 400 milliseconds on my machine.

diff --git a/src/mongo/db/query/index_bounds_builder.cpp b/src/mongo/db/query/index_bounds_builder.cpp
index aa91b2c35a..c36c66ae19 100644
--- a/src/mongo/db/query/index_bounds_builder.cpp
+++ b/src/mongo/db/query/index_bounds_builder.cpp
@@ -266,7 +266,7 @@ void IndexBoundsBuilder::translateAndUnion(const MatchExpression* expr,
     oilOut->intervals.insert(oilOut->intervals.end(), arg.intervals.begin(), arg.intervals.end());

     // Union the appended intervals with the existing ones.
-    unionize(oilOut);
+    // unionize(oilOut);
 }

 bool typeMatch(const BSONObj& obj) {
@@ -954,6 +954,10 @@ void IndexBoundsBuilder::alignBounds(IndexBounds* bounds, const BSONObj& kp, int
         ++oilIdx;
     }

+    for (auto&& oil : bounds->fields) {
+        std::sort(oil.intervals.begin(), oil.intervals.end(), IntervalComparison);
+    }
+
     if (!bounds->isValidFor(kp, scanDir)) {
         log() << "INVALID BOUNDS: " << redact(bounds->toString()) << endl
               << "kp = " << redact(kp) << endl

is related to

SERVER-35693 Parsing of $in takes quadratic time due to O(n^2) boost::flat_set constructor

Closed

SERVER-33527 Plan with complex filter can hold lock for an extended time without yielding

Closed

SERVER-34029 mongod get stuck after upgrade

Closed

SERVER-62353 Ensure $or -> $in optimization parity

Closed

related to

SERVER-55509 {example: /regex/} and {example: {$eq: /regex/}} do not mean the same thing

Backlog

SERVER-55639 $in with regex has special meaning

Backlog

SERVER-66379 $or to $in conversion flawed

Closed

SERVER-72450 $or with regex to $in rewrite depends on terms order

Closed

SERVER-37176 IndexBoundsBuilder::unionize() can be skipped for $in

Closed

(4 related to)

Assignee:: Timour Katchaounov
Reporter:: David Storch
Participants:: David Storch, Githook User, Timour Katchaounov
Votes:: 0 Vote for this issue
Watchers:: 19 Start watching this issue

Created:: Mar 20 2018 08:14:21 PM UTC
Updated:: Mar 04 2024 02:02:10 PM UTC
Resolved:: May 11 2021 06:14:04 AM UTC
Confidence Status Last Update:: 15/Mar/21 6:39 AM

Details

Description

Attachments

Issue Links

Activity

People

Dates