[SERVER-34012] Planner's logic for taking union of index bounds intervals is slow for large $or queries Created: 20/Mar/18  Updated: 29/Oct/23  Resolved: 11/May/21

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: 5.0.0-rc0

Type: Improvement Priority: Major - P3
Reporter: David Storch Assignee: Timour Katchaounov
Resolution: Fixed Votes: 0
Labels: asya
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-55639 $in with regex has special meaning Backlog
related to SERVER-55509 {example: /regex/} and {example: {$eq... Backlog
related to SERVER-66379 $or to $in conversion flawed Closed
related to SERVER-72450 $or with regex to $in rewrite depends... Closed
related to SERVER-37176 IndexBoundsBuilder::unionize() can be... Closed
is related to SERVER-35693 Parsing of $in takes quadratic time d... Closed
is related to SERVER-33527 Plan with complex filter can hold loc... Closed
is related to SERVER-34029 mongod get stuck after upgrade Closed
is related to SERVER-62353 Ensure $or -> $in optimization parity Closed
Backwards Compatibility: Fully Compatible
Sprint: Query Optimization 2021-03-22, Query Optimization 2021-04-05, Query Optimization 2021-04-19, Query Optimization 2021-05-03, Query Optimization 2021-05-17
Participants:
Case:

 Description   

The following script generates increasingly large $or queries, where each clause can use the _id index:

(function() {
    "use strict";
 
    let n = 0;
    let step = 1000;
    let iters = 20;
 
    function setup() {
        db.c.drop();
        db.c.insert({_id: 0})
    }
 
    function remove() {
        let terms = [];
        for (var i = 0; i < n; i++) {
            terms.push({_id: i})
        }
        db.c.find({$or: terms}).itcount();
    }
 
    for (let i = 0; i < iters; i++) {
        n += step;
        setup();
        let startTime = new Date();
        remove();
        let endTime = new Date();
        let elapsedTime = endTime - startTime;
        print(n + "\t" + elapsedTime);
    }
}());

As observed in SERVER-33527, the run time of this query grows super-linearly with the number of $or clauses. Profiling indicates that much of the time is being spent in the planner computing the union of two sets of index bounds. This is implemented in IndexBoundsBuilder::unionize():

https://github.com/mongodb/mongo/blob/76cef6675d80ab57b98af11e6e47868a60e11cbb/src/mongo/db/query/index_bounds_builder.cpp#L727

This function is O(n log n) in the number of intervals being unionized. Since we must maintain the intervals in indexed order, we std::sort() the intervals prior to scanning the list for intervals that can be merged.

The bigger problem, however, is that we process one clause of the $or at a time, unionizing each into the bounds one-by-one. Therefore, we observe that the runtime is quadratic in n. Since n is related to the size of the query, not the size of the data, n is often quite small. However, when an $or query results in many non-overlapping intervals over the same index, IndexBoundsBuilder::unionize() is slow.

I verified that this is indeed the problem by applying the following patch. (Disclaimer: This patch is not a fix. It comments out logic necessary for other queries in order to prove where the time is going.) After applying the patch, the query runtimes are significantly reduced. The 20,0000 clause query, for example, goes from about 40 seconds to 400 milliseconds on my machine.

diff --git a/src/mongo/db/query/index_bounds_builder.cpp b/src/mongo/db/query/index_bounds_builder.cpp
index aa91b2c35a..c36c66ae19 100644
--- a/src/mongo/db/query/index_bounds_builder.cpp
+++ b/src/mongo/db/query/index_bounds_builder.cpp
@@ -266,7 +266,7 @@ void IndexBoundsBuilder::translateAndUnion(const MatchExpression* expr,
     oilOut->intervals.insert(oilOut->intervals.end(), arg.intervals.begin(), arg.intervals.end());
 
     // Union the appended intervals with the existing ones.
-    unionize(oilOut);
+    // unionize(oilOut);
 }
 
 bool typeMatch(const BSONObj& obj) {
@@ -954,6 +954,10 @@ void IndexBoundsBuilder::alignBounds(IndexBounds* bounds, const BSONObj& kp, int
         ++oilIdx;
     }
 
+    for (auto&& oil : bounds->fields) {
+        std::sort(oil.intervals.begin(), oil.intervals.end(), IntervalComparison);
+    }
+
     if (!bounds->isValidFor(kp, scanDir)) {
         log() << "INVALID BOUNDS: " << redact(bounds->toString()) << endl
               << "kp = " << redact(kp) << endl



 Comments   
Comment by Githook User [ 10/May/21 ]

Author:

{'name': 'Timour Katchaounov', 'email': 'timour.katchaounov@mongodb.com'}

Message: SERVER-34012 Planner's logic for taking union of index bounds intervals is slow for large $or queries

Implement a rewrite of single-path ORs into IN-lists.
Branch: master
https://github.com/mongodb/mongo/commit/2f73b66d3cd004e0110bbee47151b75c7fc0791f

Generated at Thu Feb 08 04:35:17 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.