[SERVER-33527] Plan with complex filter can hold lock for an extended time without yielding Created: 27/Feb/18  Updated: 06/Dec/22  Resolved: 23/Mar/18

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Bruce Lucas (Inactive) Assignee: Backlog - Query Team (Inactive)
Resolution: Won't Fix Votes: 1
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PDF File IndexedOrNonLinearGrowth.pdf    
Issue Links:
Depends
Related
related to SERVER-34012 Planner's logic for taking union of i... Closed
Assigned Teams:
Query
Operating System: ALL
Participants:
Case:

 Description   

The query execution subsystem only encounters yield points in between calls to PlanStage::work(). However, match expression evaluation typically happens within a single unit of work. For instance, the FETCH stage will do the following when work() is called:

  • Retrieve an intermediate result document from the child stage.
  • If the result is an index key, fetch the keyed document from the collection.
  • Evaluate the filter, if one exists.
  • Discard the result and return NEED_TIME when the intermediate result document does not pass the filter.
  • Return the document if it passes the filter.

Match expression evaluation for a single document is usually quite fast. However, if the filter is extremely large or complex, match expression evaluation could take significant time. Since match expression evaluation does not yield, this could cause intent locks to be held for an extended period.

Here is an example repro case using a very large $or:

n = 10000
 
function setup() {
    for (var i = 0; i < n; i++) {
        db.c.insert({_id: i})
    }
}
 
function remove() {
    terms = []
    for (var i = 0; i < n; i++) {
        terms.push({_id: i})
    }
    db.c.remove({$or: terms})
}

Query run times appear to be quadratic:

10000   20 s
20000   73 s
30000  167 s

A MODE_IX lock is held the entire time, which can block all queries if a command requiring an exclusive lock like dbStats or collStats is is executed.



 Comments   
Comment by Charlie Swanson [ 23/Mar/18 ]

See Dave's comments above. We believe the real problem is separate, and we don't plan to add support for yielding during evaluation of a MatchExpression on a single document.

Comment by David Storch [ 20/Mar/18 ]

I diagnosed the issue related to non-linear runtime growth for indexed $or queries. See related ticket SERVER-34012 for a description of the problem.

Comment by David Storch [ 20/Mar/18 ]

There are two distinct problems here. The first is that the query execution system only encounters yield points in between calls to PlanStage::work(), and MatchExpression evaluation always happens within a single work() call. Therefore, any filter which is expensive can result in the lock being held for the duration of the filter's evaluation. This is rarely a problem in practice, since most filters are quite cheap. However, giant $or's or other MatchExpressions for which the query text itself is quite large can take some time to execute.

The second problem is that we have identified super-linear growth in execution time for a large indexed $or. As asya wisely points out, a COLLSCAN plan with an $or filter displays linear growth in execution time as a function of the number of $or clauses. I think we can be quite confident that $or execution is O(n) in the number of clauses.

On the other hand, an indexed $or plan appears to have non-linear growth. Like Asya, I reproduced this using a single document, so that the number of $or clauses grows while the number of documents being evaluated remains constant. I used this script:

(function() {
    "use strict";
 
    let n = 0;
    let step = 1000;
    let iters = 20;
 
    function setup() {
        db.c.drop();
        db.c.insert({_id: 0})
    }
 
    function remove() {
        let terms = [];
        for (var i = 0; i < n; i++) {
            terms.push({_id: i})
        }
        db.c.remove({$or: terms})
    }
 
    for (let i = 0; i < iters; i++) {
        n += step;
        setup();
        let startTime = new Date();
        remove();
        let endTime = new Date();
        let elapsedTime = endTime - startTime;
        print(n + "\t" + elapsedTime);
    }
}());

See attached file IndexedOrNonLinearGrowth.pdf for a plot of the output.

I will clarify this ticket to be specifically about the fact that MatchExpression evaluation does not yield. In addition, I'm going to investigate the cause of the non-linear growth pattern for an indexed $or, and will file a separate ticket describing that issue.

Comment by Bruce Lucas (Inactive) [ 09/Mar/18 ]

It holds a write intent (w) lock, which won't block reads and writes, until some operation comes along that needs a strong lock (R or W). That will be blocked by the non-yielding intent lock, and it will in turn block other reads and writes, for the duration of the non-yielding $or query.

Comment by Asya Kamsky [ 09/Mar/18 ]

As it turns out the issue seems to be the "consumption" of the $or query in the matcher somewhere, not the actual deletes (which is why there are no yields as the actual delete hasn't started yet).

I reran the test using a collection with zero documents and collection with one document (which guarantees that the number of documents to actually examine is a non-issue), I also varied the test making it IXSCAN on _id or COLLSCAN on unindexed field:

// $or on _id
1000    1.729 sec
2000    3.195 sec
4000    6.523 sec
8000    13.255 sec
16000    38.289 sec
32000    120.44 sec
// $or on *unindexed* field
1000    0.834 sec
2000    1.544 sec
4000    3.489 sec
8000    7.769 sec
16000    16.112 sec
32000    31.762 sec
// $or on indexed field that is not _id
1000    2.236 sec
2000    4.477 sec
4000    7.384 sec
8000    15.31 sec
16000    41.752 sec
32000    69.247 sec

These numbers are representative of all the runs for $or clause. They seem pretty linear to me for unindexed field, and it makes sense that having number of clauses to evaluate go up will cause the time to execute the query to go up (linearly). However for indexed field it does seem to grow quadratically - but isn't that expected because we have that many subquery plans to evaluate?

Note that while "top" will show that write operation was running for however many milliseconds/seconds I see no evidence that it's holding a write lock of any sort as I was able to read and write from the collection while the long query was executing.

Comment by Bruce Lucas (Inactive) [ 09/Mar/18 ]

Took a few stack traces and the time all seems to be being spent in PlanStage::saveState and PlanStage::restoreState.

Comment by Bruce Lucas (Inactive) [ 09/Mar/18 ]

By "number of documents considered" do you mean the number of documents in the collection? In this repro at least that doesn't seem to matter - in the repro above if setup() inserts 100k documents and remove removes 10k the timing is the same as if setup() had only inserted 10k documents.

Comment by Andy Schwerin [ 09/Mar/18 ]

The reason the query runtime in the sample looks to be quadratic is because we're increasing both the number of documents considered and the number of terms in the $or in proportion, right bruce.lucas? I'm pretty sure that the or evaluation is linear in the number of terms, and query evaluation is definitely linear in number of documents, so when both are n we get n^2 runtime.

The bigger question I think is, are we yielding at the right frequency or not? I think for updates we stop yielding after the first matched document gets updated. Do we do that for multi-deletes, too?

Comment by Bruce Lucas (Inactive) [ 28/Feb/18 ]

The particular query used in the original repro can be rewritten using $in, and that turns out to be fast. However that's only possible in special cases; for example this query is slow and holds a lock for an extended time without yielding but can't be rewritten using $in:

function remove() {
    terms = []
    for (var i = 0; i < n; i++) {
        terms.push({_id: i, x: null})
    }
    db.c.remove({$or: terms})
}

Generated at Thu Feb 08 04:33:41 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.