XML

Word

Printable

JSON

Type: Improvement
Resolution: Unresolved
Priority: Major - P3
Fix Version/s: None
Affects Version/s: None
Component/s: Querying
Labels:
- qopt-team

Assigned Teams:

Query Optimization
Case:
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

This can be used to choose a more efficient order of execution for predicates like $or and $elemMatch.

Original Title

The Query Parser reorders $elemMatch ahead of equality conditions when evaluating a $or set.

Original Description

I created a collection with 200 documents, all in this format:

  { _id: ObjectId,
    aphaField: true,
    giantArray: [array of 20,000 documents like {ae: intValue}]
    zuluField: true }

The values for giantArray.ae are random values, but the first 19,999 are all > 17.
The last one has value 11.

Then I ran find(qry).count() against the collection, where qry was each of these filters:
1. {aphaField: {$eq: true}}
2. {giantArray: {$elemMatch: {ae: {$eq: 11}}}}
3. {zuluField: {$eq: true}}

I ran each query 21 times. The first time was to warm the cache, and then I recorded how long the next 20 executions took. All queries used a COLLSCAN as the collection has no indexes beyond the default one on _id, and all scanned all 100 documents.

The equality matches on the boolean fields alone ({aphaField: {$eq: true}} and {zuluField: {$eq: true}}) executed 20 times in 8 milliseconds. The $elemMatch condition on the nested array field "giantArray.ae" alone was slower, with 20 executions taking 4,616 milliseconds.

This is not unexpected, as the singleton boolean fields are very fast to test, whereas all 20,000 array members had to be examined to find the one with value 11.

<br>

Next I joined the equality tests on the boolean fields and the $elemMatch condition within a $or, both as {$or: [X, Y]} and {$or: [Y, X]}, and ran them 21 times each too:

{$or: [\{aphaField: {$eq: true}}, \{zuluField: {$eq: true}}]}
{$or: [\{aphaField: {$eq: true}}, \{giantArray: {$elemMatch: {ae: {$eq: 11}}}}]}

{$or: [\{giantArray: {$elemMatch: {ae: {$eq: 11}}}}, \{aphaField: {$eq: true}}]}
{$or: [\{giantArray: {$elemMatch: {ae: {$eq: 11}}}}, \{zuluField: {$eq: true}}]}

{$or: [/\{zuluField: /{$eq: true/}/}, /\{aphaField: /{$eq: true/}/}]/}
{$or: [\{zuluField: {$eq: true}}, \{giantArray: {$elemMatch: {ae: {$eq: 11}}}}]}

When X and Y were simple equality tests on boolean fields (i.e. no $elemMatch), both {$or: [X, Y]} and {$or: [Y, X]} executed 20 times in 8 to 10 milliseconds. However, when either X or Y was the $elemMatch condition, the Query Planner parser reordered THAT condition to be FIRST in the evaluation of the $or. That resulted in 20 executions of any query involving $elemMatch taking between 4568 and 4915 milliseconds.

~~SERVER-45308~~) shows that the Query Planner parser reorders conditions alphabetically, but $elemMatch is clearly a special case, perhaps from a different phase in the parsing. The effect of putting the $elemMatch conditions first was poorer performance because conditions are evaluated in that order too.

Evaluating a condition against an array field will typically take longer than one on a simple field, because the array will generally have multiple values to be tested. If the condition on the boolean field had been evaluated first these queries would have completed in much less time.

Conclusions
===========
In ~~SERVER-45308~~) my conclusion was that no backend code change could help, as due to MongoDB's flexible schema the Query Parser cannot tell that {{

{"giantArray.ae": 11}

}} is a condition on an array field and not just a nested sub-field.

In contrast, when the query filter employs $elemMatch it should be obvious to the Query Parser that the field involved is an array of documents (or the end user would not have used that operator).

I suggest that the Query Parser be changed to order $elemMatch conditions LAST within an $or set. Overall I believe this will result in better performance.

- - Sort By Name
  - Sort By Date
  - Ascending
  - Descending
  - Thumbnails
  - List
  - Download All

elemTest.js
Jan 06 2020 02:48:54 AM UTC
3 kB
William Byrne III
elemTest.out.compact
Jan 06 2020 02:48:54 AM UTC
2 kB
William Byrne III

is duplicated by

SERVER-52619 Performance Regression on 4.4 compared to 4.2 on Aggregate query

Closed

SERVER-45308 Alphabetical order of field names used in an $or clause drives evaluation order and thus affects performance

Closed

is related to

SERVER-37530 Provide a way to cause a well-defined order of evaluation for predicates

Backlog

Assignee:: [DO NOT USE] Backlog - Query Optimization
Reporter:: William Byrne III
Participants:: [DO NOT USE] Backlog - Query Optimization, Charlie Swanson, William Byrne III
Votes:: 1 Vote for this issue
Watchers:: 14 Start watching this issue

Created:: Jan 06 2020 02:44:21 AM UTC
Updated:: Dec 06 2022 02:39:04 AM UTC

Details

Description

Original Title

Original Description

Attachments

Attachments

Issue Links

Forms

Activity

People

Dates