Loading...

XML

Word

Printable

JSON

Type: Improvement
Resolution: Done
Priority: Major - P3
Fix Version/s: 3.3.8
Affects Version/s: 2.4.11, 2.6.4
Component/s: Performance, Querying
Labels:
None

Backwards Compatibility:
Major Change
Sprint:
Query 16 (06/24/16)
Case:
Confidence Status:
None
Work Order:
3
CAR Domain/s:
None

Aha! Reference:
None
Tracking Level:
None
Risk Status:
None
Exec Notes:
None
Goal Name(s):
None
Goal Link:
None

Issue Status as of Jun 06, 2016

ISSUE SUMMARY
In versions of MongoDB 3.2 and earlier, the metadata associated with an index contains a single bit to indicate whether or not the index is multikey. There is no granular information to indicate which indexed fields cause the index to be multikey. This has implications for compound multikey indexes because it prevents the query planner from using the index as efficiently as theoretically possible (i.e. given perfect knowledge about the structure of the documents in the index).

Support for tracking which indexed fields cause the index to be multikey (aka "path-level multikey tracking") has been implemented in MongoDB 3.3.8. The query planner then uses this information to get tighter bounds on queries by assigning additional predicates to the index.

SUPPORTED STORAGE ENGINES
This feature is supported for the WiredTiger and InMemory storage engines only.

This feature is not supported for the MMAPv1 storage engine. MMAPv1 support for path-level multikey information is tracked in ~~SERVER-22727~~.

IMPACT ON DOWNGRADE
It is possible to downgrade from MongoDB 3.3.8 to MongoDB 3.2 only without first dropping all of the indexes that were created on MongoDB 3.3.8. However, it is only possible downgrade to MongoDB 3.2.7 or newer. Downgrading to MongoDB 3.2.7 will automatically delete the path-level multikey information from the index metadata. The indexes will retain only the index-level flag that indicates whether the index is multikey.

Downgrading from MongoDB 3.3.8 to a version of MongoDB older than 3.2.7 may produce the following assertion message if indexes contain path-level multikey information:

2016-05-31T13:18:11.090-0400 I -        [initandlisten] Assertion: 13111:wrong type for field (ns) 10 != 2

See ~~SERVER-23117~~ for more information on this downgrade requirement.

TECHNICAL DETAILS

Tighter bounds in 3.3.8+

Using the path-level multikey information we track for an index, we can search a bounded interval (thus having intersected the bounds) on a field in the compound multikey index that doesn't ever refer to an array value contain multiple elements. In the following example, the "a" field is a scalar and the "b" field is an array value containing multiple elements.

Index: {a: 1, b: 1}
Document: {a: 5, b: [1, 2, 3]}
⇒ Multikey paths: ["b"]
Query: {a: {$gte: 0, $lt: 10}}
⇒ Bounds on "a": [0, 10)
⇒ Bounds on "b": [MinKey, MaxKey]

~~SERVER-22401~~ has additional examples of what bounds can be generated for compound multikey indexes given a document structure and query.

Contrasted with 3.2 and earlier

In MongoDB 3.2 and earlier, the query planner doesn't have enough information to know that the "a" field is never an array value in some document in the collection (e.g. that the document {a: [1, 2, 3], b: 5} doesn't also exist in the collection). Without this additional information, the query planner must assume that the {a: {$gte: 0}} condition could apply to a different element than the {a: {$lt: 10}} condition. Thus, we can only search a left-bounded or right-bounded interval on the "a" field and apply the other condition as a subsequent filter:

{
    "stage" : "FETCH",
    "filter" : {
        "a" : {
            "$gte" : 0
        }
    },
    "inputStage" : {
        "stage" : "IXSCAN",
        "keyPattern" : {
            "a" : 1,
            "b" : 1
        },
        "indexName" : "a_1_b_1",
        "isMultiKey" : true,
        ...
        "indexBounds" : {
            "a" : [
                "[-inf.0, 10.0)"
            ],
            "b" : [
                "[MinKey, MaxKey]"
            ]
        }
    }
}

Explain output in 3.3.8+

The explain output for the COUNT_SCAN, DISTINCT_SCAN, and IXSCAN stages has been updated to include an additional "multiKeyPath" field for indexes that support this feature. The "multiKeyPath" field describes what prefixes of the indexed fields cause the index to be multikey:

{
    "stage" : "IXSCAN",
    "keyPattern" : {
        "a" : 1,
        "b" : 1
    },
    "indexName" : "a_1_b_1",
    "isMultiKey" : true,
    "multiKeyPaths" : {
        "a" : [ ],
        "b" : [
            "b"
        ]
    },
    ...
}

~~SERVER-23115~~ has additional information regarding the "mulitKeyPaths" field in the explain output.

Updated description

Suppose you have documents of the form

{a: [{b: 1}, {b: 2}, {b: 3}]}
{a: [{b: 4}, {b: 5}, {b: 6}]}

with the index {"a.b": 1}. With this schema, there is currently no way to constrain the index bounds with multiple predicates. Users who need to do this generally must change their schema. Instead we could provide new query language semantics to support this use case. See the comments below or related tickets ~~SERVER-8610~~, ~~SERVER-7959~~, and ~~SERVER-6050~~ for more details.

Original Description

It looks like there is unexpected slow performance on a range query when using $elemMatch. I first encountered it when using the aggregation framework and using the $match pipeline with $elemMatch, and have now backed it out into an example with $find.

The performance hit is quite large, 20x slower or so, when I believe, if anything, the query should actually run FASTER, not 20x slower.

I could try and troubleshoot more, but I don't totally understand the explain() output when doing ranges of dates. Sometimes explain() puts the indexBounds as 'true' in the explain(), and sometimes explain() put the indexBounds as 'ISODate("0NaN-NaN-NaNTNaN:NaN:NaNZ")', and I"m not 100% sure how to interpret that, other than I believe the equivalent is "no bound" (meaning the query is not bounding itself at the top or bottom of the range properly).

duplicates

SERVER-8610 $elemMatch (objects in array) not using index range correctly

Closed

is duplicated by

SERVER-7959 Potentially unexpected scans with compound indexes when some fields are multikey

Closed

SERVER-10436 wrong index ranges when using compound index on a list

Closed

SERVER-14618 Wrong index bounds when using "hint"

Closed

SERVER-15059 index on date range on embedded collection is not used correctly

Closed

SERVER-20302 Strange execution plan for range queries with $elemMatch

Closed

SERVER-23038 Why index dosn't working?

Closed

SERVER-23079 Why sub document cannot match index?

Closed

SERVER-24786 Compound multikey index does not compound indexbounds when nested in an object

Closed

SERVER-26978 Query Optimizer not using index correctly

Closed

SERVER-27386 Poor use of index

Closed

SERVER-28010 Creating index on array of sub document fields causes unexpected bounds from index use in explain

Closed

SERVER-4468 Keep track of multi-key fields in index

Closed

SERVER-6050 Consider allowing $elemMatch applied to non arrays

Closed

SERVER-14944 Use of Indexes for Query

Closed

is related to

SERVER-21178 Super slow query and increased memory usage on inefficient range queries

Closed

SERVER-22727 Extend the MMAP index catalog to store path-level multikeyness info

Closed

SERVER-7566 simpler way to figure out if an index is multikey

Backlog

SERVER-3173 Planner should use path-level multikey info to generate covered plans when possible

Closed

related to

SERVER-31444 Queries against multikey trailing fields of a compound 2d index are incorrectly covered, leading to incorrect results

Closed

SERVER-17064 indexBounds error

Closed

SERVER-18115 The planner can add an unnecessary in-memory sort stage for .min()/.max() queries

Closed

SERVER-21963 Index range scan inaccurate when compound index has an array field

Closed

SERVER-4463 Allow covered index use for non-array fields in multi-key index

Closed

SERVER-4468 Keep track of multi-key fields in index

Closed

(10 is duplicated by, 4 is related to, 6 related to)

Assignee:: Max Hirschhorn
Reporter:: Ben Rotz
Participants:: Asya Kamsky, Ben Rotz, David Storch, Derek Wilson, Hoyt Ren, Max Hirschhorn, Michael Barszcz
Votes:: 16 Vote for this issue
Watchers:: 53 Start watching this issue

Created:: Aug 28 2014 09:58:03 PM UTC
Updated:: Nov 10 2020 12:48:40 PM UTC
Resolved:: Jun 09 2016 09:24:57 PM UTC

Details

Description

Tighter bounds in 3.3.8+

Contrasted with 3.2 and earlier

Explain output in 3.3.8+

Updated description

Original Description

Attachments

Issue Links

Forms

Activity

People

Dates