[SERVER-28010] Creating index on array of sub document fields causes unexpected bounds from index use in explain Created: 14/Feb/17  Updated: 15/Feb/17  Resolved: 15/Feb/17

Status: Closed
Project: Core Server
Component/s: Index Maintenance, Querying
Affects Version/s: 3.4.2
Fix Version/s: None

Type: Bug Priority: Minor - P4
Reporter: Richard Jackson [X] Assignee: David Storch
Resolution: Duplicate Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Duplicate
duplicates SERVER-15086 Allow for efficient range queries ove... Closed
Operating System: ALL
Steps To Reproduce:

If I insert the following document:

db.test.insertOne({ main_data : 100, sub_docs: [{ data : 22 },{ data : 859 },{ data: 151 }]}

And create an index on it using:

db.test.createIndex({"sub_docs.data" : 1})

When I perform a query to try and match the data using:

db.text.explain().find({ sub_docs: { $elemMatch: { data: { $gte: 110, $lt: 160 }}}})

The explain plan shows:

"indexBounds" : {
    "sub_docs.data" : [
        "[110.0, inf.0]"
    ]
}

Participants:

 Description   

If you create a multikey index on an array of sub document fields and then perform a range query using that index it will only use one bound of the index. One bound will be one of the values from that range query whilst the other will be either Inf.0 or -Inf.0.
The query is constructed using $elemMatch.

This is a problem because if a user queries for a range that matches index keys close to the middle of the index, then the database will scan half the documents before getting to the correct documents to return.



 Comments   
Comment by Richard Jackson [X] [ 15/Feb/17 ]

Hi David,

Thank you for the detailed information, that's really helped me out!

Kind regards,

Richard.

Comment by David Storch [ 14/Feb/17 ]

Hi Jackson147,

This is expected behavior unless this is a new index built on WiredTiger in 3.4. See SERVER-15086 for more details.

If you are on MMAPv1, you can correct this behavior by upgrading your cluster to use the WiredTiger storage engine on the 3.4.x series and rebuilding the multikey index. If you are already on WiredTiger 3.4.x, please rebuild the multikey index. The 3.4 server will collect metadata during the index build about which fields in the collection contain arrays; this knowledge, in turn, will allow the planner to generate a more efficient plan. Again, the technical details are provided in SERVER-15086.

Best,
Dave

Generated at Thu Feb 08 04:16:52 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.