[SERVER-41142] Compound index on an array field is not being used correctly Created: 14/May/19  Updated: 06/Dec/22

Status: Backlog
Project: Core Server
Component/s: Index Maintenance, Querying
Affects Version/s: 4.0.6
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Tom Grossman Assignee: Backlog - Query Optimization
Resolution: Unresolved Votes: 0
Labels: query-44-grooming
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File bad_scenario.js     File good_scenario.js    
Issue Links:
Related
is related to SERVER-24518 MERGE_SORT_STAGE can be used more agg... Backlog
is related to SERVER-19972 Passing empty array to $in should res... Closed
Assigned Teams:
Query Optimization
Participants:

 Description   

In the following schema: 

{
    "arr" : [ 
        "a", 
        "b", 
        "c"
    ],
    "field" : 1,
    "field_2" : 1
}

with the following index: 

{
    "arr" : 1,
    "field" : 1,
    "field_2" : 1
}

If I use this query:

db.coll
    .find({
        $or: [
            {arr: 'a'},
            {arr: []}
        ]
    })
    .sort({field: 1})
    .explain()

The inputStages are separated to 3 stages - arr: [a, a], arr: [[], []], arr: [undefined, undefined].
This is a good scenario since these input stages are followed by a "SORT_MERGE" stage

BUT, if I use the following query:

db.coll
    .find({
        $or: [
            {arr: 'a'},
            {arr: []}
        ],
        field: 1
    })
    .sort({field_2: 1})
    .explain()

There are only 2 input stages - arr: [a, a], arr: [[undefined, undefined], [[] , []]] .
This results that an additional stage needs to happen in order to FETCH the empty array docs and then it cannot use the SORT_MERGE stage.

In large collection, this causes the following error:
"Sort operation used more than the maximum 33554432 bytes of RAM. Add an index, or specify a smaller limit
Even though there is an index for this query.

I would expect the second scenario to perform like the first one - separating the input stages to 3 stages and to use the SORT_MERGE function.



 Comments   
Comment by Asya Kamsky [ 11/May/20 ]

It looks to me like MERGE_SORT is used if the query:

db.coll .find({$or: [ {arr: 'a'},{arr: []}], field: 1}).sort({field_2: 1})

is rewritten as

db.coll.find({arr: {$in: ['a', []]}, "field1":1 }).sort({field2: 1})

which can be used as a workaround till this is fixed.

Comment by Eric Sedor [ 24/May/19 ]

This may be considered a duplicate of SERVER-24518 because of the combined "arr": ["[undefined, undefined]", "[[] , []]"] index bounds that are generated for the compound case. But there are implications related to queries on the empty array. I am passing this on to the appropriate team to investigate further.

Thanks for your report, tomgrossman

Comment by Eric Sedor [ 14/May/19 ]

Thanks for the detailed report tomgrossman; we will investigate.

Generated at Thu Feb 08 04:56:55 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.