[SERVER-44782] Using operator $in in find method. Prevents using index to sort result with limit. Created: 22/Nov/19  Updated: 27/Oct/23  Resolved: 13/Dec/19

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 4.2.1
Fix Version/s: None

Type: Bug Priority: Major - P3
Reporter: Alexander Dubrovin Assignee: Jacob Evans
Resolution: Works as Designed Votes: 0
Labels: qopt-team
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File Example1.txt     Text File Example2.txt     Text File Example3.txt     Text File Example4.txt    
Operating System: ALL
Sprint: Query 2019-12-30
Participants:

 Description   

I have a collection and compound index on it 

IX_checkin {checkin:1, nights:1, people_count:1, idx_price:1}

When I run a query with filter like this:

db.cat_claim
.explain("queryPlanner")
.find({ 
"checkin" : { "$in" : [ ISODate("2019-11-26T03:00:00.000+03:00"), ISODate("2019-11-27T03:00:00.000+03:00"), ISODate("2019-11-28T03:00:00.000+03:00"), ISODate("2019-11-29T03:00:00.000+03:00"), ISODate("2019-11-30T03:00:00.000+03:00"), ISODate("2019-12-01T03:00:00.000+03:00"), ISODate("2019-12-02T03:00:00.000+03:00"), ISODate("2019-12-03T03:00:00.000+03:00"), ISODate("2019-12-04T03:00:00.000+03:00"), ISODate("2019-12-05T03:00:00.000+03:00"), ISODate("2019-12-06T03:00:00.000+03:00"), ISODate("2019-12-07T03:00:00.000+03:00"), ISODate("2019-12-08T03:00:00.000+03:00"), ISODate("2019-12-09T03:00:00.000+03:00"), ISODate("2019-12-10T03:00:00.000+03:00"), ISODate("2019-12-11T03:00:00.000+03:00"), ISODate("2019-12-12T03:00:00.000+03:00"), ISODate("2019-12-13T03:00:00.000+03:00"), ISODate("2019-12-14T03:00:00.000+03:00"), ISODate("2019-12-15T03:00:00.000+03:00"), ISODate("2019-12-16T03:00:00.000+03:00"), ISODate("2019-12-17T03:00:00.000+03:00"), ISODate("2019-12-18T03:00:00.000+03:00"), ISODate("2019-12-19T03:00:00.000+03:00"), ISODate("2019-12-20T03:00:00.000+03:00"), ISODate("2019-12-21T03:00:00.000+03:00"), ISODate("2019-12-22T03:00:00.000+03:00"), ISODate("2019-12-23T03:00:00.000+03:00"), ISODate("2019-12-24T03:00:00.000+03:00"), ISODate("2019-12-25T03:00:00.000+03:00"), ISODate("2019-12-26T03:00:00.000+03:00") ] }, 
"nights" : { "$in" : [ 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15 ] }, 
"people_count" : 2 })
.sort({"idx_price":1}) 
.limit(600)

Query planner shows SORT_KEY_GENERATOR and following SORT input stage instead of SORT_MERGE input stage. If I replace $in operators with array of $or conditions with the same conditions, query planner produces SORT_MERGE input stage.



 Comments   
Comment by Jacob Evans [ 13/Dec/19 ]

Hi elohim-meth@yandex.ru,

The limit on IXSCANs leading into a SORT_MERGE is imposed because too many of them can spoil performance. It isn't an exact science though so it is fortunately tunable at runtime.

You can issue

db.adminCommand({setParameter: 1, internalQueryMaxScansToExplode: 216})

I've confirmed that raising this value from the default of 200 makes the query you provided perform a SORT_MERGE.

Comment by Carl Champain (Inactive) [ 02/Dec/19 ]

Hi elohim-meth@yandex.ru,

Thanks for the additional information!
I'm passing this ticket along to the Query team for further investigation.

Kind regards,
Carl
 

Comment by Alexander Dubrovin [ 23/Nov/19 ]

Thanks for the reply! After a lot of digging and experimenting I understand that the case is much more complicated then I've thought.

Both $in and $or conditions produce an array of IX_scan queries followed by SORT_MERGE stage but to a certain limit. On my server this limit seems to be 200. The problem: besides conditions on index our queries almost always have additional filtering conditions and that seems to confuse optimizer.

Example1:

 

db.cat_claim.explain("queryPlanner").find({ 
"checkin" : { "$in" : [     ISODate("2019-12-26T03:00:00.000+03:00"),     ISODate("2019-12-28T03:00:00.000+03:00"), ISODate("2019-12-29T03:00:00.000+03:00"), ISODate("2019-12-30T03:00:00.000+03:00"), ISODate("2019-01-04T03:00:00.000+03:00"), ISODate("2019-01-05T03:00:00.000+03:00"), ISODate("2019-01-07T03:00:00.000+03:00"), ISODate("2019-01-09T03:00:00.000+03:00") ] }, 
"nights" : { "$in" : [ 7, 8, 9, 10, 11, 12, 13, 14, 15 ] }, 
"people_count" : {     "$in" : [2,3] }, 
"adult" : { "$gte" : 2 }, 
"packet_type" : 0, 
"hstop" : { "$exists" : false }, 
"for_delete" : { "$exists" : false } })
.sort({ idx_price:1 })
.limit(600)

The plan is good, Optimizer generates 144 IX_SCANs followed by SORT_MERGE followed by FILTER with TOP.

 Lets add one more people_count value.

Example2:

 

db.cat_claim.explain("queryPlanner").find({ 
"checkin" : { "$in" : [     ISODate("2019-12-26T03:00:00.000+03:00"),     ISODate("2019-12-28T03:00:00.000+03:00"), ISODate("2019-12-29T03:00:00.000+03:00"), ISODate("2019-12-30T03:00:00.000+03:00"), ISODate("2019-01-04T03:00:00.000+03:00"), ISODate("2019-01-05T03:00:00.000+03:00"), ISODate("2019-01-07T03:00:00.000+03:00"), ISODate("2019-01-09T03:00:00.000+03:00") ] }, 
"nights" : { "$in" : [ 7, 8, 9, 10, 11, 12, 13, 14, 15 ] }, 
"people_count" : { "$in" : [2,3,4] }, 
"adult" : { "$gte" : 2 }, 
"packet_type" : 0, 
"hstop" : { "$exists" : false }, 
"for_delete" : { "$exists" : false } })
.sort({ idx_price:1 })
.limit(600)

Now we have 216 combinations of values and optimizer decides to use one IX_SCAN with arrays of bound values. Does such stage prevent using the index to sort result on "idx_price"? It abandons SORT_MERGE and uses FETCH to FILTER ALL found rows which is slow if number is big (my case) or filter is complicated. After that comes the in memory SORT stage.

Now I modified query to use $or array of "checkin" and "nights" pairs.

Example3:

db.cat_claim.explain("queryPlanner").find({ "$or" : [ { "checkin" : ISODate("2019-12-26T03:00:00.000+03:00"), "nights" : 7 }, { "checkin" : ISODate("2019-12-26T03:00:00.000+03:00"), "nights" : 8 }, { "checkin" : ISODate("2019-12-26T03:00:00.000+03:00"), "nights" : 9 }, { "checkin" : ISODate("2019-12-26T03:00:00.000+03:00"), "nights" : 10 }, { "checkin" : ISODate("2019-12-26T03:00:00.000+03:00"), "nights" : 11 }, { "checkin" : ISODate("2019-12-26T03:00:00.000+03:00"), "nights" : 12 }, { "checkin" : ISODate("2019-12-26T03:00:00.000+03:00"), "nights" : 13 }, { "checkin" : ISODate("2019-12-26T03:00:00.000+03:00"), "nights" : 14 }, { "checkin" : ISODate("2019-12-26T03:00:00.000+03:00"), "nights" : 15 }, { "checkin" : ISODate("2019-12-28T03:00:00.000+03:00"), "nights" : 7 }, { "checkin" : ISODate("2019-12-28T03:00:00.000+03:00"), "nights" : 8 }, { "checkin" : ISODate("2019-12-28T03:00:00.000+03:00"), "nights" : 9 }, { "checkin" : ISODate("2019-12-28T03:00:00.000+03:00"), "nights" : 10 }, { "checkin" : ISODate("2019-12-28T03:00:00.000+03:00"), "nights" : 11 }, { "checkin" : ISODate("2019-12-28T03:00:00.000+03:00"), "nights" : 12 }, { "checkin" : ISODate("2019-12-28T03:00:00.000+03:00"), "nights" : 13 }, { "checkin" : ISODate("2019-12-28T03:00:00.000+03:00"), "nights" : 14 }, { "checkin" : ISODate("2019-12-28T03:00:00.000+03:00"), "nights" : 15 }, { "checkin" : ISODate("2019-12-29T03:00:00.000+03:00"), "nights" : 7 }, { "checkin" : ISODate("2019-12-29T03:00:00.000+03:00"), "nights" : 8 }, { "checkin" : ISODate("2019-12-29T03:00:00.000+03:00"), "nights" : 9 }, { "checkin" : ISODate("2019-12-29T03:00:00.000+03:00"), "nights" : 10 }, { "checkin" : ISODate("2019-12-29T03:00:00.000+03:00"), "nights" : 11 }, { "checkin" : ISODate("2019-12-29T03:00:00.000+03:00"), "nights" : 12 }, { "checkin" : ISODate("2019-12-29T03:00:00.000+03:00"), "nights" : 13 }, { "checkin" : ISODate("2019-12-29T03:00:00.000+03:00"), "nights" : 14 }, { "checkin" : ISODate("2019-12-29T03:00:00.000+03:00"), "nights" : 15 }, { "checkin" : ISODate("2019-12-30T03:00:00.000+03:00"), "nights" : 7 }, { "checkin" : ISODate("2019-12-30T03:00:00.000+03:00"), "nights" : 8 }, { "checkin" : ISODate("2019-12-30T03:00:00.000+03:00"), "nights" : 9 }, { "checkin" : ISODate("2019-12-30T03:00:00.000+03:00"), "nights" : 10 }, { "checkin" : ISODate("2019-12-30T03:00:00.000+03:00"), "nights" : 11 }, { "checkin" : ISODate("2019-12-30T03:00:00.000+03:00"), "nights" : 12 }, { "checkin" : ISODate("2019-12-30T03:00:00.000+03:00"), "nights" : 13 }, { "checkin" : ISODate("2019-12-30T03:00:00.000+03:00"), "nights" : 14 }, { "checkin" : ISODate("2019-12-30T03:00:00.000+03:00"), "nights" : 15 }, { "checkin" : ISODate("2020-01-04T03:00:00.000+03:00"), "nights" : 7 }, { "checkin" : ISODate("2020-01-04T03:00:00.000+03:00"), "nights" : 8 }, { "checkin" : ISODate("2020-01-04T03:00:00.000+03:00"), "nights" : 9 }, { "checkin" : ISODate("2020-01-04T03:00:00.000+03:00"), "nights" : 10 }, { "checkin" : ISODate("2020-01-04T03:00:00.000+03:00"), "nights" : 11 }, { "checkin" : ISODate("2020-01-04T03:00:00.000+03:00"), "nights" : 12 }, { "checkin" : ISODate("2020-01-04T03:00:00.000+03:00"), "nights" : 13 }, { "checkin" : ISODate("2020-01-04T03:00:00.000+03:00"), "nights" : 14 }, { "checkin" : ISODate("2020-01-04T03:00:00.000+03:00"), "nights" : 15 }, { "checkin" : ISODate("2020-01-05T03:00:00.000+03:00"), "nights" : 7 }, { "checkin" : ISODate("2020-01-05T03:00:00.000+03:00"), "nights" : 8 }, { "checkin" : ISODate("2020-01-05T03:00:00.000+03:00"), "nights" : 9 }, { "checkin" : ISODate("2020-01-05T03:00:00.000+03:00"), "nights" : 10 }, { "checkin" : ISODate("2020-01-05T03:00:00.000+03:00"), "nights" : 11 }, { "checkin" : ISODate("2020-01-05T03:00:00.000+03:00"), "nights" : 12 }, { "checkin" : ISODate("2020-01-05T03:00:00.000+03:00"), "nights" : 13 }, { "checkin" : ISODate("2020-01-05T03:00:00.000+03:00"), "nights" : 14 }, { "checkin" : ISODate("2020-01-05T03:00:00.000+03:00"), "nights" : 15 }, { "checkin" : ISODate("2020-01-07T03:00:00.000+03:00"), "nights" : 7 }, { "checkin" : ISODate("2020-01-07T03:00:00.000+03:00"), "nights" : 8 }, { "checkin" : ISODate("2020-01-07T03:00:00.000+03:00"), "nights" : 9 }, { "checkin" : ISODate("2020-01-07T03:00:00.000+03:00"), "nights" : 10 }, { "checkin" : ISODate("2020-01-07T03:00:00.000+03:00"), "nights" : 11 }, { "checkin" : ISODate("2020-01-07T03:00:00.000+03:00"), "nights" : 12 }, { "checkin" : ISODate("2020-01-07T03:00:00.000+03:00"), "nights" : 13 }, { "checkin" : ISODate("2020-01-07T03:00:00.000+03:00"), "nights" : 14 }, { "checkin" : ISODate("2020-01-07T03:00:00.000+03:00"), "nights" : 15 }, { "checkin" : ISODate("2020-01-09T03:00:00.000+03:00"), "nights" : 7 }, { "checkin" : ISODate("2020-01-09T03:00:00.000+03:00"), "nights" : 8 }, { "checkin" : ISODate("2020-01-09T03:00:00.000+03:00"), "nights" : 9 }, { "checkin" : ISODate("2020-01-09T03:00:00.000+03:00"), "nights" : 10 }, { "checkin" : ISODate("2020-01-09T03:00:00.000+03:00"), "nights" : 11 }, { "checkin" : ISODate("2020-01-09T03:00:00.000+03:00"), "nights" : 12 }, { "checkin" : ISODate("2020-01-09T03:00:00.000+03:00"), "nights" : 13 }, { "checkin" : ISODate("2020-01-09T03:00:00.000+03:00"), "nights" : 14 }, { "checkin" : ISODate("2020-01-09T03:00:00.000+03:00"), "nights" : 15 } ], "people_count" : { "$in" : [ 2,3 ] }, "adult" : { "$gte" : 2 }, "packet_type" : 0, "hstop" : { "$exists" : false }, "for_delete" : { "$exists" : false }).sort({ idx_price:1 }).limit(600)

Optimizer generates 72 IX_SCANs instead of 144, and each of them is scan with array of bound values for "people_count". This seems to prevent using the index to sort on "idx_price".

Example4:

db.cat_claim.explain("queryPlanner").find({ "$or" : [ { "checkin" : ISODate("2019-12-26T03:00:00.000+03:00"), "nights" : {"$in":[7,8,9,10,11,12,13,14,15]} }, { "checkin" : ISODate("2019-12-28T03:00:00.000+03:00"), "nights" : {"$in":[7,8,9,10,11,12,13,14,15]} }, { "checkin" : ISODate("2019-12-29T03:00:00.000+03:00"), "nights" : {"$in":[7,8,9,10,11,12,13,14,15]} }, { "checkin" : ISODate("2019-12-30T03:00:00.000+03:00"), "nights" : {"$in":[7,8,9,10,11,12,13,14,15]} }, { "checkin" : ISODate("2020-01-04T03:00:00.000+03:00"), "nights" : {"$in":[7,8,9,10,11,12,13,14,15]} }, { "checkin" : ISODate("2020-01-05T03:00:00.000+03:00"), "nights" : {"$in":[7,8,9,10,11,12,13,14,15]} }, { "checkin" : ISODate("2020-01-07T03:00:00.000+03:00"), "nights" : {"$in":[7,8,9,10,11,12,13,14,15]} }, { "checkin" : ISODate("2020-01-09T03:00:00.000+03:00"), "nights" : {"$in":[7,8,9,10,11,12,13,14,15]} } ], "people_count" : { "$in" : [     2,3 ] }, "adult" : { "$gte" : 2 }, "packet_type" : 0, "hstop" : { "$exists" : false }, "for_delete" : { "$exists" : false }).sort({ idx_price:1 }).limit(600)

The same as example3. Only difference is the number of IX_SCANS.

If I remove additional filter 

"adult" : { "$gte" : 2 }, "packet_type" : 0, "hstop" : { "$exists" : false }, "for_delete" : { "$exists" : false }

from Example3 or Example4 optimizer generates plan identical to Example1.

I attached generated plans for each example.

 

 

 

Comment by Carl Champain (Inactive) [ 22/Nov/19 ]

Hi elohim-meth@yandex.ru,

Thanks for the report.
I've tried to re-create the $or query, but I'm not getting the SORT_MERGE input stage. To help us understand what is happening, can you please:

  1. Provide the $or query?
  2. Provide the explain outputs for both queries ($in and $or)?

Kind regards,
Carl

 

Generated at Thu Feb 08 05:06:57 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.