[SERVER-83712] Consider removing exact number of elements in $in from plan cache key Created: 29/Nov/23  Updated: 18/Jan/24

Status: Open
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Major - P3
Reporter: Ivan Fefer Assignee: Backlog - Query Execution
Resolution: Unresolved Votes: 2
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Related
related to SERVER-83777 Cap $in length in plan cache key with... Closed
is related to SERVER-82548 MongoDB 7.0.2 SBE selects a different... Closed
is related to SERVER-84728 Allow MatchExpression $or -> $in rewr... Investigating
Assigned Teams:
Query Execution
Sprint: QE 2023-12-11, QE 2023-12-25, QE 2024-01-08
Participants:
Story Points: 3

 Description   

In SBE plan cache key, we encode the number of elements in $in in it:

https://github.com/mongodb/mongo/blob/66d182d1e50b7bfa8866ca2c44917638a3758d79/src/mongo/db/query/canonical_query_encoder.cpp#L744

This is done because of explode for sort SBE plan will have the number of branches equal to the number of elements in $in.

However, this will lead to worse plan caching even for queries, where is no plan caching.

It is a fact that we have users that use $in's with arbitrary amount of elements from 1 to thousands.

Also, if a query have multiple $ins, it may lead to exponential growth.

One suggestion is instead of generating a unique plan for every $in size when exploding for sort, we can generate plans for all powers of 2 (or 4) and use the smallest plan that can fit all $in values.



 Comments   
Comment by David Storch [ 01/Dec/23 ]

The simpler fix ivan.fefer@mongodb.com referred to above is happening under related ticket SERVER-83777.

Comment by Ivan Fefer [ 30/Nov/23 ]

I decided to first implement simple fix to at least provide a finite bind to plan cache growth, before working on a more complex fix.

Comment by Ivan Fefer [ 30/Nov/23 ]

I decided to first implement simple fix to at least provide a finite bind to plan cache growth, before working on a more complex fix.

Comment by Ivan Fefer [ 30/Nov/23 ]

One problem is that even if we change explodeForSort code to pad the plan with empty IXSCANs, we can't do the same for $in size, as $in doesn't translate one-to-one to the number of exploded scans.

Example `db.find({a: {$in: [1, 2, 3]}, b: {$in: [4, 5, 6]]}}).sort({c: 1})` - will explode into 9 IXSCANs.

And $in size is not the only things that affect the number of IXSCANS - there are also $or.

So bucketing solution is not correct, but proposed simple fix should be correct at least.

Comment by Ivan Fefer [ 29/Nov/23 ]

Simple, but less efficient fix would be to add std::min(<$in size>, internalQueryMaxScansToExplode) when calculating plan cache key.

Generated at Thu Feb 08 06:52:56 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.