[SERVER-86194] Improve MatchExpression $or->$in rewrite to handle multiple fields Created: 03/Feb/24  Updated: 06/Feb/24

Status: Backlog
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Improvement
Priority: Major - P3
Reporter: Ben Shteinfeld
Assignee: Militsa Sotirova
Resolution: Unresolved
Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Assigned Teams:
Query Optimization
Participants:

 Description   

Currently, the MatchExpression $or->$in rewrite can only collapse the equality disjuncts on a single field into a $in. For example,

 

{ $or: [
     { a: 10 },
     { b: 11 },
     { c: 12 },
     { d: 13 },
     { a: 14 }, 
     { b: 15 },
     { c: 16 },
     { d: 17 },
     { a: 18 },
     { b: 19 }
  ]
} 

is optimized to

{
  $or: [
    { b: 11 },
    { b: 15 },
    { b: 19 },
    { c: 12 },
    { c: 16 },
    { d: 13 },
    { d: 17 },
    { a: { $in : [ 10, 14, 18 ] } }
  ]
}

 

Only the predicates on a are collapsed; the equalities on b, c, and d remain separate disjuncts. This leads to an SBE plan with a large number of unnecessary comparisons. SBE can evaluate $in predicates more efficiently by using a hash set, and the interval builder can generate index bounds more easily from a $in list.

These expressions can be reduced further by allowing the $or->$in rewrite to handle multiple fields, collapsing each field's equality disjuncts into its own $in.
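
To illustrate the intended result, here is a minimal Python sketch of a multi-field rewrite applied to the example query above. It is not the server's implementation (the real rewrite operates on the C++ MatchExpression tree); the function name rewrite_or_to_in and the dict-based query representation are hypothetical. As a conservative assumption, only single-field disjuncts with scalar values are treated as plain equalities, and anything else is passed through unchanged.

```python
def rewrite_or_to_in(match):
    """Collapse equality disjuncts under a top-level $or into one $in per field.

    Hypothetical helper for illustration only; queries are plain dicts.
    """
    disjuncts = match.get("$or")
    if disjuncts is None:
        return match

    grouped = {}      # field name -> equality values, in first-seen order
    passthrough = []  # disjuncts that are not simple scalar equalities

    for disjunct in disjuncts:
        if len(disjunct) == 1:
            (field, value), = disjunct.items()
            # Assumption: skip operators ($and, $gt, ...) and non-scalar values,
            # whose equality semantics are not modeled in this sketch.
            if not field.startswith("$") and not isinstance(value, (dict, list)):
                grouped.setdefault(field, []).append(value)
                continue
        passthrough.append(disjunct)

    rewritten = list(passthrough)
    for field, values in grouped.items():
        if len(values) == 1:
            rewritten.append({field: values[0]})       # nothing to collapse
        else:
            rewritten.append({field: {"$in": values}})  # one $in per field

    # A single remaining disjunct does not need the enclosing $or.
    return rewritten[0] if len(rewritten) == 1 else {"$or": rewritten}


# The example query from this ticket.
query = {"$or": [
    {"a": 10}, {"b": 11}, {"c": 12}, {"d": 13},
    {"a": 14}, {"b": 15}, {"c": 16}, {"d": 17},
    {"a": 18}, {"b": 19},
]}
result = rewrite_or_to_in(query)
print(result)
```

Under these assumptions, the ten disjuncts reduce to four, one $in per field, instead of the eight produced by the current single-field rewrite.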

There are two high-value workloads (MatchExpressionWidePredicate and MatchExpressionWidePredicateWithDeepFieldpaths) which could be improved by this optimization.


Generated at Thu Feb 08 06:59:38 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.