[SERVER-83602] $or -> $in MatchExpression rewrite should not generate $or directly nested in another $or Created: 27/Nov/23  Updated: 31/Jan/24  Resolved: 23/Jan/24

Status: Closed
Project: Core Server
Component/s: Query Planning
Affects Version/s: None
Fix Version/s: 7.3.0-rc0

Type: Improvement Priority: Major - P3
Reporter: David Storch Assignee: David Storch
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Problem/Incident
Related
related to SERVER-84013 Incorrect results for index scan plan... Closed
is related to SERVER-83091 $or query can trigger an infinite loo... Closed
Backwards Compatibility: Fully Compatible
Sprint: QO 2024-02-05
Participants:
Linked BF Score: 135

 Description   

As part of MatchExpression::optimize(), we have logic to try to rewrite an $or of equalities over the same path to an $in. This is advantageous, because it helps the downstream optimization code produce better plans. Here's an example of the rewrite:

// Original match expression.
{$or: [{name: "Don"}, {name: "Alice"}]}
 
// This gets rewritten to the following.
{name: {$in: ["Alice", "Don"]}}
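
To make the simple case concrete, here is a minimal Python sketch over plain dicts. It is not the server's C++ implementation: the helper names `is_equality` and `rewrite_or_to_in` are hypothetical, and it assumes equality values are hashable scalars that can be compared with one another (it ignores the special handling a real implementation needs for values such as null, regexes, arrays, or embedded documents). Values are sorted only so that the output matches the example above.

```python
def is_equality(clause):
    """True for a single-path equality such as {"name": "Don"} (no operators)."""
    if not isinstance(clause, dict) or len(clause) != 1:
        return False
    (path, value), = clause.items()
    return not path.startswith("$") and not isinstance(value, (dict, list))


def rewrite_or_to_in(expr):
    """Rewrite {$or: [{p: v1}, {p: v2}, ...]} to {p: {$in: [...]}}.

    Only applies when every clause is an equality over the same path;
    otherwise the expression is returned unchanged.
    """
    if set(expr) != {"$or"}:
        return expr
    clauses = expr["$or"]
    if not clauses or not all(is_equality(c) for c in clauses):
        return expr
    paths = {next(iter(c)) for c in clauses}
    if len(paths) != 1:
        return expr
    path = paths.pop()
    # Assumption: values are hashable and mutually comparable scalars.
    values = sorted({c[path] for c in clauses})
    return {path: {"$in": values}}


original = {"$or": [{"name": "Don"}, {"name": "Alice"}]}
rewritten = rewrite_or_to_in(original)
print(rewritten)  # {'name': {'$in': ['Alice', 'Don']}}
```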

When not all of the clauses of the $or can get rewritten in this manner, the current implementation can output a match expression tree with an $or that is a direct child of another $or. Here's an example:

// Original match expression.
{$or: [{name: "Don"}, {name: "Alice"}, {age: 42}, {job: "Software Engineer"}]}
 
// This gets rewritten to the following.
{$or: [{name: {$in: ["Alice", "Don"]}}, {$or: [{age: 42}, {job: "Software Engineer"}]}]}

As you can see, one of the direct children of the outer $or is another $or. A separate rewrite, also part of MatchExpression::optimize(), attempts to flatten such nested $or nodes. However, the $or -> $in rewrite happens afterwards, so the nested $or it creates is not flattened by that rewrite. In the master branch, the $or-$or is subsequently simplified by the new boolean simplification module enabled in SERVER-81630, but in older branches the $or-$or is never simplified.
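
For illustration, the flattening rewrite can be sketched in Python over plain dicts as follows. This is a rough model only, with a hypothetical function name `flatten_or`; it is not the server's code. It shows that flattening would remove the nesting if it ran on the tree from the example, which is why the ordering of the two rewrites matters.

```python
def flatten_or(expr):
    """Absorb any $or that is a direct child of another $or into its parent."""
    if not isinstance(expr, dict) or set(expr) != {"$or"}:
        return expr
    flat = []
    for child in expr["$or"]:
        child = flatten_or(child)  # flatten deeper levels first
        if isinstance(child, dict) and set(child) == {"$or"}:
            flat.extend(child["$or"])  # splice grandchildren into the parent
        else:
            flat.append(child)
    return {"$or": flat}


# The tree currently produced by the $or -> $in rewrite in the example.
nested = {
    "$or": [
        {"name": {"$in": ["Alice", "Don"]}},
        {"$or": [{"age": 42}, {"job": "Software Engineer"}]},
    ]
}
flattened = flatten_or(nested)
print(flattened)
```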

I would argue that, despite boolean simplification, we should modify the implementation of the $or -> $in rewrite to avoid constructing directly nested $or nodes. Continuing the example above, the output of the $or -> $in rewrite should be as follows:

{$or: [{name: {$in: ["Alice", "Don"]}}, {age: 42}, {job: "Software Engineer"}]}
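
One way to get this shape is to group the equalities by path and append the remaining clauses directly to the same $or, rather than wrapping them in a new one. The Python sketch below models that idea over plain dicts. The names `is_equality` and `rewrite_or_to_in_flat` are hypothetical, the input is assumed to be a single top-level $or, and equality values are assumed to be hashable, mutually comparable scalars; the actual change in the server may be structured differently.

```python
from collections import defaultdict


def is_equality(clause):
    """True for a single-path equality such as {"name": "Don"} (no operators)."""
    if not isinstance(clause, dict) or len(clause) != 1:
        return False
    (path, value), = clause.items()
    return not path.startswith("$") and not isinstance(value, (dict, list))


def rewrite_or_to_in_flat(expr):
    """$or -> $in rewrite that never emits an $or directly under an $or."""
    by_path = defaultdict(list)  # path -> equality values, in first-seen order
    others = []                  # clauses that are not simple equalities
    for clause in expr["$or"]:
        if is_equality(clause):
            (path, value), = clause.items()
            by_path[path].append(value)
        else:
            others.append(clause)

    children = []
    for path, values in by_path.items():
        if len(values) > 1:
            # Several equalities on one path collapse into a single $in.
            children.append({path: {"$in": sorted(set(values))}})
        else:
            # A lone equality stays as a direct child of the $or.
            children.append({path: values[0]})
    children.extend(others)

    # A single remaining child does not need an enclosing $or.
    return children[0] if len(children) == 1 else {"$or": children}


original = {
    "$or": [
        {"name": "Don"},
        {"name": "Alice"},
        {"age": 42},
        {"job": "Software Engineer"},
    ]
}
result = rewrite_or_to_in_flat(original)
print(result)
```

With the input from the example, every child of the resulting $or is a leaf predicate, so no later flattening pass is required.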



 Comments   
Comment by Githook User [ 23/Jan/24 ]

Author:

{'name': 'David Storch', 'email': 'dstorch@users.noreply.github.com', 'username': 'dstorch'}

Message: SERVER-83602 Avoid creating directly nested $or during $or->$in rewrite (#18245)

GitOrigin-RevId: 4946f9aaa808fc341e7e3666de0c4808c5c3b019
Branch: master
https://github.com/mongodb/mongo/commit/79deb924d16157107ddd95df7d94252a970e7edc
