[SERVER-66015] Auto-parameterization works incorrectly for indexed regular expression predicates Created: 27/Apr/22  Updated: 29/Oct/23  Resolved: 03/May/22

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: 6.0.0-rc2, 6.1.0-rc0
Fix Version/s: 6.0.0-rc4, 6.1.0-rc0

Type: Bug Priority: Critical - P2
Reporter: David Storch Assignee: David Storch
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
is depended on by SERVER-64315 Re-enable caching of single solution ... Closed
Related
is related to SERVER-64776 Modify plan cache key encoding scheme... Backlog
is related to SERVER-33511 Same Query Shape with different regex... Closed
is related to SERVER-33678 Make regex indexability a factor of q... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v6.0
Sprint: QE 2022-05-02, QE 2022-05-16
Participants:
Linked BF Score: 132

 Description   

Here's a simple repro script, inspired by jstests/core/index_bounds_pipe.js:

(function() {
"use strict";
 
const coll = db.coll;
coll.drop();
 
assert.commandWorked(coll.insert({_id: "a"}));
assert.commandWorked(coll.insert({_id: "b"}));
assert.commandWorked(coll.insert({_id: "foo"}));
 
// Run a query which results in an auto-parameterized index scan plan being added to the SBE plan
// cache.
assert.eq(coll.find({_id: /^a/}).itcount(), 1);
 
// This query incorrectly reuses the cached plan, and as a result returns 3 results instead of 2.
assert.eq(coll.find({_id: /^a|b/}).itcount(), 2);
}());

Note that this only triggers the problem if you also include the changes for SERVER-64315, since this repro depends on the caching of single solution plans.

I believe the problem is similar to SERVER-64776 – namely, the bounds tightness of the regex predicate is not correctly incorporated into the plan cache key. The first regex, /^a/, has exact bounds tightness, and therefore the predicate is "trimmed" and not included as an explicit filter in the execution plan. In contrast, /^a|b/ has inexact bounds, because the alternation applies to the whole pattern (it matches strings that start with "a" or contain "b" anywhere), so the regex must be reapplied to each index key. However, this query reuses the plan cached for /^a/, which has no filter stage. Consequently, the predicate does not get reapplied and the query returns extra results.
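
To make the failure mode concrete, here is a toy model in plain JavaScript. It is only a sketch of the idea, not the server's code: every name in it (isSimplePrefixRegex, shapeOnlyKey, tightnessAwareKey, buildPlan, scan, count) is hypothetical, and the "simple regex" check is a deliberately conservative stand-in for the server's real rules. It assumes the cached plan only records whether a residual filter is needed, while the index bounds are recomputed from the regex on each run.

```javascript
// Toy model only: none of these names exist in the MongoDB server.

// Hypothetical check: treat a regex as "simple" (exact index bounds) only if it
// is ^ followed by plain alphanumerics and has no flags.
function isSimplePrefixRegex(re) {
    return re.flags === "" && /^\^[A-Za-z0-9]+$/.test(re.source);
}

// Cache key that ignores the kind of regex (models the buggy behavior).
function shapeOnlyKey(re) {
    return "_id:regex";
}

// Cache key that also records whether the regex is simple (models the idea of the fix).
function tightnessAwareKey(re) {
    return "_id:regex:" + (isSimplePrefixRegex(re) ? "simple" : "nonSimple");
}

// A "plan" here only records whether the regex must be reapplied as a filter.
function buildPlan(re) {
    return {needsFilter: !isSimplePrefixRegex(re)};
}

// Model of the index scan: exact bounds for a simple prefix, full scan otherwise.
function scan(ids, re) {
    if (isSimplePrefixRegex(re)) {
        const prefix = re.source.slice(1);
        return ids.filter(id => id.startsWith(prefix));
    }
    return ids.slice();
}

// Look up (or build and cache) a plan under the given key function, then count results.
function count(cache, keyFn, ids, re) {
    const key = keyFn(re);
    if (!cache.has(key)) {
        cache.set(key, buildPlan(re));
    }
    const plan = cache.get(key);
    const scanned = scan(ids, re);
    return plan.needsFilter ? scanned.filter(id => re.test(id)).length : scanned.length;
}

const ids = ["a", "b", "foo"];
const queries = [/^a/, /^a|b/];

const buggyCache = new Map();
const buggyCounts = queries.map(re => count(buggyCache, shapeOnlyKey, ids, re));

const fixedCache = new Map();
const fixedCounts = queries.map(re => count(fixedCache, tightnessAwareKey, ids, re));

console.log(buggyCounts);  // [ 1, 3 ]
console.log(fixedCounts);  // [ 1, 2 ]
```

With the shape-only key, the second query picks up the filterless plan cached for /^a/ and counts all three documents, mirroring the repro above. With the key that distinguishes simple from non-simple regexes, the second query gets its own plan with a filter and counts two.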



 Comments   
Comment by Githook User [ 03/May/22 ]

Author:

{'name': 'David Storch', 'email': 'david.storch@mongodb.com', 'username': 'dstorch'}

Message: SERVER-66015 Distinguish simple and non-simple regexes in SBE plan cache key

(cherry picked from commit 2cee6cb397119b83fa45b53743e3bcea4106c5fc)
Branch: v6.0
https://github.com/mongodb/mongo/commit/6c2b8f53b411cb2bbbfdabe8618e67c7dffc8adf

Comment by Githook User [ 03/May/22 ]

Author:

{'name': 'David Storch', 'email': 'david.storch@mongodb.com', 'username': 'dstorch'}

Message: SERVER-66015 Distinguish simple and non-simple regexes in SBE plan cache key
Branch: master
https://github.com/mongodb/mongo/commit/2cee6cb397119b83fa45b53743e3bcea4106c5fc
