[SERVER-48614] Plan cache key computation for wildcard indexes with partialIndexFilter is incorrect, leading to incorrect query results Created: 05/Jun/20  Updated: 29/Oct/23  Resolved: 15/Jun/20

Status: Closed
Project: Core Server
Component/s: Index Maintenance, Querying
Affects Version/s: 4.2.7
Fix Version/s: 4.4.0-rc10, 4.2.9, 4.7.0

Type: Bug Priority: Critical - P2
Reporter: Ralf Strobel Assignee: David Storch
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: File SERVER-48614.js    
Issue Links:
Backports
Related
related to SERVER-48700 Add fuzz testing to ensure that the p... Backlog
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.4, v4.2
Steps To Reproduce:

Run attached reproduction script.

Note that the script loops over the relevant steps until the error occurs, since the problem does not seem to be entirely deterministic but may have some sort of timing sensitivity.

Also note the rather specific setup steps. We could only reproduce the issue if the wildcard index is defined with a partialFilterExpression and when a second non-wildcard index is present. But we are not sure if either of these aspects are causally relevant.

Sprint: Query 2020-06-15, Query 2020-06-29
Participants:

 Description   

We had an issue where a query on a collection containing a wildcard index seemingly randomly returned zero results, even though matching objects existed. The issue first arose one days after introducing wildcard indexes, so a causal relation seemed likely.

The problem seems to arise if a first query successfully filters for a non-null value using the wildcard index...

filter: { value1: 1.0, value2: 1.0, value3: 1.0, deleted: null }
planSummary: IXSCAN { $_path: 1, value3: 1 }

And then a second nearly identical query filters for null on the same field...

filter: { value1: 1.0, value2: 1.0, value3: null, deleted: null }
planSummary: IXSCAN { $_path: 1, value3: 1 }

Seemingly, the query planner simply reuses the plan from the first query, not realizing that the wildcard index is sparse by definition.



 Comments   
Comment by Ralf Strobel [ 17/Jun/20 ]

david.storch Thank you very much! I guess that will be mid July then, since 4.2.8 has just been released.

Comment by David Storch [ 17/Jun/20 ]

ralf.strobel I've backported the fix to the 4.4 and 4.2 branches. It will be included as part of the forthcoming 4.2.9 release.

Comment by Githook User [ 16/Jun/20 ]

Author:

{'name': 'David Storch', 'email': 'david.storch@mongodb.com', 'username': 'dstorch'}

Message: SERVER-48614 Fix plan cache discriminators for partial wildcard indexes

Previously, discrimination based on the partial filter
expression was done for all paths included in the wildcard
projection. This could lead to a situation where two queries
were erroneously assigned the same plan cache key.

The fix is to ensure that for wildcard indexes, partial
index discriminators are instead registered only for those
paths mentioned in the partial filter expression. Unlike
other kinds of wildcard index discriminators (e.g. handling
concerns of null equality or collation), the paths in the
partial filter expression are known a priori. Therefore,
discrimination based on the partial filter can be done in
the same way for wildcard and non-wildcard indexes.

(cherry picked from commit 57edf434219c9659354f01fb6bf4f4e6c0370bc4)
Branch: v4.2
https://github.com/mongodb/mongo/commit/3d7175997102253e0bf2640ec95657371dd65041

Comment by Githook User [ 16/Jun/20 ]

Author:

{'name': 'David Storch', 'email': 'david.storch@mongodb.com', 'username': 'dstorch'}

Message: SERVER-48614 Fix plan cache discriminators for partial wildcard indexes

Previously, discrimination based on the partial filter
expression was done for all paths included in the wildcard
projection. This could lead to a situation where two queries
were erroneously assigned the same plan cache key.

The fix is to ensure that for wildcard indexes, partial
index discriminators are instead registered only for those
paths mentioned in the partial filter expression. Unlike
other kinds of wildcard index discriminators (e.g. handling
concerns of null equality or collation), the paths in the
partial filter expression are known a priori. Therefore,
discrimination based on the partial filter can be done in
the same way for wildcard and non-wildcard indexes.

(cherry picked from commit 80f424c02df47469792917673ab7e6dd77b01421)
Branch: v4.4
https://github.com/mongodb/mongo/commit/57edf434219c9659354f01fb6bf4f4e6c0370bc4

Comment by Githook User [ 15/Jun/20 ]

Author:

{'name': 'David Storch', 'email': 'david.storch@mongodb.com', 'username': 'dstorch'}

Message: SERVER-48614 Fix plan cache discriminators for partial wildcard indexes

Previously, discrimination based on the partial filter
expression was done for all paths included in the wildcard
projection. This could lead to a situation where two queries
were erroneously assigned the same plan cache key.

The fix is to ensure that for wildcard indexes, partial
index discriminators are instead registered only for those
paths mentioned in the partial filter expression. Unlike
other kinds of wildcard index discriminators (e.g. handling
concerns of null equality or collation), the paths in the
partial filter expression are known a priori. Therefore,
discrimination based on the partial filter can be done in
the same way for wildcard and non-wildcard indexes.
Branch: master
https://github.com/mongodb/mongo/commit/80f424c02df47469792917673ab7e6dd77b01421

Comment by David Storch [ 10/Jun/20 ]

ralf.strobel one other note: I filed related ticket SERVER-48700 which you might be interested in. This tracks some ideas for how to increase our test coverage for the plan cache in order to help avoid future plan cache-related query correctness issues such as this one. Feel free to follow and/or vote for SERVER-48700 if you are interested.

Also, the code to fix this bug is now being reviewed internally by the Server Query Team. Stay tuned!

Comment by David Storch [ 10/Jun/20 ]

ralf.strobel I have a fix in progress that should be suitable for backport to the 4.4 and 4.2 branches. I'm checking with our release team regarding timing of the 4.2.x release into which the fix is likely to be included.

Comment by Ralf Strobel [ 10/Jun/20 ]

Thank you for the quick and thorough analysis! Can you already estimate how long a fix in 4.2.x might be out? Because wildcard indexes with partialFilterExpressions are currently in place essentially in all of our collections in all of our customers. Given that we only had one bug report so far, it probably only affects rare edge cases, but it still doesn't feel great to know there might be more cases like that lingering somewhere.

Comment by David Storch [ 09/Jun/20 ]

Here's an even simpler version of the repro. Notably, this repro shows that the problem can occur when the only secondary index in the collection is a wildcard index with a partialFilterExpression:

const coll = db.c;
 
coll.drop();
assert.commandWorked(coll.createIndex({"$**": 1}, {
    partialFilterExpression: {value1: 1},
}));
 
assert.commandWorked(coll.insert({value1: 1}));
 
const query1 = {
    value1: 1,
    value2: 1,
};  // does not match the document
const query2 = {
    value1: 1,
    value2: null,
};  // matches the document
 
// Create an active plan cache entry by running 'query1' twice.
assert.eq(0, coll.find(query1).itcount());
assert.eq(0, coll.find(query1).itcount());
 
// This query will incorrectly use the cached plan.
assert.eq(1, coll.find(query2).itcount());

Comment by David Storch [ 09/Jun/20 ]

As suspected, this is a bug with the plan cache indexability discriminators. I added some logging which dumps the complete value of the plan cache key, including both the "shape string" and indexability discriminator components. For both query1 and query2 from the repro above, the plan cache key looks like this:

MongoDFixture:job0] {"t":{"$date":"2020-06-09T17:21:12.441-04:00"},"s":"I",  "c":"QUERY",    "id":4861400, "ctx":"conn5","msg":"[storchprint] full plan cache key","attr":{"shapeStr":"an[eqvalue1,eqvalue2]","indexabilityStr":"<11><0>"}}

The indexability string has two sections of discriminator bits, one for each predicate. The "<11>" component indicates that the value1 predicate can be assigned to either index, {value1: 1} or the wildcard index. The "<0>" applies to the subsequent predicate on value2. Since there is no non-wildcard index created explicitly over the path value2, there is just one discriminator bit for the wildcard index. For both queries, the "<0>" indicates that the value2 predicate cannot be answered by the wildcard index. In the $eq:null case, this is due to the implicit sparseness property of wildcard indexes. In the case of the query with {value2: {$eq: 1}}, the situation is more subtle. Here, the "<0>" discriminator is due to the partialFilterExpression, since the documents that match {value2: {$eq: 1}} are not provably a subset of the partialFilterExpression.

This leads to a scenario where the two queries share a plan cache key, even though they have different indexability properties. Specifically, query1 cannot use the wildcard index to answer the predicate on value2, but query2 can use the wildcard index to answer the predicate on value2.

Comment by David Storch [ 09/Jun/20 ]

I've created a slightly simpler version of the attached repro script:

const collection = db.c;
 
collection.drop();
collection.createIndex({value1: 1});
collection.createIndex({"$**": 1}, {
    // The partial filter expression seems to be necessary for the problem to occur.
    // Note that all queries contain a corresponding filter, or the database would never consider
    // this index.
    partialFilterExpression: {value1: 1},
});
 
collection.insert({value1: 1});
 
// Note that both queries could use the wildcard index to match "value2",
// but only query2 should be able to match "value3" since the expected value is non-null.
// Both queries could also use dedicated "value1" index, but only query1 decides to do so.
const query1 = {
    value1: 1,
    value2: null,
};  // matches the document
const query2 = {
    value1: 1,
    value2: 1,
};  // does not match the document
 
// Execute query1 for the first time, it finds the document (via the dedicated "value1" index).
// filter: { value1: 1.0, value2: 1.0, value3: null, deleted: null }
// planSummary: IXSCAN { value1: 1 } keysExamined:1 docsExamined:1 fromMultiPlanner:1
// cursorExhausted:1 numYields:0 nreturned:1 queryHash:DE2DA7E4 planCacheKey:F2E93F9D
assert.eq(1, collection.find(query1).itcount());
 
// Then execute query2, which correctly does not find the document (using the wildcard index to
// match non-null "value3"). filter: { value1: 1.0, value2: 1.0, value3: 1.0, deleted: null }
// planSummary: IXSCAN { $_path: 1, value3: 1 } keysExamined:0 docsExamined:0 fromMultiPlanner:1
// cursorExhausted:1 numYields:0 nreturned:0 queryHash:DE2DA7E4 planCacheKey:F2E93F9D
assert.eq(0, collection.find(query2).itcount());
 
// Now execute query1 again. Suddenly it also tries to match "value3" via the wildcard index.
// But since we are looking for null, it cannot find an entry in the sparse index. filter: {
// value1: 1.0, value2: 1.0, value3: null, deleted: null } planSummary: IXSCAN { $_path: 1,
// value3: 1 } keysExamined:0 docsExamined:0 cursorExhausted:1 numYields:0 nreturned:0
// queryHash:DE2DA7E4 planCacheKey:F2E93F9D
assert.eq(1, collection.find(query1).itcount());

This looks likely to be a bug with the plan caching system.

Comment by Kelsey Schubert [ 08/Jun/20 ]

Thanks for the detailed report and clear repro script. We can reproduce and are investigating.

Generated at Thu Feb 08 05:17:37 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.