[SERVER-48614] Plan cache key computation for wildcard indexes with partialIndexFilter is incorrect, leading to incorrect query results Created: 05/Jun/20 Updated: 29/Oct/23 Resolved: 15/Jun/20 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | Index Maintenance, Querying |
| Affects Version/s: | 4.2.7 |
| Fix Version/s: | 4.4.0-rc10, 4.2.9, 4.7.0 |
| Type: | Bug | Priority: | Critical - P2 |
| Reporter: | Ralf Strobel | Assignee: | David Storch |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||||||||||
| Issue Links: |
|
||||||||||||
| Backwards Compatibility: | Fully Compatible | ||||||||||||
| Operating System: | ALL | ||||||||||||
| Backport Requested: |
v4.4, v4.2
|
||||||||||||
| Steps To Reproduce: | Run attached reproduction script. Note that the script loops over the relevant steps until the error occurs, since the problem does not seem to be entirely deterministic but may have some sort of timing sensitivity. Also note the rather specific setup steps. We could only reproduce the issue if the wildcard index is defined with a partialFilterExpression and when a second non-wildcard index is present. But we are not sure if either of these aspects are causally relevant. |
||||||||||||
| Sprint: | Query 2020-06-15, Query 2020-06-29 | ||||||||||||
| Participants: | |||||||||||||
| Description |
|
We had an issue where a query on a collection containing a wildcard index seemingly randomly returned zero results, even though matching objects existed. The issue first arose one days after introducing wildcard indexes, so a causal relation seemed likely. The problem seems to arise if a first query successfully filters for a non-null value using the wildcard index...
And then a second nearly identical query filters for null on the same field...
Seemingly, the query planner simply reuses the plan from the first query, not realizing that the wildcard index is sparse by definition. |
| Comments |
| Comment by Ralf Strobel [ 17/Jun/20 ] | |||||||||||||||||||||||||||||||||||||||||||
|
david.storch Thank you very much! I guess that will be mid July then, since 4.2.8 has just been released. | |||||||||||||||||||||||||||||||||||||||||||
| Comment by David Storch [ 17/Jun/20 ] | |||||||||||||||||||||||||||||||||||||||||||
|
ralf.strobel I've backported the fix to the 4.4 and 4.2 branches. It will be included as part of the forthcoming 4.2.9 release. | |||||||||||||||||||||||||||||||||||||||||||
| Comment by Githook User [ 16/Jun/20 ] | |||||||||||||||||||||||||||||||||||||||||||
|
Author: {'name': 'David Storch', 'email': 'david.storch@mongodb.com', 'username': 'dstorch'}Message: Previously, discrimination based on the partial filter The fix is to ensure that for wildcard indexes, partial (cherry picked from commit 57edf434219c9659354f01fb6bf4f4e6c0370bc4) | |||||||||||||||||||||||||||||||||||||||||||
| Comment by Githook User [ 16/Jun/20 ] | |||||||||||||||||||||||||||||||||||||||||||
|
Author: {'name': 'David Storch', 'email': 'david.storch@mongodb.com', 'username': 'dstorch'}Message: Previously, discrimination based on the partial filter The fix is to ensure that for wildcard indexes, partial (cherry picked from commit 80f424c02df47469792917673ab7e6dd77b01421) | |||||||||||||||||||||||||||||||||||||||||||
| Comment by Githook User [ 15/Jun/20 ] | |||||||||||||||||||||||||||||||||||||||||||
|
Author: {'name': 'David Storch', 'email': 'david.storch@mongodb.com', 'username': 'dstorch'}Message: Previously, discrimination based on the partial filter The fix is to ensure that for wildcard indexes, partial | |||||||||||||||||||||||||||||||||||||||||||
| Comment by David Storch [ 10/Jun/20 ] | |||||||||||||||||||||||||||||||||||||||||||
|
ralf.strobel one other note: I filed related ticket SERVER-48700 which you might be interested in. This tracks some ideas for how to increase our test coverage for the plan cache in order to help avoid future plan cache-related query correctness issues such as this one. Feel free to follow and/or vote for SERVER-48700 if you are interested. Also, the code to fix this bug is now being reviewed internally by the Server Query Team. Stay tuned! | |||||||||||||||||||||||||||||||||||||||||||
| Comment by David Storch [ 10/Jun/20 ] | |||||||||||||||||||||||||||||||||||||||||||
|
ralf.strobel I have a fix in progress that should be suitable for backport to the 4.4 and 4.2 branches. I'm checking with our release team regarding timing of the 4.2.x release into which the fix is likely to be included. | |||||||||||||||||||||||||||||||||||||||||||
| Comment by Ralf Strobel [ 10/Jun/20 ] | |||||||||||||||||||||||||||||||||||||||||||
|
Thank you for the quick and thorough analysis! Can you already estimate how long a fix in 4.2.x might be out? Because wildcard indexes with partialFilterExpressions are currently in place essentially in all of our collections in all of our customers. Given that we only had one bug report so far, it probably only affects rare edge cases, but it still doesn't feel great to know there might be more cases like that lingering somewhere. | |||||||||||||||||||||||||||||||||||||||||||
| Comment by David Storch [ 09/Jun/20 ] | |||||||||||||||||||||||||||||||||||||||||||
|
Here's an even simpler version of the repro. Notably, this repro shows that the problem can occur when the only secondary index in the collection is a wildcard index with a partialFilterExpression:
| |||||||||||||||||||||||||||||||||||||||||||
| Comment by David Storch [ 09/Jun/20 ] | |||||||||||||||||||||||||||||||||||||||||||
|
As suspected, this is a bug with the plan cache indexability discriminators. I added some logging which dumps the complete value of the plan cache key, including both the "shape string" and indexability discriminator components. For both query1 and query2 from the repro above, the plan cache key looks like this:
The indexability string has two sections of discriminator bits, one for each predicate. The "<11>" component indicates that the value1 predicate can be assigned to either index, {value1: 1} or the wildcard index. The "<0>" applies to the subsequent predicate on value2. Since there is no non-wildcard index created explicitly over the path value2, there is just one discriminator bit for the wildcard index. For both queries, the "<0>" indicates that the value2 predicate cannot be answered by the wildcard index. In the $eq:null case, this is due to the implicit sparseness property of wildcard indexes. In the case of the query with {value2: {$eq: 1}}, the situation is more subtle. Here, the "<0>" discriminator is due to the partialFilterExpression, since the documents that match {value2: {$eq: 1}} are not provably a subset of the partialFilterExpression. This leads to a scenario where the two queries share a plan cache key, even though they have different indexability properties. Specifically, query1 cannot use the wildcard index to answer the predicate on value2, but query2 can use the wildcard index to answer the predicate on value2. | |||||||||||||||||||||||||||||||||||||||||||
| Comment by David Storch [ 09/Jun/20 ] | |||||||||||||||||||||||||||||||||||||||||||
|
I've created a slightly simpler version of the attached repro script:
This looks likely to be a bug with the plan caching system. | |||||||||||||||||||||||||||||||||||||||||||
| Comment by Kelsey Schubert [ 08/Jun/20 ] | |||||||||||||||||||||||||||||||||||||||||||
|
Thanks for the detailed report and clear repro script. We can reproduce and are investigating. |