[SERVER-56468] Incorrect plan cache entry for {$ne: null} predicate, leads to missing query results Created: 29/Apr/21 Updated: 29/Oct/23 Resolved: 17/May/21 |
|
| Status: | Closed |
| Project: | Core Server |
| Component/s: | None |
| Affects Version/s: | None |
| Fix Version/s: | 4.2.15, 4.4.7, 5.0.0-rc0 |
| Type: | Bug | Priority: | Major - P3 |
| Reporter: | Bernard Gorman | Assignee: | Andrii Dobroshynski (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | post-rc0, sbe-post-rc0 | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Backwards Compatibility: | Fully Compatible | ||||||||||||||||||||
| Operating System: | ALL | ||||||||||||||||||||
| Backport Requested: | v4.4, v4.2, v4.0 | ||||||||||||||||||||
| Sprint: | Query Execution 2021-05-03, Query Execution 2021-05-17, Query Execution 2021-05-31 | ||||||||||||||||||||
| Participants: | |||||||||||||||||||||
| Linked BF Score: | 131 | ||||||||||||||||||||
| Comments |
| Comment by Githook User [ 02/Jun/21 ] | ||||||||||||||||||||||||
|
Author: {'name': 'Andrii Dobroshynski', 'email': 'andrii.dobroshynski@mongodb.com', 'username': 'dobroshynski'} Message: (cherry picked from commit 20019cf4ac5c1159e27cc458ad272146c11f139d) | ||||||||||||||||||||||||
| Comment by Githook User [ 28/May/21 ] | ||||||||||||||||||||||||
|
Author: {'name': 'Andrii Dobroshynski', 'email': 'andrii.dobroshynski@mongodb.com', 'username': 'dobroshynski'} Message: (cherry picked from commit 20019cf4ac5c1159e27cc458ad272146c11f139d) | ||||||||||||||||||||||||
| Comment by Githook User [ 17/May/21 ] | ||||||||||||||||||||||||
|
Author: {'name': 'Andrii Dobroshynski', 'email': 'andrii.dobroshynski@mongodb.com', 'username': 'dobroshynski'} Message: | ||||||||||||||||||||||||
| Comment by David Storch [ 05/May/21 ] | ||||||||||||||||||||||||
|
andrii.dobroshynski and I paired on this some more and we (mostly) figured it out! It's a plan cache bug. Under some unusual conditions, a plan cache entry can be created for a query of the shape {val: {$not: {$eq: ?}}}. This plan cache entry is correct if the parameter (represented by the question mark) is a scalar, like a number or a boolean. But it is not correct if the parameter is the literal null, because not-equal-to-null predicates generally cannot use a multikey index. However, there is no special provision to ensure that not-equal-to-null queries get a different plan cache key from not-equal-to-constant queries. In other words, {val: {$not: {$eq: true}}} and {val: {$not: {$eq: null}}} have the same plan cache key. We used this discovery to design the following repro script:
The final assertion in this repro script fails. For this repro to work, you must run it with SBE on! For example, you should run it like so:
The bug is not in SBE, but it appears to have been exposed by enabling SBE. We don't understand exactly why yet. A likely explanation is that plan ranking behaves differently when SBE is enabled, so the problematic plan is cached only when SBE is on. We also don't understand why we've only seen this manifest in particular passthrough suites. Perhaps Andrii can dig deeper into these questions. As a final note, it appears that we fixed a similar bug in
kyle.suarez, given the above, I am going to remove this from the SBE epic; I don't think it blocks turning SBE on by default unless it causes too much noise in the build. | ||||||||||||||||||||||||
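The key collision described in this comment can be illustrated with a toy model. This is a minimal Python sketch and not the repro script or MongoDB's actual plan cache code: the functions naive_key, null_aware_key, compares_to_null, and plan_for are hypothetical names, the plan labels "IXSCAN" and "COLLSCAN" are used only as stand-ins, and the planner rule is an assumption taken from the statement that not-equal-to-null predicates generally cannot use a multikey index. The null-aware key shows one possible remedy, which is not necessarily the fix that was committed.

```python
import json


def naive_key(query):
    """Keep only the query shape: every constant becomes '?'."""
    if isinstance(query, dict):
        return {name: naive_key(value) for name, value in query.items()}
    return "?"


def null_aware_key(query):
    """Like naive_key, but a literal null keeps its own marker."""
    if isinstance(query, dict):
        return {name: null_aware_key(value) for name, value in query.items()}
    return "null" if query is None else "?"


def compares_to_null(query):
    """True if any constant in the query is the literal null (None)."""
    if isinstance(query, dict):
        return any(compares_to_null(value) for value in query.values())
    return query is None


def plan_for(query, cache, key_fn):
    """Return the cached plan for this key, planning only on a cache miss."""
    key = json.dumps(key_fn(query), sort_keys=True)
    if key not in cache:
        # Hypothetical planner rule for a multikey index on 'val':
        # a not-equal-to-null predicate must not use the index.
        cache[key] = "COLLSCAN" if compares_to_null(query) else "IXSCAN"
    return cache[key]


ne_true = {"val": {"$not": {"$eq": True}}}
ne_null = {"val": {"$not": {"$eq": None}}}

if __name__ == "__main__":
    shared = {}
    print(plan_for(ne_true, shared, naive_key))  # plans and caches an index scan
    print(plan_for(ne_null, shared, naive_key))  # reuses that entry for null
    separate = {}
    print(plan_for(ne_true, separate, null_aware_key))
    print(plan_for(ne_null, separate, null_aware_key))
```

With the naive key, the order of the two queries matters: whichever runs first decides the plan that the other one inherits, which fits the observation that the problem only shows up under particular conditions.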
| Comment by David Storch [ 05/May/21 ] | ||||||||||||||||||||||||
|
I suspect that we are getting different plans because the QueryPlanner is given different multikeyness metadata for the {val: 1} index in the failure case than in the success case. The next step should be to add logging that prints the multikeyness metadata for the problematic query both when the test passes and when it fails. If the index turns out to be incorrectly marked as "not multikey" in the failure case, then we will have to start investigating why the multikeyness metadata is wrong. |