[SERVER-56468] Incorrect plan cache entry for {$ne: null} predicate, leads to missing query results Created: 29/Apr/21  Updated: 29/Oct/23  Resolved: 17/May/21

Status: Closed
Project: Core Server
Component/s: None
Affects Version/s: None
Fix Version/s: 4.2.15, 4.4.7, 5.0.0-rc0

Type: Bug Priority: Major - P3
Reporter: Bernard Gorman Assignee: Andrii Dobroshynski (Inactive)
Resolution: Fixed Votes: 0
Labels: post-rc0, sbe-post-rc0
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Text File prettyPrintPlansWhenFail.log     Text File prettyPrintPlansWhenPass.log    
Issue Links:
Backports
Depends
Related
is related to SERVER-49766 Indexed and non-indexed collections r... Closed
is related to SERVER-54421 [SBE] Fix tests failing to match a do... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.4, v4.2, v4.0
Sprint: Query Execution 2021-05-03, Query Execution 2021-05-17, Query Execution 2021-05-31
Participants:
Linked BF Score: 131

 Comments   
Comment by Githook User [ 02/Jun/21 ]

Author:

{'name': 'Andrii Dobroshynski', 'email': 'andrii.dobroshynski@mongodb.com', 'username': 'dobroshynski'}

Message: SERVER-56468 Fix bug causing incorrect plan cache entry for {$ne: null} predicate, leading to missing query results

(cherry picked from commit 20019cf4ac5c1159e27cc458ad272146c11f139d)
Branch: v4.2
https://github.com/mongodb/mongo/commit/466ace369b8bb25a7e643f4c9d1f05f465f1bb09

Comment by Githook User [ 28/May/21 ]

Author:

{'name': 'Andrii Dobroshynski', 'email': 'andrii.dobroshynski@mongodb.com', 'username': 'dobroshynski'}

Message: SERVER-56468 Fix bug causing incorrect plan cache entry for {$ne: null} predicate, leading to missing query results

(cherry picked from commit 20019cf4ac5c1159e27cc458ad272146c11f139d)
Branch: v4.4
https://github.com/mongodb/mongo/commit/0902fd0ab97ea2f9af74e1d07038294eab30fe3a

Comment by Githook User [ 17/May/21 ]

Author:

{'name': 'Andrii Dobroshynski', 'email': 'andrii.dobroshynski@mongodb.com', 'username': 'dobroshynski'}

Message: SERVER-56468 Fix bug causing incorrect plan cache entry for {$ne: null} predicate, leading to missing query results
Branch: master
https://github.com/mongodb/mongo/commit/20019cf4ac5c1159e27cc458ad272146c11f139d

Comment by David Storch [ 05/May/21 ]

andrii.dobroshynski and I paired on this some more and we (mostly) figured it out! It's a plan cache bug. In some unusual conditions, a plan cache entry can be created for a query of the shape {val: {$not: {$eq: ?}}}. This plan cache entry is correct if the parameter (represented by the question mark) is some scalar, like a number or a boolean. But it is not correct if the parameter is the literal null, because not-equal-to-null predicates generally cannot use a multikey index. However, there is no special provision to ensure that not-equal-to-null queries have a different plan cache key from not-equal-to-constant. In other words, {val: {$not: {$eq: true}}} and {val: {$not: {$eq: null}}} have the same plan cache key. We used this discovery to design the following repro script:

(function() {
"use strict";
 
const coll = db.not_equals_null;
coll.drop();
 
assert.commandWorked(coll.createIndex({val: 1}));
assert.commandWorked(coll.insert({val: []}));
assert.commandWorked(coll.insert({val: true}));
assert.commandWorked(coll.insert({val: true}));
 
// This query should match all three documents.
assert.eq(3, coll.find({val: {$not: {$eq: null}}}, {_id: 0, val: 1}).sort({val: 1}).itcount());
 
// Run this query twice to create an active plan cache entry.
for (let i = 0; i < 2; ++i) {
    assert.eq(1, coll.find({val: {$not: {$eq: true}}}, {_id: 0, val: 1}).sort({val: 1}).itcount());
}
 
// This query ends up using the cache entry created above, which is incorrect. As a result we fail
// to return the document {val: []}.
assert.eq(3, coll.find({val: {$not: {$eq: null}}}, {_id: 0, val: 1}).sort({val: 1}).itcount());
}());

The final assertion in this repro script fails. For this repro to work, you must run it with SBE on! For example, you should run it like so:

python3 buildscripts/resmoke.py run --suites=core --additionalFeatureFlags=featureFlagSBE repro.js

The bug is not in SBE, but it appears to have been exposed by enabling SBE. We don't understand exactly why yet. A likely answer to this mystery is that plan ranking behavior is different when SBE is enabled, which results in the problematic plan being cached only when SBE is on. We also don't understand why we've only seen this manifest in particular passthrough suites. Perhaps Andrii can dig deeper into these questions.

As a final note, it appears that we fixed a similar bug in SERVER-49766. However, this previous fix only handled {$not: {$gte: null}} and {$not: {$lte: null}}. See this function. A likely fix is to extend that function to handle {$not: {$eq: null}} as well.

kyle.suarez given the above, I am going to remove this from the SBE epic; I don't think it blocks turning SBE on by default unless it causes too much noise in the build.

Comment by David Storch [ 05/May/21 ]

I suspect that we are getting different plans because the QueryPlanner is given different multikeyness metadata for the {val: 1} in the failure case as opposed to the success case. The next step should be to add logging to print out the multikeyness metadata for the problematic query in the case of the test passing as well as the case of the test failing. Assuming that the index is incorrectly marked as "not multikey" in the failure case, then we will have to start investigating why the multikeyness metadata appears to be wrong.

Generated at Thu Feb 08 05:39:20 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.