[SERVER-38949] Incorrect index bounds for {$ne: ["String"]} query Created: 11/Jan/19  Updated: 29/Oct/23  Resolved: 14/Feb/19

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: None
Fix Version/s: 4.0.9, 4.1.9

Type: Bug Priority: Critical - P2
Reporter: Charlie Swanson Assignee: Ian Boros
Resolution: Fixed Votes: 0
Labels: afz
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Depends
Problem/Incident
causes SERVER-40691 $nin:[[],...] queries are not indexed Closed
Related
related to SERVER-39764 Negation of $in with embedded array c... Closed
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v4.0, v3.6, v3.4, v3.2
Steps To Reproduce:

Saving this file as repro.js:

(function() { 
  "use strict"; 
 
 
  const coll = db.reproducer; 
  coll.drop(); 
 
 
  assert.commandWorked(coll.insert([{_id: 0, str: "Avon"}])); 
 
 
  assert.eq(1, coll.find({str: {$ne: ["Avon"]}}).itcount()); 
  assert.commandWorked(coll.ensureIndex({str: 1})); 
  assert.eq(1, coll.find({str: {$ne: ["Avon"]}}).itcount()); // FAILS 
})();

You can reproduce this failure with

python buildscripts/resmoke.py repro.js

Sprint: Query 2019-02-25
Participants:
Case:
Linked BF Score: 24

 Description   

For a query like

{x: {$not: {$in: [["String"]]}}

a collection scan will return the document

{x: "String"}

because "String" != ["String"] and so "String" is not in the array [["String"]]. However, if you build an index on {x: 1} then the above document is not returned. Using explain, you can see the index bounds are:

{
   "x" : [
   	"[MinKey, \"String\")",
   	"(\"String\", [ \"String\" ])",
   	"([ \"String\" ], MaxKey]"
   ]
}

Compare this to what should be the equivalent query

{x: {$ne: ["String"]}}

we then get the bounds:

{
	"x" : [
		"[MinKey, [ \"String\" ])",
		"([ \"String\" ], [ [ \"String\" ] ])",
		"([ [ \"String\" ] ], MaxKey]"
	]
}

It looks like the index bounds building logic for an $in within a $not.



 Comments   
Comment by Tom Scott [ 17/Apr/19 ]

Done! https://jira.mongodb.org/browse/SERVER-40691

Comment by Eric Sedor [ 16/Apr/19 ]

tubbo in this case can you please open a new ticket to describe the issue you're experiencing? Thanks in advance!

Comment by Tom Scott [ 16/Apr/19 ]

We had some issues with this when updating to MongoDB 4.0.9 (actually, our CI updated to 4.0.9 and locally we were still using 4.0.6, so it took a minute to reproduce the actual error) when using the notablescan configuration setting (which is turned on for all our tests and in production). We received an error when trying to execute a $nin query that had an index, because apparently these types of queries are no longer indexable?

If that's the case, might I suggest making $nin queries exempt from the error generated by notablescan?

Not sure if I should make a new issue or if it's OK to comment here?

Comment by Githook User [ 03/Apr/19 ]

Author:

{'email': 'puppyofkosh@gmail.com', 'name': 'Ian Boros', 'username': 'puppyofkosh'}

Message: SERVER-38949 Ban index usage for {$ne: <array>} queries
Branch: v4.0
https://github.com/mongodb/mongo/commit/465f94a31cdb34a1b92673c8fcbdcf642fcde031

Comment by Githook User [ 14/Feb/19 ]

Author:

{'name': 'Ian Boros', 'email': 'puppyofkosh@gmail.com', 'username': 'puppyofkosh'}

Message: SERVER-38949 ban index usage for {$ne: <array>} queries
Branch: master
https://github.com/mongodb/mongo/commit/34a1ce6a681e2637d3c29a49a9412efe63821178

Comment by James Wahlin [ 18/Jan/19 ]

We currently consider indexes as incompatible for negation for operators like $mod and $type in QueryPlannerIXSelect::_compatible. A quick fix for this issue would be to add a clause for negation of equality to array.

We discussed alternatives to this however which include:

  • Rather than generating bounds for the equality case and then using the complement, we could build bounds specifically for the "not equal to array value" case.
  • We could ban index use for the more general case, which is the negation for any predicate that we would generate INEXACT_FETCH bounds for. If we went this route, we would likely need to special case "not exists" as we do today.

In addition to the above, charlie.swanson suggested we could explore adding an invariant that we have exact bounds, at the point where we complement bounds for the not case in the index bounds builder to help catch other cases where we may be generating incorrect bounds. Along with this we could consider performing an audit to see if there are other cases we suspect are not handled correctly.

Comment by Charlie Swanson [ 11/Jan/19 ]

Ok I've narrowed this down a bit and I think I've found the problem. If we're dealing with a NOT inside of the index bounds builder, then we first build bounds for the inner predicate and then invert them. This usually works fine for when the inner predicate is an equality predicate, but doesn't work when that equality predicate is comparing to an array. This is because when comparing to array, we cannot build exact bounds for the query: see the bounds building logic here.

Because the bounds for the equality are INEXACT_FETCH, this means that there will be some values in those index bounds which do not match the equality predicate. Such values will be filtered out by the FETCH. If we invert those bounds then, the index scan will never even see the values which would have been filtered out by the FETCH. It's incorrect to skip these values on inversion for NOT, since they should match the NOT.

This very simple and hacky patch fixes the reproducer:

diff --git a/src/mongo/db/query/indexability.h b/src/mongo/db/query/indexability.h
index ce2235a..6de63ea 100644
--- a/src/mongo/db/query/indexability.h
+++ b/src/mongo/db/query/indexability.h
@@ -29,6 +29,7 @@
  */
 
 #include "mongo/db/matcher/expression.h"
+#include "mongo/db/matcher/expression_leaf.h"
 
 #pragma once
 
@@ -122,7 +123,11 @@ public:
      */
     static bool isBoundsGeneratingNot(const MatchExpression* me) {
         return MatchExpression::NOT == me->matchType() &&
-            nodeCanUseIndexOnOwnField(me->getChild(0));
+            nodeCanUseIndexOnOwnField(me->getChild(0)) &&
+            !(me->getChild(0)->matchType() == MatchExpression::EQ &&
+              static_cast<const ComparisonMatchExpressionBase*>(me->getChild(0))
+                      ->getData()
+                      .type() == BSONType::Array);
     }
 
     /**

Generated at Thu Feb 08 04:50:33 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.