[SERVER-21011] Certain queries against compound 2d/text indexes are incorrectly covered, return incorrect results Created: 19/Oct/15  Updated: 06/Dec/17  Resolved: 13/Oct/17

Status: Closed
Project: Core Server
Component/s: Querying
Affects Version/s: 3.0.6
Fix Version/s: 3.4.11, 3.6.0-rc1

Type: Bug Priority: Major - P3
Reporter: Travis Redman Assignee: David Storch
Resolution: Done Votes: 1
Labels: RF, bkp
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Issue Links:
Backports
Duplicate
is duplicated by SERVER-31444 Queries against multikey trailing fie... Closed
Related
related to SERVER-31530 $elemMatch over trailing field of a 2... Closed
is related to SERVER-23909 Allow "2d" indices to accept a collation Backlog
Backwards Compatibility: Fully Compatible
Operating System: ALL
Backport Requested:
v3.4, v3.2
Steps To Reproduce:

> db.foo.drop()
true
> db.foo.insert({a: 1, "location" : [ -117.15, 32.88 ]})
WriteResult({ "nInserted" : 1 })
> db.foo.insert({a: 2, "location" : [ -117.15, 32.88 ]})
WriteResult({ "nInserted" : 1 })
> db.foo.insert({a: 3, "location" : [ -117.15, 32.88 ]})
WriteResult({ "nInserted" : 1 })
> db.foo.ensureIndex({location: "2d"})
{
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 1,
        "numIndexesAfter" : 2,
        "ok" : 1
}

In the query below, we request documents where b exists. As expected, we get 0 results.

> db.foo.find({location: { $nearSphere: [ -117.2684194, 33.1029176 ], $maxDistance: 0.02526017985248055 }, b: { $exists: true}})
# no results (expected)

When an index is added on (location, b), all documents are returned when $exists: true is set for b, which is incorrect.

> db.foo.ensureIndex({location: "2d", b: 1})
{
        "createdCollectionAutomatically" : false,
        "numIndexesBefore" : 2,
        "numIndexesAfter" : 3,
        "ok" : 1
}
> db.foo.find({location: { $nearSphere: [ -117.2684194, 33.1029176 ], $maxDistance: 0.02526017985248055 }, b: { $exists: true}}).hint("location_2d_b_1")
{ "_id" : ObjectId("562524f65a47d987934ec69b"), "a" : 1, "location" : [ -117.15, 32.88 ] }
{ "_id" : ObjectId("562524fb5a47d987934ec69c"), "a" : 2, "location" : [ -117.15, 32.88 ] }
{ "_id" : ObjectId("562525005a47d987934ec69d"), "a" : 3, "location" : [ -117.15, 32.88 ] }

Sprint: QuInt C (11/23/15), Query 2017-10-23
Participants:

 Description   

In Mongo 3.0.6, compound 2d indexes do not filter results correctly when an $exists operator is used on the second key. This can cause incorrect data to be returned depending on which index is selected.



 Comments   
Comment by Githook User [ 27/Oct/17 ]

Author:

{'email': 'david.storch@10gen.com', 'name': 'David Storch', 'username': 'dstorch'}

Message: SERVER-21011 Fix query correctness problem related to covered matching for 2d/text indexes.

The fix ensures that the tightness predicates over the
trailing fields of 2d/text indexes is checked. Predicates
which are INEXACT_FETCH will then get affixed to the FETCH
stage of the plan rather than incorrectly affixed to the
IXSCAN.

(cherry picked from commit 744738bd23a5aed625dc1eed89851824fcf5e33a)

Conflicts:
src/mongo/db/query/index_bounds_builder_test.cpp
src/mongo/db/query/query_planner_geo_test.cpp
Branch: v3.4
https://github.com/mongodb/mongo/commit/c0ba9b66e8d00e655939e7857181dbc648316fc5

Comment by David Storch [ 13/Oct/17 ]

After further investigation, it looks like the fix for this ticket also handles the problem described by SERVER-31444. Therefore, I am going to resolve SERVER-31444 as a duplicate of this ticket.

Comment by Githook User [ 13/Oct/17 ]

Author:

{'email': 'david.storch@10gen.com', 'name': 'David Storch', 'username': 'dstorch'}

Message: SERVER-21011 Fix query correctness problem related to covered matching for 2d/text indexes.

The fix ensures that the tightness predicates over the
trailing fields of 2d/text indexes is checked. Predicates
which are INEXACT_FETCH will then get affixed to the FETCH
stage of the plan rather than incorrectly affixed to the
IXSCAN.
Branch: master
https://github.com/mongodb/mongo/commit/744738bd23a5aed625dc1eed89851824fcf5e33a

Comment by David Storch [ 06/Oct/17 ]

After investigating this further, I have concluded that there are actually two distinct issues at play here. The originally reported issue here has to do with existence queries like {$exists: true}, {$eq: null}, and so on. This is the direct result of our unconditional use of INEXACT_COVERED tightness for predicates assigned to the trailing fields of a "2d" or "text" index:

https://github.com/mongodb/mongo/blob/33990519ca30e8a653aaca218c49539f5eba3468/src/mongo/db/query/planner_access.cpp#L341-L346
https://github.com/mongodb/mongo/blob/33990519ca30e8a653aaca218c49539f5eba3468/src/mongo/db/query/planner_access.cpp#L363-L368

We can fix this by simply changing the blocks of code above to instead use INEXACT_FETCH tightness for existence predicates that can never be covered.

The other issue only affects compound and multikey 2d indexes and is described in related ticket SERVER-31444. It is more insidious and may require a more complex fix with upgrade/downgrade or index versioning implications.

Comment by J Rassi [ 25/Apr/16 ]

One way to fix this issue would be to stop assigning predicates to trailing fields of 2d/text indexes if any document has been indexed with an array value along this path (we likely would be able to use the path-level multikey tracking infrastructure for this purpose). If no such document has been indexed, we would then be able to generate tight bounds for these predicates.

Comment by J Rassi [ 19/Nov/15 ]

The root cause of this issue is a bug in the covering logic for compound 2d and compound text indexes.

The index format for 2d and text indexes is unusual, in that arrays (on non-2d/non-text fields) are stored verbatim in the index key, instead of being exploded into separate index keys. To illustrate:

> db.foo.drop()
true
> db.foo.ensureIndex({a: "2d", b: 1})
{
	"createdCollectionAutomatically" : true,
	"numIndexesBefore" : 1,
	"numIndexesAfter" : 2,
	"ok" : 1
}
> db.foo.insert({a: [0, 0], b: ['foo', 'bar', 'baz']})
WriteResult({ "nInserted" : 1 })
> db.foo.validate().keysPerIndex
{ "test.foo.$_id_" : 1, "test.foo.$a_2d_b_1" : 1 } // "2d" index has one key, not three.

As a result, predicates on these fields cannot generate bounds (for example, if we tried to generate bounds for the predicate {b: 'foo'}, we would get bounds of (b: ['foo', 'foo']), which would miss a document with a "b" value of ['foo', 'bar', 'baz']}).

Instead of generating bounds for these predicates, the access planner always treats them as having INEXACT_COVERED tightness, and attaches them as a filter to the appropriate index access stage. This is simply incorrect for non-coverable predicates (INEXACT_FETCH), like {$exists: true}, {$eq: null}, etc. The correct behavior would be for the access planner to never attempt to cover predicates in this way that would generate bounds with INEXACT_FETCH tightness.

Simple reproducer with $near predicate, compound 2d index:

> db.foo.ensureIndex({a: "2d", b: 1})
{
	"createdCollectionAutomatically" : true,
	"numIndexesBefore" : 1,
	"numIndexesAfter" : 2,
	"ok" : 1
}
> db.foo.insert({a: [0, 0]})
WriteResult({ "nInserted" : 1 })
> db.foo.find({a: {$near: [0, 0]}, b: {$exists: true}})
{ "_id" : ObjectId("564e423803dc4436ff448b5e"), "a" : [ 0, 0 ] }  // Should not be returned.

Simple reproducer with non-$near predicate, compound 2d index:

> db.foo.ensureIndex({a: "2d", b: 1})
{
	"createdCollectionAutomatically" : true,
	"numIndexesBefore" : 1,
	"numIndexesAfter" : 2,
	"ok" : 1
}
> db.foo.insert({a: [0, 0]})
WriteResult({ "nInserted" : 1 })
> db.foo.find({a: {$geoWithin: {$center: [[0, 0], 1]}}, b: {$exists: true}})
{ "_id" : ObjectId("564e417603dc4436ff448b5b"), "a" : [ 0, 0 ] }  // Should not be returned.

Simple reproducer with $text predicate, compound text index:

> db.foo.ensureIndex({a: "text", b: 1})
{
	"createdCollectionAutomatically" : true,
	"numIndexesBefore" : 1,
	"numIndexesAfter" : 2,
	"ok" : 1
}
> db.foo.insert({a: "hello world"})
WriteResult({ "nInserted" : 1 })
> db.foo.find({$text: {$search: "hello"}, b: {$exists: true}})
{ "_id" : ObjectId("564e41b703dc4436ff448b5c"), "a" : "hello world" }  // Should not be returned.

Comment by Ramon Fernandez Marina [ 19/Oct/15 ]

tredman@fb.com, I'm able to reproduce the behavior you describe and we're investigating.

Generated at Thu Feb 08 03:55:58 UTC 2024 using Jira 9.7.1#970001-sha1:2222b88b221c4928ef0de3161136cc90c8356a66.